DMA & High Throughput on SPI/I2C/UART: Bursts, Framing, Latency
DMA high throughput is achieved by building a measurable end-to-end data pipeline—FIFO → DMA → RAM buffer → consumer—then tuning burst size, buffering, and cache coherency to maximize sustained payload while keeping worst-case latency and jitter bounded.
What “High Throughput” Really Means on Peripheral Buses
“High throughput” is not a single knob. A design that maximizes sustained payload rate can easily worsen first-byte latency, increase jitter, or burn CPU on interrupts. This section fixes the measurement vocabulary and the decision order, so that later tuning (DMA, buffering, framing) converges instead of thrashing, and avoids accidental scope creep into signal-integrity pages.
- Sustained throughput and peak burst are different goals; each implies different DMA batch sizes and interrupt cadence.
- End-to-end latency must specify a reference point (first-byte vs last-byte vs service start); “average latency” alone is not a real-time guarantee.
- Jitter (determinism) is driven by contention (DMA arbitration, memory, scheduling), not just bus frequency.
- CPU budget is dominated by interrupt rate and data copies; DMA can reduce CPU while increasing latency if batching is unbounded.
The 4 metrics (measurement-grade definitions)
Sustained throughput
Effective payload rate over a long window (e.g., 1–10 s), excluding startup transients; this is what “stable streaming” means.
Peak burst
Short-window maximum payload rate; often limited by peripheral FIFO depth, DMA burst length, and memory bandwidth.
End-to-end latency
Time from “data becomes valid at the producer” to “data is usable by the application”. Must state the exact tap points.
Latency jitter (determinism)
Spread of end-to-end latency across time and system load. Use percentiles (P99/P999) plus maximum, not only averages.
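As a concrete illustration of tail-based measurement, percentiles can be computed offline from collected latency samples. This is a minimal C sketch (not any vendor API) using the nearest-rank method; note that it sorts the sample array in place.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

static int cmp_u32(const void *a, const void *b) {
    uint32_t x = *(const uint32_t *)a, y = *(const uint32_t *)b;
    return (x > y) - (x < y);
}

/* Nearest-rank percentile of latency samples (e.g., microseconds).
   Call with pct = 0.99 for P99, 0.999 for P999; samples[n-1] after
   the call is the maximum. Sorts the array in place. */
uint32_t latency_percentile(uint32_t *samples, size_t n, double pct) {
    qsort(samples, n, sizeof *samples, cmp_u32);
    size_t idx = (size_t)(pct * (double)(n - 1) + 0.5);
    return samples[idx];
}
```

Report P99/P999 together with the maximum from the same run; the average alone hides exactly the tail this section is about.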
Latency vocabulary (avoid measurement confusion)
- First-byte latency: time until the first useful byte/sample can be consumed (most sensitive to batching).
- Last-byte latency: time until the entire frame/block is complete (most sensitive to transfer size).
- Service latency: time until the consumer actually starts processing (most sensitive to scheduling/locks/cache).
Bottleneck map (typical high-throughput chain)
Throughput and determinism are set by the weakest stage, not by bus frequency alone.
- Bus bandwidth (wire time vs gaps)
- Peripheral FIFO (watermarks, overruns/underruns)
- DMA arbitration (priority, contention, burst length)
- Memory/cache (bandwidth, cache-coherency overhead)
- ISR/scheduling (interrupt rate, critical sections)
- Consumer/application (back-pressure, processing budget)
When DMA is the right baseline (practical triggers)
- CPU copy dominates: data-move work persistently consumes a noticeable share of CPU time and displaces real tasks.
- Interrupt rate is the bottleneck: frequent per-byte/per-small-chunk interrupts cause a throughput “sawtooth” and missed deadlines.
- Real-time deadlines exist: worst-case service time and jitter must be bounded, not just “good on average”.
Measurement hooks (place these before tuning)
- Counters: FIFO overrun/underrun, DMA errors, retries, dropped frames, queue depth high-watermark.
- Timestamps: producer-valid, DMA-complete, consumer-start, app-commit (consistent tap points).
- CPU load: ISR time, copy time, lock wait time (separate “busy” from “blocked”).
Pass criteria (template)
- Sustained payload ≥ X (MB/s) over window T
- P99 end-to-end latency ≤ Y ms; max ≤ Z ms
- Data-move CPU load ≤ W% (ISR + copy + cache maintenance)
- Loss/overrun counters = 0 (or explicitly bounded by a stated drop policy)
Scope guard
This section focuses on throughput/latency/jitter/CPU definitions and end-to-end pipeline bottlenecks. Signal integrity, clock quality, and termination are handled in SCLK Quality & Skew and Long-Trace SI.
Throughput Budget Model: Bits on the Wire vs Useful Payload
If line rate is high but real payload throughput is low, the missing bandwidth is almost always consumed by overhead and gaps. A usable budget model must be decomposable: each loss term should be measurable and optimizable independently.
Practical payload model (decomposable into measurable factors)
Payload throughput = Line rate × Protocol efficiency × Continuity × (1 − Retry loss)
Each factor maps to a different class of fixes: framing structure, timing gaps, software cadence, or error recovery.
- Protocol efficiency: structural overhead per payload (headers/commands/turnaround/dummies).
- Continuity: how “continuous” transfer time is (burst continuity) versus being punctured by gaps.
- Retry loss: bandwidth eaten by retries/NAKs/CRC recovery; treat as a first-class budget term.
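The budget model above is trivial to encode, which makes it useful as a sanity check against measured numbers. A minimal C sketch with illustrative values (all factors are fractions in [0, 1]):

```c
/* Decomposable payload budget: each factor maps to a different fix
   (framing structure, timing gaps, software cadence, error recovery). */
double payload_throughput(double line_rate_mbps,
                          double protocol_efficiency,
                          double continuity,
                          double retry_loss) {
    return line_rate_mbps * protocol_efficiency * continuity
         * (1.0 - retry_loss);
}
```

For example, a 50 Mb/s line with 90% protocol efficiency, 90% continuity, and 10% retry loss yields about 36.45 Mb/s of payload; if measurement lands far below that, a loss term is missing from the model.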
The 3 overhead classes (separate them to avoid the wrong fix)
1) Frame / command overhead
Structural bytes and phases required per payload. Optimization usually means larger blocks, fewer boundaries, or better packing.
2) CS / turnaround gaps (hard gaps)
Mandatory timing holes between phases or devices. These show up clearly on a logic analyzer as wire-time with no payload.
3) Software gaps (soft gaps)
DMA re-arming delays, ISR work, cache misses, locks, and scheduling. These gaps disappear only when cadence is engineered.
Quick estimation (rule-of-thumb bands)
These bands are useful for sanity checks before deep optimization. If measured results land far below, a large gap term exists.
85–95% efficiency
Large blocks, few boundaries, minimal hard gaps, and DMA re-armed ahead of time (continuity is high).
70–85% efficiency
Periodic boundaries and moderate hard gaps; software cadence is acceptable but not fully pipeline-optimized.
60–70% (or lower)
Fragmented transfers with frequent turnarounds and visible software gaps; retries may amplify the loss dramatically.
How to prove the budget (measure → attribute → prioritize)
- Measure wire-time segmentation: payload vs overhead vs gaps (logic/protocol analyzer).
- Separate hard vs soft gaps: hard gaps persist even with a busy CPU; soft gaps correlate with ISR/scheduling stalls.
- Measure retry loss as a rate: retries per second or per megabyte; treat it as a throughput tax.
- Prioritize the biggest term: the largest gap/overhead term is the first optimization target.
Pass criteria (budget closure)
- Efficiency computed from measurement matches the model within ±X% (no hidden loss term).
- Hard-gap share and soft-gap share are separately quantified (each has a named root cause category).
- Retry loss is bounded and tracked (rate + worst-case bursts), not just “it seems fine”.
Scope guard
This section models payload loss using overhead and gaps. Electrical edge quality, termination, and routing are handled in Long-Trace SI.
Latency & Determinism: Why DMA Can Make Latency Worse
DMA often improves sustained throughput and reduces CPU time by batching transfers. However, batching can increase first-byte latency and widen the latency distribution under contention. Real-time designs should start from a latency budget (P99/P999 and maximum), then choose batch sizes and watermarks that keep worst-case service time bounded.
- Batching trades latency for CPU: larger batches reduce interrupts but can delay the first usable byte/sample.
- Determinism is about the tail: use P99/P999 and maximum, not only average latency.
- Jitter comes from contention: arbitration, memory/cache stalls, and priority inversion widen the latency distribution.
- Watermarks are a hard knob: they define when the system “starts caring” and strongly shape first-byte latency.
Mechanism: batching increases “wait-to-fill” time
When transfers are triggered at a watermark or a fixed batch size, the system must wait until enough data accumulates before a DMA completion event can wake the consumer. This reduces interrupt frequency and improves continuity, but increases the time until the first usable bytes become available.
- Small batch: lower first-byte latency, higher interrupt cadence, higher CPU overhead.
- Large batch: higher throughput and lower CPU, but first-byte latency increases and jitter can widen under load.
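The wait-to-fill trade-off can be made explicit with a back-of-envelope model. A minimal C sketch, ignoring bus gaps and assuming a steady producer rate (both simplifications):

```c
#include <stdint.h>

/* First-byte latency floor: the completion that wakes the consumer
   cannot fire until batch_bytes have accumulated (ceiling division). */
uint32_t wait_to_fill_ms(uint32_t batch_bytes, uint32_t bytes_per_ms) {
    return (batch_bytes + bytes_per_ms - 1) / bytes_per_ms;
}

/* The other side of the trade: completion interrupts per second
   fall as the batch grows. */
uint32_t irq_per_second(uint32_t bytes_per_ms, uint32_t batch_bytes) {
    return (bytes_per_ms * 1000u) / batch_bytes;
}
```

At 1 MB/s (1000 bytes/ms), a 4096-byte batch adds about 5 ms of wait-to-fill while cutting the interrupt rate to roughly 244/s; this pair of numbers is what the latency budget must bound.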
The three latency components (use consistent tap points)
First-byte latency
Producer-valid → consumer can read the first usable byte/sample. Most sensitive to batching and watermark thresholds.
Last-byte latency
Producer start → entire frame/block completed in memory. Dominated by transfer size and bus/memory bandwidth.
Service latency
DMA-complete → consumer actually starts processing. Dominated by scheduling, locks, cache refill, and interrupt masking.
Why jitter widens (root-cause categories)
DMA arbitration & contention
Multi-channel DMA and bus-matrix contention introduce variable queueing time. Worst-case wait time sets jitter tail.
Cache refill / maintenance
Cache-line refills and coherency operations can be non-uniform under load. Separate “copy time” from “cache time”.
Priority inversion & scheduling
Long critical sections and lock contention delay the consumer even after DMA completes. This often dominates P99/P999.
Watermark policy
Higher watermarks reduce interrupts but delay wakeups; if combined with contention, tails become much wider.
Real-time rule: define a latency budget, then choose batch size
- Set targets: P99 and max for end-to-end latency and service latency.
- Bound batching: set a maximum batch size and watermark so first-byte latency stays under budget.
- Control contention: assign DMA priority and limit burst length where worst-case wait time matters.
- Prove the tail: collect latency histograms under worst-case load (not only idle lab conditions).
Pass criteria
- P99 end-to-end latency ≤ X; max ≤ Y
- P99 service latency ≤ A (consumer wake and start bounded)
- First-byte latency ≤ B (batching bounded by watermark and max batch size)
- Latency histograms remain within bounds under worst-case contention (DMA + memory + CPU load)
Scope guard
This section covers batching, arbitration, cache, and scheduling as sources of latency and jitter. Electrical edge quality is handled in SCLK Quality & Skew and Long-Trace SI.
DMA Building Blocks (Channels, Requests, Descriptors, Scatter-Gather)
DMA is a small execution engine: requests trigger transfers, channels compete for shared bandwidth, and descriptors define the copy plan (where, how much, and when to interrupt). Understanding these building blocks makes throughput tuning predictable and prevents “mystery jitter” from arbitration and buffer policy.
DMA as a pipeline (what each part controls)
Request (trigger)
Defines when DMA runs (e.g., FIFO watermark). Too frequent wastes CPU; too sparse increases first-byte latency.
Channel (resource owner)
A channel competes for shared bandwidth. Its priority and burst policy shape worst-case wait time (jitter tail).
Descriptor (transfer plan)
Specifies source, destination, length, and next pointer. The interrupt flag controls cadence and CPU load.
Linked list / scatter-gather
Chains descriptors to reduce re-arming gaps, enable ring buffers, and support zero-copy staging across multiple blocks.
Transfer modes (choose by continuity and determinism)
Single-shot
- Best for fixed-size blocks and command-driven transfers.
- Risk: soft gaps between blocks if re-arming is slow.
Cyclic (ring)
- Best for continuous streams (high continuity, minimal soft gaps).
- Requirement: robust head/tail management and overrun detection.
Scatter-gather (linked descriptors)
- Best for zero-copy staging and long transfers without re-arming.
- Risks: alignment/length limits, chain integrity, error handling strategy.
Descriptor design hard rules (avoid silent corruption)
- Alignment: keep buffer base and length aligned to platform requirements (often cache-line and bus-burst aligned).
- Length limits: never exceed the per-descriptor maximum; split into multiple descriptors instead of “hoping it wraps”.
- Ring integrity: validate next pointers and wrap behavior; a broken chain becomes a “random freeze”.
- Interrupt policy: too frequent interrupts collapse throughput; too sparse interrupts inflate first-byte and service latency.
- Error strategy: define what happens on DMA error (log + reset channel + bounded retries + safe fallback).
Bring-up checklist
- Verify the request source toggles (FIFO watermark) before DMA is enabled.
- Verify one descriptor moves the expected byte count to the expected address.
- Enable chain mode and confirm wrap returns to Desc0 without stopping.
- Measure interrupt cadence and confirm it matches the intended watermark/batch policy.
- Inject an error (timeout/abort) and confirm recovery does not deadlock the pipeline.
Buffering Strategies: Double/Triple Buffer, Ring Buffer, Watermark Tuning
Buffering turns bursty transfers into a steady stream that software can consume without overruns or underruns. The critical knob is the watermark: it creates margin between “data arrives faster” and “data is consumed slower,” shaping interrupt cadence, first-byte latency, and worst-case service time.
- Double buffer favors determinism when the consumer can finish each block within one block period.
- Triple buffer absorbs short consumer stalls but increases queueing and the latency upper bound.
- Ring buffer maximizes continuity and is the foundation for frame slicing (see H2-6).
- Watermark too low: high callback rate → CPU overhead → service jitter. Too high: wait-to-fill → first-byte delay.
Common buffering patterns (choose by continuity and deadline risk)
Double buffer (Ping-pong)
- Best when: fixed blocks and bounded consumer time (tight deadlines).
- Risk: switch gaps if re-arming is late; immediate overrun if the consumer slips.
- Observe: P99 “DMA complete → consumer done” stays below one block period.
Triple buffer
- Best when: occasional consumer stalls exist but average throughput is sufficient.
- Trade-off: lower drop risk, higher queueing (latency upper bound grows).
- Observe: buffer occupancy rarely hits high-watermark under worst-case load.
Ring buffer
- Best when: continuous streams and minimal soft gaps are required.
- Requirement: robust head/tail accounting, wrap handling, and overrun recovery.
- Observe: fill-level distribution stays away from 0% and 100% for stability.
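The head/tail accounting that all three patterns depend on can be sketched in a few lines. A minimal single-producer/single-consumer ring in C, using free-running counters and a power-of-two size (a common embedded idiom, not a specific library):

```c
#include <stdbool.h>
#include <stdint.h>

#define RING_SIZE 256u         /* must be a power of two */

typedef struct {
    uint8_t  buf[RING_SIZE];
    uint32_t head, tail;       /* free-running; masked only on access */
    uint32_t overruns;         /* exposed for the pass criteria */
} ring_t;

static uint32_t ring_fill(const ring_t *r) { return r->head - r->tail; }

/* Producer side (DMA completion path): drop-new on overrun. */
bool ring_push(ring_t *r, const uint8_t *data, uint32_t n) {
    if (ring_fill(r) + n > RING_SIZE) { r->overruns++; return false; }
    for (uint32_t i = 0; i < n; i++)
        r->buf[(r->head + i) & (RING_SIZE - 1)] = data[i];
    r->head += n;
    return true;
}

/* Consumer side: returns how many bytes were actually available. */
uint32_t ring_pop(ring_t *r, uint8_t *out, uint32_t n) {
    uint32_t avail = ring_fill(r);
    if (n > avail) n = avail;
    for (uint32_t i = 0; i < n; i++)
        out[i] = r->buf[(r->tail + i) & (RING_SIZE - 1)];
    r->tail += n;
    return n;
}
```

Free-running counters make the fill level a single subtraction that stays correct across wrap, and the overrun counter feeds directly into the pass criteria below.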
Watermark tuning (two safe operating goals)
Determinism-first
- Use smaller batches and more frequent callbacks to bound first-byte latency.
- Keep callback work minimal (short path): move pointers/markers, defer heavy parsing.
- Set a max batch size so wait-to-fill cannot consume the latency budget.
Throughput-first
- Use larger batches to reduce ISR frequency and improve continuity.
- Ensure consumer capacity exceeds producer average rate; otherwise buffers only delay failure.
- Prefer ring buffers and linked descriptors to minimize “re-arming gaps”.
Common pitfalls (symptom → first check → fix → pass)
Switch-gap “holes”
Symptom: sawtooth throughput and periodic idle gaps.
First check: re-arming latency and callback critical sections.
Fix: link descriptors or pre-arm the next buffer before the switch point.
Pass: measured “DMA complete → next start” gap ≤ X.
Consumer falls behind
Symptom: sporadic overruns under load.
First check: P99 service latency vs buffer headroom at the chosen watermark.
Fix: shorten the consumer critical path; split “fast pointer move” and “slow parse”.
Pass: occupancy stays below high-watermark with worst-case workload.
Overrun causes misalignment
Symptom: after one drop, parsing stays wrong.
First check: whether a resync policy exists after overrun.
Fix: mark the drop, enter resync (scan next marker/boundary), then resume framing.
Pass: recovery completes within Y ms and error counters stop rising.
Pass criteria (buffering is stable)
- Fill-level stays within [L%, H%] under worst-case producer/consumer load (no sustained drift to 0% or 100%).
- Overrun/underrun counters remain 0 (or are bounded with a defined resync recovery).
- P99 service latency stays within the headroom implied by watermark and buffer depth.
- Callback/ISR cadence matches the intended policy (no unexpected over-frequency).
Framing & Bursts: Keeping Boundaries Without Killing Performance
Burst transfers maximize efficiency, but application data often has frame boundaries (messages, lines, packets). The core rule is burst ≠ frame. A robust design writes continuously into a ring buffer using DMA, then performs frame slicing (markers + pointers) without copying or blocking the DMA pipeline.
- Fixed length is DMA-friendly but needs resync after drops.
- Length-prefixed supports variable frames; must bound length and handle corruption.
- Delimiter/idle is easy to scan but needs escape/validation and recovery policy.
- Frame slicing in a ring uses markers (start/len/end) and span descriptors for wrap-around.
Framing methods (choose by recovery strength, not just convenience)
Fixed length
- Best for: constant-size frames and strict timing.
- Risk: one drop shifts alignment; define resync points and maximum drift time.
- Pass: bounded resync time after a forced drop.
Length-prefixed
- Best for: variable frames with bounded maximum size.
- Risk: corrupted length can “consume” the ring; enforce max length and validation.
- Pass: invalid lengths are rejected and recovery is bounded.
Delimiter / idle
- Best for: human-readable lines or simple message streams.
- Risk: noise/drops create false boundaries; use escaping and validation.
- Pass: bounded scan window and deterministic resync policy.
Ring slicing workflow (burst-friendly boundaries)
- DMA writes continuously into a ring buffer (maximize continuity, minimize soft gaps).
- Parser creates markers (start/len/end) without copying; markers reference ring offsets.
- Frames may wrap; represent as two spans (tail span + head span) instead of memmove.
- Partial frames keep state and wait for more data; never block DMA progress.
- On error (bad length/CRC/delimiter), enter resync mode with bounded scan window and timeout.
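The "two spans instead of memmove" step is the heart of zero-copy slicing. A minimal C sketch of the span computation (the `span_t` layout is illustrative):

```c
#include <stdint.h>

typedef struct { uint32_t off, len; } span_t;

/* A frame starting at ring offset `start` with `len` bytes becomes one
   span when it fits before the wrap point, or two spans (tail span +
   head span) when it crosses it. Returns the span count. */
unsigned frame_to_spans(uint32_t start, uint32_t len, uint32_t size,
                        span_t out[2]) {
    uint32_t first = size - start;
    if (len <= first) {
        out[0] = (span_t){ start, len };
        return 1;
    }
    out[0] = (span_t){ start, first };     /* tail span */
    out[1] = (span_t){ 0, len - first };   /* head span */
    return 2;
}
```

Consumers that can iterate spans never pay a copy; consumers that need contiguity copy only the wrapped frames (copy-on-demand), not the whole stream.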
Sticky failure modes (must be handled explicitly)
- Half packet: frame start present, end missing → hold state and wait; do not emit false frame.
- Sticky misalignment after drop: enter resync mode and bound recovery time.
- False delimiter: require validation/escape and enforce maximum scan distance.
- Corrupted length: clamp to a maximum and reject unreasonable values; avoid ring “runaway”.
- Parser overload: keep parsing lightweight; heavy work should run after boundary is established.
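The half-packet and corrupted-length cases above can be handled in one small decision function. A minimal C sketch for length-prefixed framing; the 1-byte length field and `MAX_FRAME` bound are illustrative assumptions:

```c
#include <stdint.h>

#define MAX_FRAME 64   /* assumed protocol maximum; clamp point */

/* Inspect buf[0..avail) for a complete frame (1-byte length prefix).
   Returns total frame length if a complete, valid frame starts at
   buf[0]; 0 if more data is needed (half packet: hold state, do not
   emit); -1 if the length is invalid (caller enters resync mode). */
int frame_peek(const uint8_t *buf, uint32_t avail) {
    if (avail < 1) return 0;
    uint8_t len = buf[0];
    if (len == 0 || len > MAX_FRAME) return -1;  /* corrupted length */
    if (avail < 1u + len) return 0;              /* half packet */
    return 1 + len;
}
```

The three return values map one-to-one onto the sticky failure modes: hold state, reject-and-resync, or consume; nothing here can "consume" the ring on a garbage length byte.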
Pass criteria (boundaries are preserved without killing throughput)
- DMA continuity remains high (soft gaps bounded; no parser-induced stalls).
- Frame extraction is zero-copy whenever possible (markers/spans, not memmove).
- Resync completes within X ms and does not cause ring runaway.
- Frame error counters stop rising after recovery; parser time stays bounded (P99).
CPU Interaction: Interrupt Rate, Polling, Zero-Copy, Back-Pressure
High throughput fails when CPU involvement becomes the bottleneck. A stable DMA pipeline treats interrupts as control-plane events (pointer/marker updates), keeps heavy work in a consumer task, and uses back-pressure to prevent queue runaway when the consumer cannot keep up.
- Interrupt rate too high fragments CPU time → cache churn, scheduler overhead, and lock contention.
- Zero-copy works only when ownership/lifetime is explicit; otherwise copy or copy-on-demand is safer.
- Back-pressure is mandatory for overload: drop policy, reduced burst, or hardware flow control (covered in UART RTS/CTS page).
- Producer/Consumer/Monitor is the stable pattern: DMA pushes, task consumes, stats closes the loop.
Interrupt rate & throughput collapse (control-plane vs data-plane)
- Fixed per-interrupt cost: entry/exit, cache pollution, scheduler bookkeeping, and deferred work.
- Failure mode: “more interrupts to be responsive” reduces effective data-plane time, causing soft gaps and jitter.
- Rule: ISR should only move pointers/markers, update counters, and wake the consumer task.
Polling vs interrupt (a robust mixed strategy)
Interrupt (event-driven)
- Best for: bounded first-byte latency and wake-up from idle.
- Use it for: “data available” signal, pointer advance, marker creation, counter updates.
- Avoid: parsing, copying, and long critical sections inside ISR.
Short-window polling (batch)
- Best for: high throughput after a wake event (consume in chunks).
- Use it for: bounded loops that drain N frames or M bytes, then yield.
- Benefit: fewer context switches and better cache locality than per-chunk interrupts.
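The mixed strategy reduces, in the consumer task, to a bounded drain loop: one wake event amortizes many frames, and the bound keeps the task from monopolizing the CPU. A minimal C sketch over a hypothetical frame-queue counter (a real system would pop frames from the ring here):

```c
typedef struct {
    unsigned pending;   /* frames ready (advanced by the ISR/marker path) */
    unsigned handled;   /* frames consumed by the task */
} frame_q_t;

/* Drain at most max_frames per wake, then return so the caller can
   re-arm the wake event and yield. The bound is the determinism knob. */
unsigned drain_bounded(frame_q_t *q, unsigned max_frames) {
    unsigned n = 0;
    while (n < max_frames && q->pending) {
        q->pending--;       /* stand-in for: slice + parse one frame */
        q->handled++;
        n++;
    }
    return n;
}
```

If `drain_bounded` keeps returning the full `max_frames`, the consumer is saturated and back-pressure should engage rather than letting the queue grow without bound.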
Zero-copy boundaries (ownership and lifetime decide)
Direct consume (safe zero-copy)
- Consumer reads without modifying; buffer lifetime ends after processing.
- Frames are referenced by markers/spans (ring offsets), not copied.
- Cache coherency is enforced at the ownership switch (see H2-8).
Copy-on-demand (partial copy)
- Copy only when a wrapped frame needs contiguity, or when upper layers cannot handle spans.
- Keep copy scope minimal: edges only, not whole streams.
- Preserve throughput by batching copies and avoiding per-byte operations.
Must copy (safety over speed)
- Data must be retained long-term or modified in-place.
- Buffer ownership is unclear or can be preempted by re-use.
- Security/sandbox boundaries require isolated copies.
Back-pressure policies (prevent queue runaway)
Drop policy (controlled loss)
- Choose drop-new, drop-old, or drop-to-boundary (resync-friendly).
- Define trigger and recovery thresholds (hysteresis) to avoid oscillation.
- Expose counters for monitoring and post-mortem.
Reduce load (graceful degradation)
- Reduce burst size or watermark to cap worst-case service latency.
- Lower source rate where possible (application-level throttling).
- Prefer deterministic caps over “unbounded buffering”.
Flow control (hardware support)
- Use hardware flow control when available (e.g., UART RTS/CTS) to stop the producer at the source.
- This page only defines the policy; electrical/timing details belong in the UART flow control subpage.
- Still keep drop/degrade as a failsafe for misbehaving sources.
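The trigger/recovery hysteresis mentioned under the drop policy can be captured in a tiny state machine. A minimal C sketch; the 80%/50% thresholds are illustrative, not recommendations:

```c
#include <stdbool.h>
#include <stdint.h>

#define BP_HIGH 80u   /* engage above this occupancy (%) */
#define BP_LOW  50u   /* release only below this occupancy (%) */

typedef struct {
    bool     dropping;
    uint32_t drop_events;   /* exposed for monitoring/post-mortem */
} backpressure_t;

/* Hysteresis: the gap between BP_HIGH and BP_LOW prevents oscillation
   when occupancy hovers near a single threshold. Returns whether the
   drop/degrade policy is currently active. */
bool backpressure_update(backpressure_t *bp, uint32_t occupancy_pct) {
    if (!bp->dropping && occupancy_pct >= BP_HIGH) {
        bp->dropping = true;
        bp->drop_events++;
    } else if (bp->dropping && occupancy_pct <= BP_LOW) {
        bp->dropping = false;
    }
    return bp->dropping;
}
```

The same pattern applies whether the response is dropping, reducing burst size, or asserting hardware flow control; only the action taken while `dropping` is true changes.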
Pass criteria (CPU stays out of the critical path)
- Interrupt rate under full load stays ≤ X and does not cause throughput collapse.
- ISR work remains bounded: pointer/marker updates only (no parsing, no long locks).
- Queue occupancy remains within [L%, H%] with a defined back-pressure response.
- Drop/degrade events are visible via counters and recovery is deterministic (no runaway).
Memory System Pitfalls: Cache Coherency, Alignment, DMA-Safe Regions
Many “protocol-looking” failures are memory-system failures: cache lines, alignment, and DMA-safe regions. DMA writes RAM directly while CPUs often read and write through caches. Without correct maintenance at ownership boundaries, software may see stale, repeated, or scrambled data.
- Device → Memory (DMA writes): CPU must not read stale cache lines (invalidate at handoff).
- Memory → Device (DMA reads): RAM must contain latest CPU writes (flush/clean at handoff).
- Cache-line granularity: maintenance ranges must be expanded to line boundaries.
- Alignment: start/length/boundary alignment reduces edge-case failures and jitter.
Device → Memory (DMA writes, CPU reads)
- Risk: CPU keeps old cache lines while DMA updates RAM.
- Rule: before CPU consumes DMA output, invalidate the covered cache-line range.
- Scope: expand start/end to cache-line boundaries to avoid “edge bytes” being stale.
Memory → Device (CPU writes, DMA reads)
- Risk: CPU updates cache but RAM still contains older data.
- Rule: before DMA reads CPU-prepared buffers, flush/clean cache lines to RAM.
- Scope: align to cache-line boundaries; avoid sharing cache lines with unrelated data.
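The boundary-expansion rule is mechanical and worth encoding once. A minimal C sketch; `CACHE_LINE` is an assumed platform constant, and the actual invalidate/clean calls are platform hooks (e.g., CMSIS-style cache maintenance functions) not shown here:

```c
#include <stdint.h>

#define CACHE_LINE 32u   /* assumed platform cache-line size */

/* Expand a maintenance range to cache-line boundaries before
   invalidate (device->mem) or clean (mem->device); partial-line
   maintenance is what corrupts the "edge bytes". */
uintptr_t cache_align_down(uintptr_t addr) {
    return addr & ~(uintptr_t)(CACHE_LINE - 1);
}

uintptr_t cache_align_up(uintptr_t addr) {
    return (addr + CACHE_LINE - 1) & ~(uintptr_t)(CACHE_LINE - 1);
}

/* Usage sketch: before the CPU consumes a DMA-written buffer
   [addr, addr+len), invalidate the expanded range
   [cache_align_down(addr), cache_align_up(addr + len)). */
```

Expanding the range is only safe when the buffer does not share its first and last cache lines with unrelated data, which is exactly why the alignment rules below exist.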
Alignment & DMA-safe regions (prevent “works on bench, fails under load”)
- Start alignment: align buffer start to cache-line and DMA burst-friendly boundaries.
- Length alignment: pad lengths to avoid partial-line maintenance and boundary crossings.
- Boundary rules: avoid crossing forbidden windows (platform bus/bridge limitations).
- DMA-safe memory: prefer regions defined for DMA (coherent or explicitly managed); do not assume all RAM is equivalent.
- Protection hints: MPU/IOMMU mapping errors can appear as silent corruption or hard faults; validate early in bring-up.
Symptoms → first checks → fix → pass
Stale / repeated data
Quick check: verify direction and cache maintenance at ownership handoff.
Fix: invalidate before CPU read (device→mem) or flush before DMA read (mem→device).
Pass: repeated patterns disappear under worst-case load and long runs.
Misalignment / offset shifts
Quick check: buffer start/length alignment and cache-line boundary expansion.
Fix: align start/size; avoid sharing cache lines; pad to boundaries.
Pass: no sporadic “one-byte shift” or partial-line artifacts.
Random corruption (rare)
Quick check: DMA-safe region constraints and forbidden boundary crossings.
Fix: move buffers to DMA-approved regions; enforce address window rules; validate MPU/IOMMU mapping.
Pass: corruption rate drops to 0 over extended soak tests.
Pass criteria (memory interactions are deterministic)
- Correct coherency operations occur at every ownership switch (direction-aware).
- Maintenance ranges are cache-line aligned; buffers do not share cache lines with unrelated data.
- Start/length alignment rules are enforced and verified during bring-up.
- No stale/repeated/scrambled symptoms during long-run soak under worst-case contention.
Bus-Specific Patterns (SPI / UART / I²C): What Changes, What Stays
DMA patterns share the same backbone across buses: FIFO → DMA → RAM buffer → consumer → stats. What changes is the boundary model (CS vs idle vs transactions) and the way overload is handled. Only DMA-relevant differences are covered here; bus electrical/timing details belong to the dedicated SPI/UART/I²C pages.
- Trigger model: which events generate DMA requests (FIFO watermark, RX ready, TX empty).
- Boundary model: how frames are marked (CS, IDLE/BREAK, or transaction limits).
- Overload model: what happens when the consumer is late (drop/degrade/flow-control).
- Metrics: sustained throughput, max latency, jitter (P99), and IRQ rate / CPU time.
SPI (DMA-specific)
- Full-duplex coupling: RX/TX progress often must stay synchronized; dummy TX bytes may be required just to clock RX data in.
- Boundary anchor: CS acts as a hard boundary; define whether a DMA burst must stay within one CS window.
- Off-page pointer: SCLK quality and mode settings affect sampling, but are not expanded here.
First checks
- Verify RX and TX DMA descriptors advance in lockstep when required.
- Confirm CS boundary policy matches the transaction framing expected by the consumer.
UART (DMA-specific)
- Continuous byte stream: boundaries are not inherent; DMA is best treated as a continuous ring writer.
- Boundary trigger: IDLE gaps or BREAK events can generate markers for framing and resync.
- Overload priority: back-pressure and drop-to-boundary policies prevent runaway when the consumer stalls.
- Off-page pointer: baud error and RTS/CTS details belong to UART subpages.
First checks
- Validate IDLE/BREAK markers are created at the correct handoff points.
- Verify ring occupancy has hysteresis and back-pressure triggers before overflow.
I²C (DMA-specific)
- Short transactions: per-transaction overhead is large; DMA is mainly for CPU offload and jitter reduction.
- Bursts are bounded: DMA descriptors typically map to short byte blocks per transaction, not long streams.
- Off-page pointer: clock stretching and pull-up constraints drive worst-case time, but are not expanded here.
First checks
- Confirm DMA is reducing CPU touch points rather than chasing theoretical peak throughput.
- Ensure transaction boundaries are preserved and error handling remains deterministic.
Cross-bus checklist (what must be re-confirmed when switching buses)
- DMA request source (RX/TX/FIFO watermark) and the chosen completion cadence.
- Boundary anchors (CS / IDLE-BREAK / transaction) and marker semantics in the buffer.
- Maximum burst size and queue depth caps to keep worst-case latency bounded.
- Back-pressure strategy (drop/degrade/flow-control) with trigger and recovery thresholds.
- Cache coherency and alignment rules at ownership handoffs (see H2-8).
- Observed metrics: sustained throughput, max latency, P99 jitter, IRQ rate.
Real-Time Playbook: Bounding Worst-Case Latency
Real-time is not “fast on average”. It is bounded worst-case service time. Start from a deadline, decompose end-to-end latency into bounded parts, then apply hard caps that hold under contention.
Definition (deadline vs worst-case service time)
- Deadline: the maximum time allowed from “byte arrives” to “byte is consumed”.
- Worst-case service time: the maximum observed under worst contention, not the average.
- Pass condition: max latency < X with margin.
Worst-case decomposition (sum of bounded parts)
- DMA wait: arbitration and priority queueing time.
- Transfer: bus time + memory bandwidth under contention.
- ISR/notify: completion signaling and wake latency (avoid long masking).
- Consume: bounded batch work in the consumer task (no unbounded loops).
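Because the end-to-end bound is the sum of per-stage worst cases, the pass/fail check is a one-liner worth keeping next to the measured numbers. A minimal C sketch with illustrative fields:

```c
#include <stdbool.h>
#include <stdint.h>

/* Worst-case figures per stage, each measured (or bounded) separately
   under worst contention — never averages. */
typedef struct {
    uint32_t dma_wait_us;    /* arbitration + priority queueing */
    uint32_t transfer_us;    /* bus + memory bandwidth under contention */
    uint32_t isr_notify_us;  /* completion signaling + wake latency */
    uint32_t consume_us;     /* bounded batch work in the consumer */
} latency_budget_t;

bool budget_meets_deadline(const latency_budget_t *b,
                           uint32_t deadline_us, uint32_t margin_us) {
    uint32_t worst = b->dma_wait_us + b->transfer_us
                   + b->isr_notify_us + b->consume_us;
    return worst + margin_us <= deadline_us;
}
```

Keeping the decomposition explicit also tells you which knob to turn when the check fails: a dominant `dma_wait_us` points at priority/burst caps, a dominant `consume_us` at the drain bound.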
1) DMA priority & arbitration
- Amplifier: long queueing behind bulk channels.
- Control: fixed priority for the critical channel and a capped burst length.
2) Bus contention
- Amplifier: competing DMA/CPU traffic stretches transfer time.
- Control: schedule bulk work outside critical windows; cap queue depth.
3) IRQ masking & long critical sections
- Amplifier: completion and wake-up are delayed unpredictably.
- Control: bound masking time; move work out of ISR; shorten lock holds.
4) Cache refill & memory effects
- Amplifier: unpredictable cache misses and coherence maintenance costs.
- Control: aligned buffers, fixed working sets, and no dynamic allocation on the critical path.
Practical rules (turn worst-case into hard caps)
- Fixed max burst length: cap transfer and completion intervals.
- Fixed max queue depth: cap queueing delay; avoid “unbounded buffering”.
- No dynamic memory: critical path uses pre-allocated buffers and fixed markers.
- Watermark + watchdog: detect stuck pipelines and force deterministic recovery.
- Back-pressure integration: when near deadline, prefer degrade/drop-to-boundary over collapse.
Pass criteria (examples; use project thresholds)
- Max latency: < X
- Jitter: < Y
- Drop rate: < Z (or only under a defined overload policy)
- Recovery: watchdog triggers within T and returns to stable operation
Debug & Validation: Prove Throughput, Prove Lossless, Prove Boundaries
A stable DMA pipeline is validated with three evidence chains: (1) throughput (sustained/peak), (2) loss/reorder (sequence/CRC/counters), and (3) boundaries (markers stay consistent across FIFO → DMA → consumer → app). Use one time base and fixed observation points to turn “it feels slow” into measurable proof.
1) Throughput (sustained vs peak)
- Peak: short-window burst limit (bus time + FIFO + DMA burst).
- Sustained: long-window payload rate (reveals software gaps and back-pressure).
- Must log: payload bytes / fixed window (e.g., 1 s) + IRQ rate.
2) Loss / reorder (prove “lossless”)
- Sequence number: detect gaps, repeats, and out-of-order consumption.
- CRC / checksum: detect corruption and buffer misalignment.
- Counters: FIFO overrun/underrun, DMA error flags, and drop-policy triggers.
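The sequence-number chain can be checked with a small, wrap-safe state machine whose counters feed the pass criteria directly. A minimal C sketch:

```c
#include <stdint.h>

typedef struct {
    uint32_t expected;   /* next sequence number we should see */
    uint32_t gaps;       /* total frames skipped (loss) */
    uint32_t backwards;  /* duplicates or out-of-order deliveries */
} seq_check_t;

/* Feed every consumed frame's sequence number. The signed cast makes
   the "ahead or behind" comparison correct across uint32 wrap. */
void seq_check(seq_check_t *c, uint32_t seq) {
    if (seq == c->expected) {
        c->expected = seq + 1;
    } else if ((int32_t)(seq - c->expected) > 0) {
        c->gaps += seq - c->expected;     /* loss: count skipped frames */
        c->expected = seq + 1;
    } else {
        c->backwards++;                   /* duplicate / reorder */
    }
}
```

"Lossless" then becomes a checkable claim: `gaps == 0 && backwards == 0` over the whole soak run, under worst-case contention.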
3) Boundaries (prove markers)
- Markers: boundary metadata written at the producer side must match what the consumer slices.
- Monotonicity: write pointer, read pointer, and marker indices must never “go backward”.
- Resync path: when overflow happens, recovery must drop-to-boundary and restart deterministically.
Recommended instrumentation (minimum set)
- Ring stats: bytes moved, IRQ count, occupancy histogram, max occupancy, and drop counters.
- Alarms: watermark high/low, overrun/underrun, and watchdog timeout events.
- DMA flags: bus error, address error, FIFO error, descriptor error (if supported).
- Unified timestamps: TS0/TS1/TS2/TS3 use one time base (avoid mixed clocks).
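The minimum instrumentation set above fits in one struct with a cheap update path. A sketch under stated assumptions: field names are hypothetical, and the occupancy histogram uses power-of-two buckets so sampling stays O(1) on the hot path.

```c
#include <assert.h>
#include <stdint.h>

/* Minimum ring-stats set; occupancy histogram buckets are 0-1, 2-3,
 * 4-7, 8-15, ... so updates cost a few shifts. */
#define OCC_BUCKETS 8u

typedef struct {
    uint64_t bytes_moved;
    uint32_t irq_count;
    uint32_t occ_hist[OCC_BUCKETS];   /* occupancy histogram */
    uint32_t max_occupancy;
    uint32_t drops;
    uint32_t overruns, underruns, wdog_timeouts;   /* alarm counters */
} ring_stats_t;

static uint32_t occ_bucket(uint32_t occ)
{
    uint32_t b = 0;
    while ((occ >>= 1) && b < OCC_BUCKETS - 1) b++;
    return b;
}

static void stats_sample(ring_stats_t *s, uint32_t occupancy,
                         uint32_t bytes, uint32_t irqs)
{
    s->bytes_moved += bytes;
    s->irq_count   += irqs;
    s->occ_hist[occ_bucket(occupancy)]++;
    if (occupancy > s->max_occupancy) s->max_occupancy = occupancy;
}
```

Dumping this struct once per window (alongside the unified TS0..TS3 timestamps) is usually enough evidence to separate "slow bus" from "late consumer".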
Concrete debug tools (examples; verify model/options)
- Logic/protocol analysis: Saleae Logic Pro 16, Total Phase Beagle I2C/SPI Protocol Analyzer.
- Embedded trace / profiling: SEGGER J-Link Plus (or J-Link Ultra+), Arm ULINKpro.
- High-speed oscilloscope (edge sanity): Tektronix MDO3 series (model per bandwidth), Keysight InfiniiVision series (model per bandwidth).
Note: these tool examples support the validation workflow; use project requirements to select bandwidth/options.
Typical failure tree (symptom → first checks)
- Low sustained throughput: check software gaps (DMA re-arm latency), interrupt overload (completion IRQs too frequent), and back-pressure triggers.

- High CPU while “DMA is on”: check ISR payload, polling loops, cache maintenance overhead, and lock contention.
- Occasional gaps / duplicates: check sequence counters, ring overwrite/overrun counters, and marker monotonicity.
- Only fails at high load: check DMA arbitration/priority, queue depth caps, and IRQ masking/critical sections.
- Looks like corruption: check cache coherency rules, alignment to cache-line, and DMA-safe memory regions.
Pass criteria (examples; fill with project thresholds)
- Sustained payload: ≥ X (window fixed and documented)
- Loss/reorder: sequence gaps = 0, CRC failures = 0 (or explicitly bounded and logged)
- Boundary errors: marker/frame-slice errors = 0; resync completes within T
- Latency/jitter: max latency < A, P99 jitter < B (single time base)
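The P99/max criteria imply offline percentile math over a captured latency window. A minimal nearest-rank sketch (not the hot path; function names are ours, not from any particular library):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Percentile over a captured latency window (offline analysis): sort the
 * sample array in place and index by nearest rank. */
static int cmp_u32(const void *a, const void *b)
{
    uint32_t x = *(const uint32_t *)a, y = *(const uint32_t *)b;
    return (x > y) - (x < y);
}

/* p in [0,100]; n must be > 0. Nearest-rank definition: ceil(p*n/100). */
static uint32_t percentile(uint32_t *samples, size_t n, unsigned p)
{
    qsort(samples, n, sizeof samples[0], cmp_u32);
    size_t rank = (p * n + 99) / 100;
    if (rank == 0) rank = 1;
    return samples[rank - 1];
}
```

`percentile(lat, n, 100)` gives the observed maximum, so one captured window yields both the "max latency < A" and "P99 jitter < B" checks against the same time base.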
Applications & IC Selection Notes (DMA-Friendly Peripherals)
“DMA-friendly” is a datasheet-visible feature set: descriptor modes (SG/cyclic), FIFO depth & watermarks, multi-channel arbitration, flexible request mapping, and strong error reporting/coherency guidance. The goal is high sustained throughput with bounded worst-case latency and predictable recovery.
Key DMA-related dimensions
- Descriptor modes: scatter-gather, linked-list, cyclic, half-transfer events.
- FIFO depth & watermarks: independent RX/TX watermarks, overrun flags, and programmable thresholds.
- Arbitration control: per-channel priority/weighting, burst caps, and interconnect QoS (if available).
- Request mapping: flexible peripheral request routing (avoid “fixed channel bottlenecks”).
- Debuggability: explicit error causes (bus/address/FIFO), counters, and coherency notes.
Where to look in the datasheet/manual
- DMA chapter: descriptor support, interrupt modes, cyclic/SG, error flags.
- Interconnect / bus matrix: arbitration, bandwidth notes, QoS/priority.
- Cache / coherency notes: DMA-safe regions, alignment requirements, maintenance rules.
- Peripheral FIFO section: depth, watermarks, overrun/underrun behavior.
Concrete part-number examples (verify package/suffix/availability)
MCUs / application processors with strong DMA ecosystems
- ST: STM32H743ZI, STM32H723VG (high-performance DMA + cache considerations).
- NXP: MIMXRT1062DVL6A, MIMXRT1176DVMAA (strong peripheral DMA + high throughput IO).
- Microchip: ATSAME70Q21B, ATSAMV71Q21B (DMA-centric peripheral set; check cache/coherency notes per design).
- Renesas: R7FA6M5BH2CB (RA6M5 family example; verify the exact part variant for memory/peripheral mix).
- TI: TM4C1294NCPDT (uDMA-style flows; verify FIFO depth and interrupt modes per peripheral).
Selection tip: prioritize parts with explicit DMA error reporting and coherent-memory guidance in the reference manual.
DMA-friendly bridge / expander ICs (help reduce CPU touch points)
- I²C/SPI-to-UART with FIFOs: NXP SC16IS752, SC16IS750 (use FIFO + interrupt/watermark to reduce CPU service rate).
- I²C-to-SPI bridge: NXP SC18IS602B (useful for turning short I²C transactions into buffered SPI accesses).
- I²C channel mux (address/fanout management): TI TCA9548A (helps segmentation and reduces “bus-wide” recovery impact).
- USB-to-serial (buffered links for host-side throughput): FTDI FT232H, Silicon Labs CP2102N (exact suffix depends on package/temperature).
These parts do not replace system DMA, but can shift boundary handling and buffering away from the CPU.
Common high-throughput DMA endpoints (SPI/QSPI/OSPI memories)
- Winbond: W25Q128JV, W25Q256JV (verify package and speed grade).
- Macronix: MX25L25645G (verify package/temperature suffix).
- Micron: MT25QU256ABA (verify exact density/IO mode support per variant).
Memory endpoints highlight DMA requirements: long bursts, boundary control, and consistent cache/coherency handling.
Minimal selection checklist (quick self-check)
- Supports scatter-gather or linked descriptors (or equivalent chaining).
- Supports cyclic mode (or stable ring-buffer writer behavior).
- Provides FIFO watermarks and explicit overrun/underrun flags.
- Provides priority/arbitration control and (ideally) burst caps for worst-case bounds.
- Provides request mapping flexibility (avoid fixed bottlenecks).
- Provides clear coherency guidance (DMA-safe memory, alignment, maintenance rules).
- Provides actionable error reporting (bus/address/FIFO/descriptor errors).
FAQs: DMA High Throughput (Troubleshooting Closure)
These FAQs close long-tail debug questions without expanding the main text. Each answer follows the fixed 4-line structure: Likely cause / Quick check / Fix / Pass criteria.
Throughput is high but CPU is also high — IRQ too frequent or cache thrash?
Likely cause: Completion IRQ rate is too high and/or cache maintenance is triggered excessively, causing scheduling and cache churn.
Quick check: Log IRQ rate and DMA re-arm gap per 1 s window; compare CPU time in ISR vs consumer; check cache maintenance time spikes.
Fix: Increase batch size (or enable half-transfer events), move heavy work out of ISR, and reduce cache ops by using DMA-safe regions and cache-line-aligned buffers.
Pass criteria: IRQ rate < X, sustained payload ≥ Y, CPU(data-move) < Z%, no sustained queue backlog.
After enabling DMA, occasional “old data” appears — flush or invalidate first?
Likely cause: Cache coherency maintenance is incorrect for the transfer direction, so CPU reads stale cache lines or DMA reads stale RAM.
Quick check: Determine direction: DMA writes RAM → CPU reads (invalidate before CPU use) vs CPU writes RAM → DMA reads (flush/clean before DMA). Verify maintenance range expands to cache-line boundaries.
Fix: Apply correct invalidate/flush at ownership handoff points (DMA done / buffer handoff) and enforce cache-line-aligned start/length for buffers.
Pass criteria: CRC fails = 0, repeat/ghost samples = 0 across stress; coherence actions are deterministic and documented.
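The cache-line expansion mentioned in the Quick check can be made explicit in code. A sketch assuming a 32-byte line size: the real clean/invalidate calls (for example CMSIS `SCB_CleanDCache_by_Addr` / `SCB_InvalidateDCache_by_Addr` on Cortex-M7) would consume the computed range; here only the range math is shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Cache-maintenance range helper: expands [addr, addr+len) to cache-line
 * boundaries, as required before clean (CPU writes -> DMA reads) or
 * invalidate (DMA writes -> CPU reads). LINE_SZ is illustrative. */
#define LINE_SZ 32u

typedef struct { uintptr_t start; size_t len; } cm_range_t;

static cm_range_t cache_align_range(uintptr_t addr, size_t len)
{
    uintptr_t start = addr & ~(uintptr_t)(LINE_SZ - 1);               /* round down */
    uintptr_t end = (addr + len + LINE_SZ - 1) & ~(uintptr_t)(LINE_SZ - 1); /* round up */
    return (cm_range_t){ start, (size_t)(end - start) };
}
```

Note that the expanded range touching neighboring data is exactly why buffers themselves should be cache-line-aligned: otherwise an invalidate can destroy unrelated CPU writes that happen to share a line.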
Larger bursts increase jitter — check watermark, priority, or arbitration first?
Likely cause: Larger bursts reduce IRQ overhead but increase worst-case service time due to DMA/bus arbitration delays and consumer wake-up granularity.
Quick check: Compare max latency and P99 jitter before/after increasing burst; log queue occupancy peaks and DMA wait time (if available).
Fix: Cap maximum burst length, tune watermark to trigger earlier service, and raise priority for the real-time DMA channel (or isolate it from bulk traffic).
Pass criteria: max latency < X and P99 jitter < Y under worst-case load; no sustained backlog at high watermark.
Frame boundaries become corrupted — ring slicing or delimiter strategy issue?
Likely cause: Boundary metadata (markers) is not consistent with buffer ownership, or delimiter-based framing loses sync after a gap/overflow.
Quick check: Validate marker monotonicity (write ptr/read ptr/index never goes backward); check SEQ/CRC across boundary crossings and wrap points; confirm overflow recovery does drop-to-boundary.
Fix: Prefer explicit length/marker slicing on the ring buffer; for delimiter schemes, add resync logic that searches next valid boundary after overflow or missing bytes.
Pass criteria: boundary errors = 0, SEQ gaps = 0 (or bounded with recovery), wrap-around never produces malformed frames.
RX occasional overrun — check FIFO watermark or consumer blocking first?
Likely cause: Consumer service cannot keep up with burst arrival, or watermark is set too late to absorb scheduling/arbitration delays.
Quick check: Inspect FIFO_OVR and ring occupancy peaks; correlate overrun timestamps with consumer blocked time (locks/critical sections/IRQ masking).
Fix: Lower the watermark (earlier wake-up), bound consumer critical sections, and cap burst length; add back-pressure or drop-to-boundary policy before hard overrun occurs.
Pass criteria: FIFO_OVR = 0 under stress; occupancy stays below high watermark with margin; recovery never corrupts boundaries.
SPI full-duplex reads 0xFF/0x00 — dummy/turnaround or CS timing? (DMA-focused)
Likely cause: RX/TX progress is not synchronized (insufficient dummy bytes or wrong pairing), or CS is not held across the intended DMA burst boundary.
Quick check: Compare TX byte count vs RX byte count per burst; verify descriptor boundaries match CS hold policy; confirm first-byte content is discarded when dummy clocks are expected.
Fix: Ensure paired TX/RX descriptors (or coupled lengths), insert explicit dummy phase when required, and keep CS asserted for the full transaction span (avoid unintended CS toggles between descriptors).
Pass criteria: device ID reads match expected value across N trials; SEQ/CRC stable; no “all 0xFF/0x00” bursts under load.
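The paired-length rule can be sketched as buffer arithmetic. This assumes a hypothetical device read transaction with a 1-byte command, 3-byte address, and 1 dummy byte; the exact header layout is device-specific and illustrative only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Full-duplex SPI pairing sketch: TX and RX DMA descriptors clock the
 * same number of bytes, and the first HDR_LEN RX bytes are discarded. */
#define CMD_LEN   1u
#define ADDR_LEN  3u
#define DUMMY_LEN 1u    /* device-dependent; illustrative value */
#define HDR_LEN   (CMD_LEN + ADDR_LEN + DUMMY_LEN)

/* Total clocked bytes for BOTH the TX and RX descriptors: equal lengths
 * keep the two directions in lockstep under CS. */
static size_t spi_read_xfer_len(size_t payload)
{
    return HDR_LEN + payload;
}

/* Copy only the valid payload out of the raw RX buffer. */
static size_t spi_extract_payload(const uint8_t *rx_raw, size_t xfer_len,
                                  uint8_t *out)
{
    size_t payload = xfer_len - HDR_LEN;
    for (size_t i = 0; i < payload; i++)
        out[i] = rx_raw[HDR_LEN + i];   /* skip cmd/addr/dummy echoes */
    return payload;
}
```

If CS must stay asserted across the whole `spi_read_xfer_len()` span, descriptor boundaries inside that span must not toggle CS; that is the usual cause of "all 0xFF" tails.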
UART high traffic shows occasional framing errors — sampling noise or buffer “holes”?
Likely cause: Consumer/ISR service creates buffer holes (gaps) so bytes are dropped, which looks like framing errors at the parser level under high load.
Quick check: Correlate framing-error events with FIFO_OVR, ring occupancy spikes, and DMA re-arm gaps; verify IDLE/BREAK markers align with buffer slices (not mid-gap).
Fix: Lower watermark for earlier service, cap burst/queue depth, and add drop-to-boundary resync when overflow happens; keep UART framing logic independent from DMA chunk size.
Pass criteria: overrun counters = 0 (or bounded with defined recovery), parser resync time < T, sustained traffic shows no growing SEQ gaps.
I²C with DMA becomes slower — transaction overhead vs batching, how to decide?
Likely cause: I²C performance is dominated by per-transaction overhead and gaps; DMA reduces CPU work but may not improve wire-time efficiency for short transfers.
Quick check: Measure payload efficiency: payload time vs overhead+gap time; compare CPU load reduction vs end-to-end latency change when switching to DMA.
Fix: Use DMA mainly for offload (lower CPU/jitter), combine small reads/writes when protocol allows, and avoid overly large DMA batches that increase first-byte latency without increasing wire efficiency.
Pass criteria: CPU load drops by ≥ X% while throughput/latency remain within targets; no increase in timeout/retry events.
Only fails at high load — first step for memory bandwidth / bus contention?
Likely cause: Under load, arbitration delays and memory contention expand worst-case service time, exposing hidden timing margins.
Quick check: Run A/B test: disable one bulk DMA stream and observe if jitter/overrun disappears; log DMA wait indicators (or latency segments TS0→TS1, TS1→TS2) to find which segment inflates.
Fix: Raise priority for the real-time channel, cap burst length, limit queue depth, and schedule bulk transfers outside the critical window (or isolate via QoS if supported).
Pass criteria: worst-case latency remains bounded (< X) with all background traffic enabled; no high-watermark saturation.
No data loss, but latency spikes — adjust queue depth or back-pressure strategy?
Likely cause: Queue depth is too large (hidden buffering), or back-pressure triggers too late, so service delay accumulates even without dropping bytes.
Quick check: Monitor queue occupancy histogram and max occupancy; compare P99 latency with different queue depth caps; check if back-pressure events occur only after near-saturation.
Fix: Reduce maximum queue depth, lower watermark for earlier wake-up, and implement explicit back-pressure (drop/degrade/flow-control) before saturation; keep recovery deterministic (drop-to-boundary when needed).
Pass criteria: max latency < X and P99 jitter < Y; queue occupancy stays below cap; drop rate (if enabled) < Z and bounded.
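The "explicit back-pressure before saturation" fix can be staged as a simple policy function. A sketch with illustrative thresholds (75% of the cap triggers degrade); the enum names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Staged back-pressure sketch: act before saturation rather than at it.
 * Thresholds are illustrative fractions of the queue cap. */
typedef enum { BP_ACCEPT, BP_DEGRADE, BP_DROP } bp_action_t;

static bp_action_t backpressure(uint32_t occupancy, uint32_t cap)
{
    if (occupancy >= cap)         return BP_DROP;     /* hard cap reached */
    if (occupancy * 4 >= cap * 3) return BP_DEGRADE;  /* >= 75%: shed load */
    return BP_ACCEPT;
}
```

Degrade can mean lower sample rate, coarser frames, or flow control toward the producer; the point is that the decision fires while there is still margin, so latency stays bounded without surprise drops.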
Periodic system “hang” — how to design DMA error flags and watchdog?
Likely cause: DMA enters an error state (bus/address/descriptor) or the consumer stops advancing pointers, causing silent deadlock without visible loss counters.
Quick check: Log DMA_ERR flags and last TS progression (TS0/TS1/TS2/TS3). If TS stops advancing while input continues, it is a service deadlock; if DMA_ERR asserts, it is a transfer fault.
Fix: Implement watchdog on pointer progress and watermark; on fault, stop DMA, reset descriptors/ring indices, drop-to-boundary, and restart with a recorded fault code for postmortem.
Pass criteria: recovery completes within T, no repeated fault storm, fault code is logged with TS snapshot and counters.
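The pointer-progress watchdog described in the Fix can be sketched as a periodic check: if input keeps arriving but the read pointer has not advanced for N consecutive checks, declare a service deadlock. `WDOG_LIMIT` and all names are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Progress watchdog sketch: run wdog_check() at a fixed period. A stall
 * is only counted while input is active, so an idle link never faults. */
#define WDOG_LIMIT 3u   /* consecutive stalled checks before fault */

typedef struct {
    uint32_t last_rd;   /* read pointer at the previous check */
    uint32_t stalls;    /* consecutive checks with no progress */
    bool     fault;     /* caller: stop DMA, drop-to-boundary, restart */
} wdog_t;

static void wdog_check(wdog_t *w, uint32_t rd_now, bool input_active)
{
    if (!input_active || rd_now != w->last_rd) {
        w->last_rd = rd_now;   /* progress (or idle): reset the count */
        w->stalls  = 0;
        return;
    }
    if (++w->stalls >= WDOG_LIMIT)
        w->fault = true;
}
```

On fault, record a code plus a TS/counter snapshot before resetting, so the postmortem can distinguish a DMA transfer fault (DMA_ERR asserted) from a consumer deadlock (pointers frozen, no error flag).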
ATE/production passes, but field drops occur — which stats/alarm fields are usually missing?
Likely cause: Production tests validate functionality but do not log long-window sustained metrics, worst-case latency, and near-overrun precursors under realistic background load.
Quick check: Verify firmware logs include: bytes/window, IRQ rate, max occupancy, FIFO_OVR, DMA_ERR, drop events, and TS-based latency segments; compare ATE vs field log coverage.
Fix: Add ring stats + watermark alarms + fault snapshots (TS + counters). Include stress profiles (background DMA, cache pressure) in validation to expose contention-driven worst-case.
Pass criteria: field logs can reproduce root cause within one trace; pre-fault indicators (occupancy, IRQ, DMA wait) are captured before any drop event.