Stereo Vision Module: Dual-Camera Sync & Depth Engine
H2-1. What a Stereo Vision Module Owns (Definition & Boundary)
A stereo vision module is an engineered depth pipeline that turns two time-aligned images from a known baseline into disparity and depth using rectification and a disparity engine. Reliable depth is not “just an algorithm”: it is a contract across sync, geometry, and a measurable depth error budget.
- Two cameras + known baseline
- Deterministic pairing (frame ID + timestamps) + exposure alignment
- Rectification (intrinsics/extrinsics + distortion model) to enforce the epipolar constraint
- Disparity → Depth with confidence/validity outputs
- Inputs: Left/Right frames, frame ID (or sequence counter), and timestamps with clear semantics
- Outputs: disparity map, depth map (or point cloud), and confidence/validity mask
- Diagnostics: pairing mismatch counters, timestamp-delta stats, epipolar residual stats
- Sync pillar — frames represent the same instant. Evidence: trigger/FSIN timing, timestamp delta histogram (mean + p99), and frame-ID mismatch rate.
- Geometry pillar — rectification makes correspondence 1D. Evidence: reprojection error, epipolar residual, and rectified row-alignment residual.
- Error budget pillar — small disparity errors do not explode into unacceptable depth errors. Evidence: depth RMSE/MAE binned by distance + hole rate + edge noise.
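The input/diagnostic contract above can be made concrete with a small data model. This is a minimal sketch, not the module's actual interface: the class names, the `timestamp_us` unit, and the `accept_pair` helper are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class StereoFrame:
    """One captured image plus the identity needed for deterministic pairing."""
    frame_id: int      # sequence counter from the sensor or capture block
    timestamp_us: int  # assumed unit; semantics (exposure vs arrival) must be documented
    side: str          # "left" or "right"


@dataclass
class PairDiagnostics:
    """Counters and samples that back the sync and geometry evidence."""
    pairing_mismatches: int = 0
    timestamp_deltas_us: List[int] = field(default_factory=list)
    epipolar_residuals_px: List[float] = field(default_factory=list)


def accept_pair(left: StereoFrame, right: StereoFrame,
                max_delta_us: int, diag: PairDiagnostics) -> bool:
    """Accept a pair only if frame IDs match and the timestamp delta is inside the gate."""
    delta = right.timestamp_us - left.timestamp_us
    diag.timestamp_deltas_us.append(delta)  # log every delta, accepted or not
    if left.frame_id != right.frame_id or abs(delta) > max_delta_us:
        diag.pairing_mismatches += 1
        return False
    return True
```

Logging the delta even for rejected pairs keeps the histogram honest: the tail that causes rejections stays visible in the evidence.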
H2-2. System Architecture & Dataflow (From Photons to Depth)
Stereo depth is a chain of stages where each stage contributes measurable latency, jitter, and error. The practical goal is not “maximum frame rate”, but deterministic pairing plus stable geometry so that disparity errors remain inside the target distance error budget.
- Capture: left/right exposure + readout (must be pairable by frame ID + timestamp).
- Align: frame pairing and exposure consistency checks (drop or mark mismatches).
- Rectify: undistort + warp to epipolar-aligned images (correspondence becomes 1D).
- Match: compute matching cost + aggregate cost (window / path / census variants).
- Select: choose disparity + sub-pixel refinement + left-right consistency check.
- Depth: convert disparity to depth (plus confidence / validity mask).
- Filter: fill holes / edge-aware smoothing / temporal stability (without hiding faults).
- Output: package depth/disparity/confidence + diagnostic counters.
- Line buffer: supports rectification and matching windows without stalling the capture stream.
- Frame buffer (DDR): needed when full-frame warps, large disparity ranges, or multi-path aggregation exceed on-chip SRAM.
- Output queue: decouples compute bursts from host/network consumption to prevent drops.
- Capture/Align: frame-ID mismatch rate, timestamp-delta histogram (mean/p99).
- Rectify: epipolar residual and row-alignment residual (after warp).
- Match/Select: hole rate, left-right consistency fail rate, edge noise.
- Depth/Output: depth MAE/RMSE vs distance bins, p99 latency, drop rate.
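A rough way to judge whether line buffers alone are enough is to estimate how many rows must be held on-chip. The sketch below uses a deliberately simple first-order model (matching-window rows plus the vertical span the rectification warp can reach). That model, the function names, and the example sizes are assumptions, not figures from any specific device.

```python
def line_buffer_bytes(width_px, window_rows, warp_span_rows,
                      bytes_per_px=1, cameras=2):
    """First-order on-chip storage estimate for rectify + window matching."""
    rows = window_rows + warp_span_rows  # rows that must be resident at once
    return rows * width_px * bytes_per_px * cameras


def needs_frame_buffer(required_bytes, sram_bytes):
    """True when the estimate exceeds on-chip SRAM and DDR buffering is likely needed."""
    return required_bytes > sram_bytes


# Example: 1280-pixel rows, 9-row window, warp reaching up to 32 rows, 8-bit pixels.
need = line_buffer_bytes(1280, 9, 32)
```

With these example numbers the estimate is about 105 kB, which fits a 256 KiB SRAM budget but not a 64 KiB one.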
H2-3. Dual-Sensor Synchronization (Trigger, Exposure Alignment, Rolling Effects)
Stereo depth fails first at time alignment. Synchronization must be treated as a verifiable contract: pairing correctness (same event), exposure alignment (same instant), and stable behavior under motion. “Having a trigger pin” is not sufficient unless timing evidence is logged and passes acceptance checks.
- Frame sync: Left and Right frames belong to the same trigger/event. Evidence: frame counter / trigger count difference = 0, mismatch rate ≈ 0.
- Exposure alignment: integration windows overlap tightly enough to prevent motion-induced disparity errors. Evidence: timestamp delta distribution (mean + p99) and exposure-start/stop phase.
- Line/readout alignment (rolling-shutter effects): when a rolling shutter is involved, per-line sampling time offsets can create shear or edge tearing in dynamic scenes. Evidence: motion-direction-dependent tearing and, if observable, line-phase misalignment.
- External trigger / FSIN: deterministic pairing; verify FSIN→VS phase stability over time/temperature.
- Shared clock: reduces long-term drift, but still requires exposure/event alignment verification.
- Sync GPIO: implementation-simple but often higher p99 jitter due to control-path variability.
- Software-only pairing: acceptable as a safety net (drop/repair), not as the primary sync strategy.
- Waveforms: FSIN/trigger + both VS (and HS/line-valid if available) → check relative phase and drift.
- Frame counters: left_count − right_count over time → detect slips, drops, resets.
- Timestamp Δt histogram: mean (systematic offset) + width/p99 (jitter) + multimodal peaks (queue/re-pair faults).
- Depth “randomly noisy” even in static scenes → frame mismatch counters non-zero → suspect pairing/trigger integrity.
- Only moving edges tear or “zipper” → Δt p99 is large or drifting → suspect exposure misalignment / rolling effects.
- Sudden depth jumps → Δt histogram shows long tail or two peaks → suspect buffering/re-order or intermittent drops.
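The timestamp and counter evidence listed above reduces to a few lines of log analysis. The following is a minimal sketch: it assumes per-frame timestamps and frame counters have already been extracted into plain lists, and the function names are illustrative.

```python
import math
from statistics import mean


def percentile(values, pct):
    """Nearest-rank percentile (pct in 0..100) of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100.0 * len(ordered)))
    return ordered[rank - 1]


def delta_stats(left_ts, right_ts):
    """Mean shows systematic offset; p99 of |delta| shows jitter and tail faults."""
    deltas = [r - l for l, r in zip(left_ts, right_ts)]
    return {"mean": mean(deltas),
            "p99_abs": percentile([abs(d) for d in deltas], 99)}


def counter_slips(left_ids, right_ids):
    """Indices where left_count - right_count changes (slip, drop, or reset)."""
    diffs = [l - r for l, r in zip(left_ids, right_ids)]
    return [i for i in range(1, len(diffs)) if diffs[i] != diffs[i - 1]]
```

A non-empty result from `counter_slips` points at pairing/trigger integrity before any disparity tuning is worth attempting.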
H2-4. Hardware Timestamps (What They Mean, Where They Come From, How to Validate)
A timestamp is only useful when its meaning and tap point are explicit. In stereo, the most important question is: does the timestamp represent exposure time or merely arrival time after transport and buffering? This section turns timestamps into testable engineering quantities.
- Monotonic: no backward jumps or duplicates within a stream; otherwise correlation breaks.
- Pairable: left/right can be aligned to the same event using frame ID + timestamp.
- Low jitter: Δt distribution is tight and stable; p99 is the gating metric for motion scenes.
- Traceable to exposure: a demonstrable mapping exists between timestamp and exposure event.
- Sensor tap (closest to physics): often best for exposure traceability; may include fixed pipeline delay.
- SerDes/bridge tap: typically “transport arrival”; affected by serialization/CDR and link buffering.
- FPGA/SoC tap: easy to unify, but vulnerable to DDR/queueing and arbitration jitter.
- Host tap: useful for end-to-end monitoring; weakest for exposure truth due to OS/bus scheduling.
- Arrival ≠ exposure: buffering smooths arrival time while exposure remains misaligned under motion.
- Systematic offset: multi-stage pipelines add fixed delay; offset must be known or measured.
- Re-ordering: queues can cause multimodal Δt histograms (two peaks) and occasional frame inversion.
- Event mark: introduce a short optical event (flash/LED strobe) visible to both cameras within the FOV. The only requirement is a sharp intensity change.
- Image-side detect: find which frame (and optionally which rows) contains the event in Left and Right streams.
- Compare: image-derived event alignment vs timestamp Δt statistics. If timestamps indicate tight alignment but image events do not, timestamp semantics are incorrect (arrival-tagged or heavily buffered).
- Log fields (recommended): frame_id, timestamp, tap_id, exposure_start/end (if available), queue depth, drop/mismatch counters.
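The event-mark procedure above can be sketched as a comparison between image-derived evidence and timestamp evidence. This is a hedged illustration: it assumes per-frame mean intensity has already been computed for both streams, and the function names, jump threshold, and gate are hypothetical.

```python
def first_event_frame(mean_intensity, jump_threshold):
    """Index of the first frame whose mean intensity jumps by more than the threshold."""
    for i in range(1, len(mean_intensity)):
        if mean_intensity[i] - mean_intensity[i - 1] > jump_threshold:
            return i
    return None


def check_timestamp_semantics(left_means, right_means, left_ts, right_ts,
                              jump_threshold, gate_us):
    """Compare where the flash appears in each stream against what timestamps claim."""
    li = first_event_frame(left_means, jump_threshold)
    ri = first_event_frame(right_means, jump_threshold)
    if li is None or ri is None:
        return "inconclusive"  # event not seen in one stream
    ts_tight = all(abs(l - r) <= gate_us for l, r in zip(left_ts, right_ts))
    if li == ri:
        return "consistent" if ts_tight else "timestamps_loose"
    # Images disagree: if timestamps still claim alignment, they are
    # probably arrival-tagged or heavily buffered.
    return "timestamps_suspect" if ts_tight else "streams_misaligned"
```

Frame-level detection is the coarse version of this check. Row-level detection of the flash edge would give finer resolution on rolling-shutter sensors.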
H2-5. Baseline & Calibration Workflow (Intrinsics/Extrinsics → Rectification)
Calibration is the geometry pillar of stereo stability. The deliverable is not “a set of parameters” but a rectified pair with measurable epipolar alignment. A strong calibration workflow is a repeatable SOP: capture → solve → validate → ship rectification maps. Storage and versioning of calibration data are outside the scope of this page.
- Baseline (B) is the relative camera center separation (magnitude + direction). It directly sets depth sensitivity.
- Small baseline improves packaging but reduces far-range depth stiffness (depth becomes “soft/noisy” at distance).
- Baseline error/drift appears as systematic depth scale error; mechanical shift or temperature drift can break rectification.
- Practical rule: baseline suitability must be judged against the target distance band and the allowable depth error budget, not by a single “standard” value.
- K (intrinsics): focal lengths + principal point
- D (distortion model): radial/tangential terms (model-dependent)
- R, T (extrinsics): relative rotation/translation (baseline direction and pose)
- Rectification maps: warp/undistort maps for Left/Right images (generation + validation only)
- Reprojection error: feature/board corner fit stability (a basic health check, not sufficient on its own)
- Epipolar error: after rectification, corresponding points should lie on the same scanline
- Rectified alignment residual: row-alignment residual statistics on rectified images
- FOV coverage: board appears in center + all corners/edges; avoid only-front-and-center datasets.
- Distance distribution: include near/mid/far samples that cover the intended operating depth band.
- Pose diversity: add tilt/rotation views to constrain intrinsics/extrinsics robustly.
- Lighting consistency: avoid flicker, glare, and overexposure; stable corners enable stable geometry.
- Sync cleanliness: capture while pairing/sync is “healthy”; otherwise the dataset bakes in time skew.
- Detect features: board corners/features in both images; reject frames with poor detection confidence.
- Optimize: solve K/D for each camera and R/T between cameras; enforce reasonable priors if needed.
- Rectify: compute rectification transforms and generate rectification maps for Left/Right.
- Validate: compute reprojection, epipolar, and alignment residual metrics; gate the calibration result.
- Deliver: ship K/D/R/T + rectification maps and the validation report summary (storage not covered here).
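The validation gate can be expressed directly on matched points from a rectified pair: after rectification, corresponding points should share a scanline, so the row difference is the residual. A minimal sketch, assuming matches are already available as pixel coordinates. The thresholds are placeholders to be set by the error budget.

```python
import math


def row_residuals_px(matches):
    """|y_left - y_right| for matches given as ((xl, yl), (xr, yr))."""
    return [abs(yl - yr) for (_xl, yl), (_xr, yr) in matches]


def gate_rectification(matches, max_mean_px, max_p99_px):
    """Gate a calibration on mean and p99 row residual (matches must be non-empty)."""
    residuals = sorted(row_residuals_px(matches))
    mean_px = sum(residuals) / len(residuals)
    p99_px = residuals[math.ceil(0.99 * len(residuals)) - 1]  # nearest rank
    return {"mean_px": mean_px, "p99_px": p99_px,
            "passed": mean_px <= max_mean_px and p99_px <= max_p99_px}
```

Gating on p99 as well as the mean keeps a few badly aligned regions (often image corners) from hiding behind a good average.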
H2-6. Disparity Engines (Algorithms, Hardware Acceleration, Confidence)
Disparity is an engineering choice, not a buzzword. A disparity engine must be selected and tuned based on quality, compute/bandwidth, and deterministic latency. A production-grade pipeline also requires explicit confidence/invalid outputs; otherwise field debugging becomes blind.
- Block Matching (BM) — low latency, hardware-friendly; weaker in low texture, repetitive patterns, and specular surfaces.
- Robust cost (e.g., Census-like) — more stable under illumination mismatch; higher compute and memory traffic.
- SGM-style aggregation — improved structure and fewer holes; higher bandwidth/latency cost, especially with large search ranges.
- Search range: larger range increases compute/bandwidth roughly linearly; too small causes near-range “cliff” errors.
- Support/window size: larger windows stabilize texture but blur edges and thin structures (edge bleeding risk).
- Sub-pixel refinement: improves precision but can increase sensitivity to noise; must be validated by depth bins.
- Left-right check: rejects occlusion/false matches; increases invalid pixels (trade safety vs density).
- Occlusion: LR inconsistency → mark invalid instead of producing confident wrong depth.
- Low texture: flat cost curve → low confidence; avoid “random” depth speckle.
- Specular/reflective: illumination mismatch → robust cost helps; still require confidence gating.
- Repetitive patterns: multiple minima → low confidence or multi-peak warning.
- Cost compute is typically parallelizable (vector/FPGA/NPU); it scales with resolution and search range.
- Aggregation is often the bandwidth bottleneck (line buffers + DDR pressure), impacting p99 latency.
- Select/refine is lighter but must preserve determinism and confidence semantics.
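To make the cost → select → confidence chain concrete, here is a minimal single-pixel block-matching sketch using a sum of absolute differences (SAD) on one rectified scanline. It is an illustration only, not a production engine: the function names, the parabola-based sub-pixel fit, and the margin rule are assumptions chosen for clarity.

```python
def sad_costs(left_row, right_row, x, half_win, max_disp):
    """SAD cost per candidate disparity d, matching left[x] against right[x - d].

    The caller must ensure x + half_win is inside both rows.
    """
    costs = []
    for d in range(max_disp + 1):
        if x - d - half_win < 0:
            break  # window would leave the image
        costs.append(sum(abs(left_row[x + k] - right_row[x - d + k])
                         for k in range(-half_win, half_win + 1)))
    return costs


def select_disparity(costs, min_margin):
    """Pick the minimum cost, gate on cost margin, then refine to sub-pixel."""
    best = min(range(len(costs)), key=costs.__getitem__)
    # Margin: gap to the best rival that is not an immediate neighbour.
    rivals = [c for i, c in enumerate(costs) if abs(i - best) > 1]
    margin = (min(rivals) - costs[best]) if rivals else 0
    if margin < min_margin:
        return None, margin  # honest invalid instead of a confident wrong value
    d = float(best)
    if 0 < best < len(costs) - 1:
        c0, c1, c2 = costs[best - 1], costs[best], costs[best + 1]
        denom = c0 - 2 * c1 + c2
        if denom > 0:
            d += 0.5 * (c0 - c2) / denom  # vertex of the fitted parabola
    return d, margin
```

A flat cost curve (low texture) or a second deep minimum (repetitive pattern) both produce a small margin, so the same gate covers two of the failure modes listed above.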
H2-7. Depth Error Budget (How Small Disparity Errors Become Big Depth Errors)
Depth is inversely related to disparity. When disparity becomes small (far range), even a tiny disparity error can inflate into a large depth error. A usable stereo module therefore needs a measurable error budget that ties together time, geometry, and matching into one acceptance story.
- Depth Z depends on baseline B, effective focal scale f, and disparity d (often written as Z ≈ B·f / d).
- Near range: disparity is larger → the same Δd causes a smaller ΔZ (depth feels “stiffer”).
- Far range: disparity becomes small → the same Δd causes a much larger ΔZ (depth becomes “hypersensitive”).
- Time: frame/exposure misalignment (sync/timestamp semantics)
- Geometry: calibration error, epipolar misalignment, extrinsic drift
- Matching: pixel noise, sub-pixel fit error, low-texture / specular / repetitive patterns
- Time evidence: timestamp Δt histogram (mean + p99), frame pairing mismatch rate
- Geometry evidence: epipolar error + rectified row residual over temperature/time
- Matching evidence: confidence/invalid maps, LR-fail rate, hole rate, cost margin stats
- Start from target distance band: near/mid/far priorities define the required disparity robustness.
- Work backward from allowable depth error: if far-range depth error is too large, increase the baseline (B) and/or reduce the disparity error (Δd) through better matching, within bandwidth/compute limits.
- Baseline increase has real costs: tighter mechanical tolerance, tougher calibration gates, higher drift sensitivity.
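Differentiating Z ≈ B·f / d gives the first-order sensitivity |ΔZ| ≈ Z² / (B·f) · |Δd|, which is why far range is so much harder than near range. The sketch below works through that relation. The baseline, focal length, and disparity error used in the example are hypothetical numbers, not recommendations.

```python
def depth_m(baseline_m, focal_px, disparity_px):
    """Z = B * f / d for a rectified pair."""
    return baseline_m * focal_px / disparity_px


def depth_error_m(baseline_m, focal_px, z_m, disparity_err_px):
    """First-order depth error: Z^2 / (B * f) * |delta d|."""
    return z_m * z_m / (baseline_m * focal_px) * disparity_err_px


def required_baseline_m(focal_px, z_m, disparity_err_px, max_depth_err_m):
    """Smallest baseline that keeps the first-order error inside the budget at z_m."""
    return z_m * z_m * disparity_err_px / (focal_px * max_depth_err_m)


# Hypothetical module: B = 0.10 m, f = 800 px, 0.25 px disparity error.
near_err = depth_error_m(0.10, 800, 1.0, 0.25)   # about 3 mm at 1 m
far_err = depth_error_m(0.10, 800, 10.0, 0.25)   # about 0.31 m at 10 m
```

In this example a 10x increase in distance inflates the depth error by 100x for the same disparity error, which is the quadratic growth the budget has to absorb.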
H2-8. Latency, Throughput & Determinism (Real-Time Behavior)
Real-time stereo is defined by determinism, not by average speed. The system must bound tail latency and jitter under worst-case load. A complete real-time story requires a decomposed latency model and end-to-end evidence fields (frame ID + timestamps + queue depth) for p99 tracking.
- Capture: exposure + readout; risks include exposure skew and rolling-related timing differences.
- Buffer: line/DDR/queue; tail latency often comes from queue depth variation and memory contention.
- Compute: rectify + disparity + refine; complexity grows with resolution, search range, and aggregation.
- Output: packaging/transfer/host consumption; scheduling effects can enlarge p99 even if average is fine.
- Resolution ↑ → bandwidth/compute ↑ → p99 latency worsens under contention.
- Search range ↑ → compute scales roughly linearly → near-range coverage improves (larger disparities become reachable) but FPS drops.
- Aggregation/refine ↑ → quality improves but deterministic latency becomes harder.
- Buffering ↑ → throughput smooths but end-to-end latency increases; re-order risk rises.
- p99 end-to-end latency (tail behavior dominates control stability)
- frame-to-frame jitter (output cadence stability)
- drop / mismatch rate (missing or mis-paired frames break downstream logic)
- Log fields: frame_id (and pair_id), tap timestamps (capture / post-rectify / post-disparity / output), queue depth, config snapshot.
- Load sweep: ramp load to saturation and record FPS, p99 latency, jitter, drop/mismatch curves.
- Worst-case mode: max search range + low-texture scene; observe tail latency inflation and invalid spikes.
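Staged timestamps make the latency model measurable per frame. This sketch assumes each log record is a dict holding the four tap times named later in the playbook (`t_cap`, `t_rect`, `t_disp`, `t_out`) in a common unit. The record layout and function names are assumptions.

```python
import math

STAGES = ("t_cap", "t_rect", "t_disp", "t_out")


def stage_latencies(record):
    """Latency of each stage transition for one frame."""
    return {f"{a}->{b}": record[b] - record[a]
            for a, b in zip(STAGES, STAGES[1:])}


def p99(values):
    """Nearest-rank 99th percentile of a non-empty list."""
    ordered = sorted(values)
    return ordered[math.ceil(0.99 * len(ordered)) - 1]


def latency_report(records):
    """End-to-end p99 latency and output cadence jitter (max - min interval)."""
    e2e = [r["t_out"] - r["t_cap"] for r in records]
    outs = [r["t_out"] for r in records]
    intervals = [b - a for a, b in zip(outs, outs[1:])]
    jitter = max(intervals) - min(intervals) if intervals else 0
    return {"p99_e2e": p99(e2e), "jitter": jitter}
```

Breaking a slow frame down with `stage_latencies` shows which stage inflated, which is usually more useful than the end-to-end number alone.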
H2-9. Scene & Illumination Pitfalls (Textureless, Flicker, Motion, Reflections)
Stereo matching relies on two practical assumptions: appearance consistency between left/right views and time consistency for paired frames. Real-world scenes frequently violate these assumptions, producing depth holes, edge tearing, and confident-but-wrong surfaces. This section maps scene types → depth symptoms → root causes and outlines mitigation principles without entering lighting-controller implementation details.
- Low texture: cost curves become flat → disparity becomes ambiguous → holes/speckle increase and confidence drops.
- Repetitive patterns: multiple local minima → wrong matches may look “stable” → striped wrong depth and periodic bias.
- Specular / reflections: left/right intensity differs with viewpoint → appearance mismatch → collapse or edge noise.
- Transparent objects: observed content is background/refraction mix → stereo assumptions fail → depth passes through.
- Motion + sync skew: frames are not truly simultaneous → moving edges mis-pair → depth tearing and unstable surfaces.
- Flicker: illumination changes within/among exposures → per-frame brightness mismatch → matching instability and jitter.
- Confidence / invalid masks (per-pixel) + invalid reason bins
- LR-check fail rate and hole rate (overall + distance bins)
- Timestamp Δt p99 and pairing mismatch counters (motion issues often start here)
- Cost margin stats (flat/ambiguous matches correlate with low texture & repetition)
- Fail honestly: use confidence/invalid outputs to prevent confident wrong depth from propagating.
- Strengthen consistency checks: LR-check + cost margin gating, especially for repetitive patterns.
- Stabilize illumination when needed: use synchronized strobe aligned to exposure windows (covered on the Vision Lighting Controller page).
- Measure under motion: verify timestamp pairing and exposure alignment before tuning disparity knobs.
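Invalid reason bins are easiest to keep honest when the classification rule is explicit. The sketch below is one possible rule set, assuming the disparity engine exposes a per-pixel cost curve and a left-right disparity difference. The reason names and all thresholds are hypothetical.

```python
from collections import Counter


def classify_pixel(costs, lr_diff_px, lr_tol_px=1.0, flat_spread=10, min_margin=20):
    """Assign one reason label to a pixel from its cost curve and LR-check result."""
    if lr_diff_px is not None and lr_diff_px > lr_tol_px:
        return "lr_fail"          # occlusion-like or false match
    if max(costs) - min(costs) < flat_spread:
        return "low_texture"      # flat cost curve
    best = costs.index(min(costs))
    rivals = [c for i, c in enumerate(costs) if abs(i - best) > 1]
    if rivals and min(rivals) - costs[best] < min_margin:
        return "ambiguous"        # competing minimum, e.g. repetitive pattern
    return "valid"


def reason_bins(pixels):
    """Count labels over an iterable of (costs, lr_diff_px) tuples."""
    return Counter(classify_pixel(c, lr) for c, lr in pixels)
```

Tracking these counts per scene type shows whether a hole-rate increase comes from texture, repetition, or occlusion rather than lumping them together.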
H2-10. Validation Test Plan (What to Measure, Acceptance, Regression)
A stereo module is “ready” only when its claims are measurable, repeatable, and regression-friendly. This plan defines what to measure, the acceptance form (without hard-coded numbers), and the minimum evidence fields needed to explain tail failures under temperature, motion, and illumination changes.
- Identity: frame_id + pair_id, configuration snapshot (range/window/sub-pixel/LR-check)
- Timestamps: staged taps (capture / post-rectify / post-disparity / output) and Δt statistics (mean + p99)
- Geometry: epipolar/row residual summaries (before/after thermal soak)
- Depth quality: RMSE/MAE by distance bins, hole/invalid rate + reason bins, edge noise indicators
- Real-time: p99 latency, jitter, drop/mismatch rate under load sweep
| Test group | What to measure | Tools / instrumentation | Acceptance form | Notes |
|---|---|---|---|---|
| Sync | timestamp Δt (mean/p99), frame counter mismatch rate | log counters, GPIO capture (scope/LA) | Δt tails bounded; mismatch rare and explainable | Run with motion to expose pairing failures |
| Calibration | reprojection error, epipolar error, rectified row residual | calibration dataset + offline analysis | alignment stable and repeatable across sessions | Reprojection alone is insufficient; gate epipolar/row residual |
| Thermal stability | extrinsic drift proxy via epipolar/row residual vs temperature | thermal chamber, temp logging | drift within allowed envelope after soak/cycle | Capture before/after soak using identical scene set |
| Depth quality | RMSE/MAE by distance bins (near/mid/far) | ground-truth fixture or reference target | errors bounded per distance bin (far bin most sensitive) | Report both bias and random components |
| Density / invalid | hole rate, invalid rate, invalid reason bins, LR-fail rate | depth logs + masks | invalid behavior is stable and predictable under stress | Prefer honest invalid over confident wrong depth |
| Edge behavior | edge noise, boundary bleeding indicators | scene targets with depth discontinuities | edges remain stable; noise does not explode at motion/lighting changes | Evaluate with both static and moving edges |
| Robustness | metric drift under vibration / illumination change / motion | vibration fixture, controlled light, motion rig | metric drift bounded; failures are explainable via evidence fields | Include low texture, stripes, specular, motion scenes |
| Real-time | p99 latency, jitter, drops/mismatch under load sweep | staged timestamps, load generator | tails bounded in worst-case config; no uncontrolled jitter/drops | Test max range + low texture as worst-case compute path |
- Scene set: include low texture, repetitive stripes, specular/reflection, motion, flicker-stress scenes.
- Config lock: record full disparity configuration snapshot for every run.
- Compare deltas: new vs reference build; report metric deltas and tail changes (p99).
- Explain tails: correlate p99 spikes with queue depth, invalid bins, and pairing counters.
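The regression steps above (config lock, then compare deltas) can be sketched as a small comparison helper. The run layout (`config` and `metrics` keys), the metric names, and the per-metric limits are assumptions. All metrics are treated as "higher is worse".

```python
def compare_runs(reference, candidate, max_regression):
    """Return metrics where the candidate is worse than the reference by more than allowed."""
    if reference["config"] != candidate["config"]:
        # Config lock: deltas are meaningless if the disparity configuration changed.
        raise ValueError("config snapshots differ; runs are not comparable")
    failures = {}
    for name, limit in max_regression.items():
        delta = candidate["metrics"][name] - reference["metrics"][name]
        if delta > limit:
            failures[name] = delta
    return failures
```

An empty result means the candidate build stays inside the allowed envelope. Any entry names the metric and the size of the regression to explain.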
H2-11. Field Debug Playbook (Symptom → Evidence → Isolate → Fix)
This playbook turns stereo failures into a repeatable SOP: start with two measurements, classify by evidence (Sync vs Geometry/Calibration vs Matching/Scene), then apply the first fix and re-check the same metrics. Power issues are handled only as reset/uptime evidence (no supply-topology discussion here).
- Pair identity: frame_id + pair_id + mismatch counters
- Timing: timestamp Δt (mean + p99) + staged taps (t_cap / t_rect / t_disp / t_out if available)
- Geometry: epipolar/row residual summaries (pre/post thermal soak)
- Depth quality: hole/invalid rate (+ reason bins), LR-fail rate, confidence distribution
- System health: reset_reason + uptime (to rule out brownout/reboot events without power-topology details)
- Config snapshot: search range, window size, sub-pixel, LR-check, filtering
| Symptom | First 2 checks | Discriminator | First fix | MPN examples (replaceable) |
|---|---|---|---|---|
| Depth jumps / flickers | 1) timestamp Δt p99; 2) p99 latency / jitter | If Δt tails inflate or pair mismatches spike → Sync/Pairing. If latency p99 inflates under load but Δt stable → Queue/Determinism. | Tighten pairing window (frame_id + timestamp gate). Ensure a single trigger/FSIN source is used for both sensors. | Jitter cleaner: TI LMK04828, ADI AD9528. Clock buffer: TI CDCLVC1102. Trigger buffer/level shift: SN74LVC1T45. |
| Holes suddenly increase | 1) invalid rate (by distance bins); 2) confidence + LR-fail rate | If invalid rises mostly on low texture/specular scenes while geometry stable → Matching/Scene. If invalid rises everywhere after temperature change → Geometry drift. | Increase “honest invalid” gating (confidence threshold). Enable/strengthen LR-check; constrain search range for stability. | Stereo accel SoC/ISP class (examples): NVIDIA Jetson Orin NX, Renesas RZ/V2L. FPGA for disparity pipeline (examples): Xilinx Zynq-7020, Intel Cyclone 10 GX. DDR for buffering (examples): Micron MT53D1024M32D4 (LPDDR4). |
| Motion edges tear / split | 1) timestamp Δt histogram under motion; 2) pair mismatch counters | If mismatch spikes correlate with motion → Sync/Pairing. If Δt stable but tearing persists → consider rolling/exposure alignment (still Sync-domain). | Verify exposure alignment mode (same exposure start vs same frame boundary). Use deterministic trigger distribution; reduce Δt tail. | Timing hub / sync capable switch (PTP-capable, if used as module timing source): Microchip LAN9662, LAN9696. MCU for trigger scheduler (examples): STM32H743, NXP i.MX RT1062. Oscillator (example): SiTime SiT5356 (low-jitter XO). |
| Far range fails (near OK) | 1) far-bin invalid rate / MAE; 2) config snapshot (search range, sub-pixel) | If far-only degrades and near stays stable → Error budget sensitivity (Δd → large ΔZ). If far degrades after thermal soak → Extrinsic drift. | Improve sub-pixel refinement only if compute/bandwidth budget allows (a wider search range extends near coverage, not far); otherwise increase invalid honesty. Re-calibrate and re-check epipolar/row residual after thermal soak. | SerDes/bridge (for multi-meter cable, examples): Analog Devices ADN4604 (crosspoint), TI DS90UB954 (FPD-Link III deserializer hub, as an example of the GMSL/FPD-Link class). PCIe interface for grab/host link (example): Microchip PFX/PEX PCIe switch family (deployment-dependent). NVMe SSD buffering (example): Samsung PM9A1 (platform-dependent). |
| Epipolar residual increases after warm-up | 1) epipolar/row residual vs temperature; 2) depth bias drift (plane fit) | If geometry residual drifts with temperature → Calibration/Mechanics drift. If geometry stable but depth shifts → Matching/filters/config. | Add thermal soak step to calibration acceptance; gate on epipolar stability. Re-run calibration; lock parameter versions used by rectification. | Temp sensors (examples): TI TMP117, Maxim MAX31865 (RTD front-end if used). NVM for calibration params (examples): Winbond W25Q128JV (SPI NOR), Microchip 24AA02 (EEPROM). |
| Intermittent “all-zero” depth or blank output | 1) reset_reason + uptime; 2) pipeline stage counters (frames in/out) | If resets occur → System stability (handle only via reset evidence here). If no reset but frames stop at a stage → Pipeline stall (buffer/compute). | Add watchdog and stage timeouts; log stall stage and queue depth. Bound queue growth; fail safe with invalid outputs instead of freezing. | Watchdog supervisor (examples): TI TPS3431, Microchip MCP1316. eFuse / load switch (examples): TI TPS25947, ADI LTC4368 (deployment-dependent). |
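The classification step of the playbook (Sync vs Geometry/Calibration vs Matching/Scene) can be encoded as an ordered triage so that field reports are sorted the same way every time. This is a sketch under assumptions: the evidence keys, the limit values, and the check order are illustrative, not prescribed by the table.

```python
def triage(evidence, limits):
    """Map logged evidence to the first domain worth investigating."""
    if evidence["resets"] > 0:
        return "system_stability"      # handled via reset evidence only
    if (evidence["mismatch_count"] > 0
            or evidence["dt_p99_us"] > limits["dt_p99_us"]):
        return "sync_pairing"
    if evidence["epipolar_p99_px"] > limits["epipolar_p99_px"]:
        return "geometry_calibration"
    if evidence["latency_p99_ms"] > limits["latency_p99_ms"]:
        return "queue_determinism"
    return "matching_scene"            # timing and geometry are clean
```

Checking sync before geometry, and geometry before matching, mirrors the rule that disparity knobs should only be tuned once timing and rectification evidence is clean.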
H2-12. FAQs (Stereo Vision Module) — 12 Q&A
These FAQs capture long-tail failure modes without scope creep. Each answer follows: Conclusion → 2 evidence checks → first fix → map back to chapters.
Depth is stable when static, but breaks under motion: check sync or matching first? (H2-3 / H2-6 / H2-11)
Answer: Start with sync/pairing evidence. Motion amplifies even small frame-to-frame skew, producing tearing that looks like “algorithm failure”. Only after timing is clean should matching knobs be tuned.
- timestamp Δt p99 + pair mismatch counters during a motion scene (not only static targets).
- invalid/holes and LR-fail rates vs motion (do they spike only when objects move?).
- Tighten pairing gate (frame_id + timestamp window) and align exposure timing for both sensors.
- If timing is stable, increase “honest invalid” gating and LR-check robustness before expanding search range.
Left/right images look sharp, but depth is globally too near/far: baseline/extrinsics or disparity scale? (H2-5 / H2-7)
Answer: A global depth bias usually points to geometry (baseline/extrinsics) or a systematic disparity scale/offset. Separate them by checking rectified alignment and distance-binned bias.
- epipolar/row residual on rectified pairs (alignment quality).
- plane-fit depth bias across near/mid/far bins (is the bias proportional to range?).
- Re-validate calibration artifacts (K/D/R/T and rectification maps) and confirm the correct baseline sign/units.
- If epipolar residual is low but bias persists, check for a consistent disparity offset and apply a calibrated correction.
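One way to separate the two causes from distance-binned bias: with Z = B·f / d, a baseline or focal scale error produces a bias that grows linearly with Z, while a constant disparity offset produces a bias that grows roughly with Z². The sketch below compares which ratio stays more constant across bins. The function names and the decision rule are assumptions, and real data needs enough bins and low noise for the comparison to be meaningful.

```python
from statistics import mean, pstdev


def relative_spread(values):
    """Population standard deviation divided by |mean| (inf if the mean is zero)."""
    m = mean(values)
    return pstdev(values) / abs(m) if m else float("inf")


def bias_model(distances_m, biases_m):
    """Label the bias as linear-in-Z (scale error) or quadratic-in-Z (disparity offset)."""
    linear = [b / z for z, b in zip(distances_m, biases_m)]
    quadratic = [b / (z * z) for z, b in zip(distances_m, biases_m)]
    if relative_spread(linear) <= relative_spread(quadratic):
        return "scale_error"
    return "disparity_offset"
```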
Near range is accurate, far range is noisy: unavoidable error budget or can parameters save it? (H2-7 / H2-6)
Answer: Far depth is inherently more sensitive: a small disparity error becomes a large depth error at long range. Parameters can improve stability and honesty, but cannot fully defeat physics.
- MAE/RMSE by distance bins (near/mid/far) plus confidence distribution.
- search range + sub-pixel settings captured in the config snapshot.
- Prefer “honest invalid” at far range (confidence gating) over unstable noisy depth.
- Only tighten sub-pixel refinement if compute + bandwidth budgets remain deterministic (verify p99 latency); a wider search range extends near-range coverage rather than far-range precision.
Failure happens only under certain lights: flicker or low texture? How to prove it? (H2-9 / H2-10)
Answer: Flicker causes time-varying brightness mismatch between left/right frames, while low texture produces ambiguous matching even with stable brightness. Evidence must separate time-driven vs texture-driven collapse.
- frame-to-frame brightness/mean intensity variation vs exposure time (look for periodic patterns).
- invalid reason bins (low-texture vs specular-like failures) and their correlation to the light environment.
- Lock exposure for both sensors and avoid exposure times that amplify mains flicker artifacts.
- If required, use synchronized illumination as a system requirement (lighting-driver implementation stays out of this page).
Timestamps look aligned, but depth still jitters: could timestamps mark arrival, not exposure? How to verify? (H2-4 / H2-11)
Answer: Yes. Many systems timestamp “when a frame reaches a block” rather than the actual exposure moment. Verify timestamp meaning with an external event marker and compare measured event timing against timestamps.
- An event marker visible in both images (flash edge / strobe marker) and its pixel-time alignment.
- two timestamp taps (sensor-side vs post-buffer/host) to estimate fixed offsets and jitter sources.
- Move timestamp tap closer to exposure (or apply a calibrated offset per pipeline stage).
- Re-check timestamp Δt p99 after the offset is applied under motion.
Calibration is great at first, then drifts when hot: extrinsics thermal drift or lens distortion change? (H2-5 / H2-7 / H2-10)
Answer: Most “heat drift” starts as extrinsics/mechanics drift (baseline/pose changes), then lens effects may add secondary residuals. Separate them by tracking epipolar/row residual vs temperature, not only reprojection error.
- epipolar/row residual trend vs temperature after soak/cycle.
- depth bias drift on a stable planar target (pre/post soak comparison).
- Add a thermal-soak gate to calibration acceptance; re-calibrate and re-check residual stability.
- Stabilize mechanics and mounting; treat temperature as part of the validation matrix (no NVM/version system here).
Depth holes appear near image borders: occlusion or LR-consistency too strict? (H2-6 / H2-9)
Answer: Border holes are often expected occlusion (one camera sees content the other does not). They become excessive when LR-check thresholds or confidence gating are too strict for that scene’s texture and noise.
- invalid reason bins near borders (occlusion-like vs low-texture-like patterns).
- LR-fail rate and how it changes when thresholds are slightly relaxed.
- Keep occlusion invalids (honest failure), but tune LR-check/gating to avoid over-rejecting valid pixels.
- Validate on a border-heavy target set (depth discontinuities) and re-check edge noise metrics.
Depth quality drops after increasing frame rate: search range reduced or bandwidth/buffer not enough? (H2-8 / H2-6)
Answer: Both are common. Higher FPS pressures compute, memory bandwidth, and buffering, which can silently force parameter reductions or introduce tail latency. Separate “parameter change” from “system overload” with config + p99 evidence.
- config snapshot diff (search range/window/sub-pixel/LR-check) before vs after FPS increase.
- p99 latency/jitter and drop/mismatch counters under the new FPS.
- If parameters were reduced, restore the critical ones (range/sub-pixel) and re-balance resolution/filters.
- If p99 explodes, bound queues, add deterministic buffering limits, and validate under max-load conditions.
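The config snapshot diff in the first evidence check is a small operation worth automating, so that silent parameter reductions are caught before anyone blames bandwidth. A minimal sketch, assuming snapshots are flat dicts. The key names are hypothetical.

```python
def config_diff(before, after):
    """Return {key: (old, new)} for every setting that changed, appeared, or disappeared."""
    keys = sorted(set(before) | set(after))
    return {k: (before.get(k), after.get(k))
            for k in keys if before.get(k) != after.get(k)}


# Example: the same pipeline before and after a frame-rate increase.
before = {"search_range": 128, "window": 9, "subpixel": True, "lr_check": True}
after = {"search_range": 64, "window": 9, "subpixel": False, "lr_check": True}
changed = config_diff(before, after)
```

Here `changed` reports that `search_range` and `subpixel` were reduced, which points to a parameter change rather than system overload as the first suspect.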
Exposure mismatch makes matching unstable: tune sync first or lock AE strategy first? (H2-3 / H2-2)
Answer: Treat exposure consistency as a pairing requirement. Start by ensuring both sensors share the same trigger timing, then lock exposure/gain decisions to prevent left/right appearance drift. AE algorithm details belong to the ISP page, not here.
- left vs right intensity histograms (mean + percentile spread) over time.
- timestamp Δt under the same scene (confirm timing is not the hidden cause).
- Lock exposure/gain pairs (or enforce coupled exposure settings) and keep them consistent across both sensors.
- Re-check confidence/invalid stability on the same scene set after locking.
How to choose baseline: is bigger always better, and what are the downsides? (H2-7)
Answer: A larger baseline improves far-range sensitivity, but increases occlusion, raises mechanical/calibration sensitivity, and can worsen near-range usability. Baseline must be chosen from the target distance envelope and acceptable invalid/occlusion behavior.
- Target distance bins (near/mid/far) and the required depth error envelope per bin.
- Occlusion/invalid behavior on edge/discontinuity targets (how many “honest invalids” are acceptable).
- Define a deliverable distance envelope first, then select baseline to match that envelope.
- Validate using distance-binned RMSE/invalid metrics and update the error budget assumptions.
What makes a “deliverable” stereo module: which logs/metrics are mandatory for field traceability? (H2-10 / H2-11)
Answer: A deliverable stereo module must be traceable: every depth output should be explainable by timing, geometry, matching confidence, and determinism evidence. Without these logs, field failures become un-debuggable “it depends” incidents.
- frame_id + pair_id + mismatch counters
- timestamp taps (at least capture + output) and Δt stats (mean + p99)
- epipolar/row residual summaries
- confidence + invalid masks with reason bins; hole rate
- p99 latency/jitter; drop counters
- reset_reason + uptime; config snapshot
- Add the missing evidence fields first; then re-run the validation matrix to build a regression baseline.
After changing cable/interface, issues appear: blame interface timing or sync signal integrity first? (H2-3 / H2-8)
Answer: Start from determinism and pairing evidence before blaming protocols. Cable/interface changes often add buffering variability, clock/jitter changes, or trigger integrity problems that surface as tail latency and pairing mismatches.
- timestamp Δt p99 + mismatch counters before vs after the interface change.
- p99 latency/jitter and stage counters (where frames start queueing or stalling).
- Bound buffering and enforce deterministic pairing gates; verify trigger distribution integrity end-to-end.
- Re-check tails (p99) under max-load; prefer honest invalid outputs over unstable depth.