Robot Vacuum Hardware: Motor Drivers, SLAM Sensors, Charging & BMS
This robot vacuum guide turns real-world failures into hardware evidence: what to measure first (rails, motor currents, sensor counters, charger flags) to pinpoint the root-cause domain fast. It focuses on the power path, motor drives, SLAM sensor hardware, docking/charging, and EMC/ESD resilience, so issues like random reboots, weak traction, drift, and intermittent charging become repeatable, fixable diagnostics.
Scope, boundaries, and what counts as success
Hardware domains covered
- Actuation: wheel drive (BLDC/PMSM), suction fan, brush/side-brush motors, current sensing and fault handling.
- Perception: ToF/LiDAR, IMU, camera (MIPI-CSI), cliff sensors, bumper switches; power integrity and timing hygiene.
- Compute: MCU/SoC split, memory/storage, reset/watchdog and brownout resilience.
- Energy: battery pack + BMS/fuel gauge, dock contacts, charger behavior, safety interlocks.
- Reliability/EMC: ESD entry points, EMI coupling from motors and DC/DC, dusty/wet field constraints.
For charger topology deep dives, reference the dedicated page: Fast Charger / Adapter.
Measurable success metrics
“Success” is defined by measurable outcomes and repeatable evidence, not by subjective “it feels better” checks. The following metrics anchor validation and field debug:
Evidence bundle expected for fast diagnosis
- Power: VBAT droop during fan ramp + wheel acceleration; reset reason; charger/BMS fault flags.
- Motion: phase current snapshot + fault pins; encoder ticks vs command; stall counters.
- Sensors: ToF confidence/status, IMU saturation flags, camera frame-drop/error counters, cliff sensor raw + ambient.
- Thermal: NTC profile (pack + driver hotspot proxy), throttling/derating flags, sustained load temperature rise.
Reference architecture: functional blocks + noisy/quiet partitioning
Why the partition model matters
Robot vacuums combine high dI/dt motor switching, sensitive perception sensors, and bursty edge compute on a shared battery. Most “mysterious” field failures are explainable once power loops, return paths, and sensor timing are mapped onto a single reference architecture.
Partition rules (practical, layout-driven)
- Keep motor power loops local: driver → MOSFETs → phase path → shunt return should not share the same return reference as IMU/ToF/camera.
- Define a “quiet sensor island”: sensors + local regulators + reference ground; connect to system ground at a controlled point.
- Place DC/DC by noise class: motor/5V buck(s) near energy entry and drivers; low-noise LDOs near IMU/ToF/camera rails.
- Protect exposed entry points: dock contacts, buttons, bumper metal, and external seams must route ESD away from sensor references.
- Make brownout behavior deterministic: supervisor + reset sequencing should prevent partial-rail states that corrupt sensors or storage.
Interface map (what connects to what)
| Subsystem | Typical Interfaces | Hardware Sensitivities (field symptoms) | First Evidence to Capture |
|---|---|---|---|
| Wheel drive | PWM / SPI / fault pins / encoder inputs | stall on carpet, asymmetric torque, driver trips, hot MOSFETs | phase current snapshot, VBAT droop, fault flags, encoder vs command |
| ToF / LiDAR | I²C / SPI / INT | confidence collapse, intermittent sensor missing, drift near motor load | status/confidence, supply ripple, frame/CRC counters |
| IMU | I²C / SPI / INT | heading drift, saturation during vibration, temp-dependent bias | sat flags, sample rate, timestamp jitter, temperature trace |
| Camera | MIPI-CSI + I²C control | frame drops, rolling artifacts tied to PWM/fan, boot-time init failures | frame error counters, rail stability, MIPI error status (if exposed) |
| Dock / charging | contact detect / charger status / NTC | intermittent charge, premature termination, heating at contacts | dock V/I curve, contact resistance trend, charger fault/status bits |
Design freeze checklist (expensive-to-change items)
- Current sensing topology and headroom: shunt/inline placement, gain/ADC range, and OCP blanking behavior.
- Noise zoning and return paths: motor loop vs sensor island; placement of star point and decoupling strategy.
- Sensor mechanical coupling: IMU mounting stiffness and vibration path; optical window contamination tolerance for ToF/LiDAR.
- MIPI power integrity: camera rails and clock stability under fan/wheel load transients.
- Dock contact protection: ESD path, reverse protection, inrush control, and wet/dust contact resilience.
- Brownout determinism: reset sequencing and “last-gasp” logging so failures are diagnosable.
- Fault observability: expose driver/charger/BMS status bits and counters; ensure logs survive sudden power loss.
Battery pack, BMS, fuel gauge, and the power tree
Field symptoms such as random resets, won’t-charge events, and short runtime are frequently triggered by the power-path and pack protection chain rather than the application logic. A robot vacuum merges bursty motor load steps with sensitive sensors and edge compute on a shared battery—making transient behavior and protection thresholds decisive.
Pack fundamentals (1S/2S/3S concepts) and protection reality
- Cell stack choice drives headroom: higher series count reduces current for the same power but increases protection complexity and balancing needs.
- Protection FETs define “sudden shutoff” behavior: OCP/short protection can disconnect the pack within microseconds to milliseconds, which often appears as an “unexplained reboot.”
- Balancing (multi-cell) is a reliability lever: cell mismatch can cause early UV trips under motor spikes even when average SOC looks healthy.
- NTC placement is not cosmetic: poor thermal coupling (too close to a hot spot or too far from cells) can cause false OTP, low-temp charge refusal, or aggressive derating.
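To make the stack-choice tradeoff concrete, the sketch below computes pack current for a fixed load power at 1S/2S/3S and a first-order resistive droop. The 3.6 V nominal cell voltage, 30 mΩ per-cell resistance, and 60 W load are illustrative assumptions, not values from a specific pack, and the model ignores that a constant-power load draws more current as the voltage sags.

```python
# Illustrative numbers only: cell voltage and resistance are assumptions.
NOMINAL_CELL_V = 3.6   # assumed nominal Li-ion cell voltage
CELL_R_OHM = 0.030     # assumed per-cell internal resistance


def pack_current_a(load_w: float, series_cells: int) -> float:
    """Current drawn from the pack for a given load power (I = P / V)."""
    return load_w / (series_cells * NOMINAL_CELL_V)


def droop_fraction(load_w: float, series_cells: int) -> float:
    """First-order resistive droop as a fraction of nominal pack voltage."""
    v_pack = series_cells * NOMINAL_CELL_V
    r_pack = series_cells * CELL_R_OHM
    return pack_current_a(load_w, series_cells) * r_pack / v_pack


for n in (1, 2, 3):
    amps = pack_current_a(60.0, n)
    droop = droop_fraction(60.0, n)
    print(f"{n}S: {amps:.2f} A, droop {droop:.1%}")
```

Under these assumptions the fractional droop shrinks as the series count rises, which is the headroom argument in the first bullet; the cost is the added balancing and protection complexity.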
Fuel gauge selection axes (what matters in a vacuum platform)
- Impedance tracking: better at dynamic loads and temperature swings; helps interpret sagging VBAT and aging cells.
- Coulomb counting: simpler but depends on stable calibration points; drift is amplified by temperature and load pulses.
- Learning requirements: a gauge that needs periodic full cycles can report “optimistic SOC” if the usage pattern never reaches learning endpoints.
- Low-temperature behavior: internal resistance rises; a platform may hit UV/protection earlier even with “reasonable” SOC. Gauge algorithms and NTC placement both matter.
Power-tree sequencing: compute vs sensors vs motors
A robust power tree separates noise classes and makes brownout behavior deterministic. Compute rails and storage are sensitive to partial-rail states; sensor rails need low noise; motor rails need surge capability without polluting the quiet domain.
| Rail Group | Primary Loads | Failure Signature | Design Focus |
|---|---|---|---|
| Compute rails | SoC/MCU, DDR, eMMC | random reboot, filesystem corruption risk, watchdog storms | supervisor/reset sequencing, brownout thresholds, last-gasp logging |
| Sensor rails | IMU, ToF/LiDAR, camera | sensor dropouts, drift/jitter under motor load, frame errors | quiet regulation, decoupling near sensors, clean ground reference |
| Motor rails | wheel drivers, fan, brushes | OCP trips, torque loss, heat spikes, EMI coupling | surge handling, local loops, current sense integrity, EMI containment |
Evidence to capture (fastest path to root cause)
- Correlate time: align VBAT droop, protection flags, and reset reason to a single event timeline.
- Separate “energy” vs “logic”: if resets align with sag/flags, prioritize power-path and protection before software hypotheses.
- Validate temperature gating: charge refusal or early termination often matches NTC thresholds more than charger “fault.”
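The "correlate time" step can be sketched as a small timeline join. This is a minimal illustration that assumes all events already share one time base; the event labels (`vbat_droop`, `bms_flag`, `charger_flag`, `reset`) and the 50 ms window are hypothetical choices, not fields from a real logger.

```python
from dataclasses import dataclass

POWER_SOURCES = ("vbat_droop", "bms_flag", "charger_flag")  # hypothetical labels


@dataclass(frozen=True)
class Event:
    t_ms: int     # timestamp on one shared time base (assumed available)
    source: str   # which log produced the event


def power_events_before_resets(events, window_ms=50):
    """For each reset, list power-path events in the window_ms before it."""
    power = [e for e in events if e.source in POWER_SOURCES]
    result = []
    for reset in (e for e in events if e.source == "reset"):
        near = [p for p in power if 0 <= reset.t_ms - p.t_ms <= window_ms]
        result.append((reset, near))
    return result


log = [
    Event(1000, "vbat_droop"),
    Event(1012, "bms_flag"),
    Event(1020, "reset"),   # preceded by sag and a protection flag
    Event(9000, "reset"),   # no power evidence nearby
]
for reset, near in power_events_before_resets(log):
    verdict = "power-path first" if near else "look beyond power"
    print(reset.t_ms, verdict, [p.source for p in near])
```

A reset with power events inside the window points at the power path and protection chain; a reset with none is a candidate for software or thermal hypotheses.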
Wheel drive motors: BLDC/PMSM driver, sensing, and stall detection
Wheel traction and torque delivery are the primary field reliability axis. Carpet friction, thresholds, hair ingestion, and wheel slip create rapid load changes that stress drivers, current sensing, and protection tuning. A resilient design distinguishes “high load” from “fault” using evidence that combines current, speed, and driver state.
Driver choices: integrated driver vs gate driver + MOSFETs
- Integrated drivers: compact and simpler bring-up; limited thermal headroom and fewer layout degrees of freedom.
- Gate driver + MOSFETs: scalable efficiency and current; requires careful loop control, gate ringing management, and EMI containment.
- Headroom matters: margin for peak phase current and VBAT dips prevents false trips and weak torque under real floors.
Current sensing topologies (noise vs accuracy vs protection speed)
| Topology | Strength | Risk | Typical Failure Symptom |
|---|---|---|---|
| Low-side shunt | simple, cost-effective, fast protection | ground bounce and coupling into quiet sensors | sensor jitter during wheel torque steps; unstable OCP tuning |
| Inline / high-side | cleaner ground reference, better domain isolation | higher cost, layout sensitivity, common-mode constraints | offset/scale errors leading to torque mismatch or delayed trip |
| Phase sensing | direct phase insight for advanced control | measurement complexity and noise exposure | erratic stall detection if sampling/timing is not deterministic |
Stall detection (evidence-based, algorithm-light)
Robust stall detection avoids single-signal decisions. It combines multiple evidence sources so carpet load is not misclassified as a fault and real jams are not missed.
- Current evidence: sustained overcurrent or repeated spikes beyond a tuned window.
- Speed evidence: encoder/estimated speed deviates from command beyond a threshold (torque not producing motion).
- Driver state evidence: back-EMF/commutation flags, phase open/short indicators, or fault-pin assertions.
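The three evidence sources above can be combined in a few lines. The following is a sketch under assumed thresholds (current limit, speed ratio, and minimum sample count are placeholders to be tuned per platform), not a production stall detector.

```python
from dataclasses import dataclass


@dataclass
class WheelSample:
    current_a: float     # measured phase or bus current
    speed_cmd: float     # commanded speed (any consistent unit)
    speed_meas: float    # encoder or estimated speed
    driver_fault: bool   # fault pin or status bit asserted


def classify_window(samples, i_limit_a=2.5, speed_ratio_min=0.3, min_samples=5):
    """Classify a sample window as 'fault', 'stall', 'high_load', or 'normal'."""
    if any(s.driver_fault for s in samples):
        return "fault"                      # driver state evidence wins
    over = [s for s in samples if s.current_a > i_limit_a]
    if len(over) < min_samples:
        return "normal"                     # no sustained overcurrent
    slow = [s for s in over
            if s.speed_cmd > 0 and s.speed_meas < speed_ratio_min * s.speed_cmd]
    if len(slow) >= min_samples:
        return "stall"                      # torque is not producing motion
    return "high_load"                      # high current, speed still tracks
```

The design choice is that current alone never yields "stall": it needs speed collapse as a second witness, which is what keeps carpet load from being flagged.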
Protection pitfalls: shoot-through, deadtime, and OCP blanking
- Shoot-through prevention: inadequate deadtime or gate ringing can cause sudden heating and hard failures.
- Deadtime tradeoff: too short risks cross-conduction; too long reduces effective torque and increases ripple.
- OCP blanking pitfalls: too small causes false trips during start/threshold events; too large delays real short protection.
- Make faults observable: fault pins and status registers should be logged with timestamps for field correlation.
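The blanking tradeoff can be illustrated with a toy model. Here blanking is modeled as a simple persistence window (the current must stay above the limit for more than N consecutive samples), which is a simplification and may differ from how a given driver IC implements it; the sample values are invented.

```python
def ocp_trip_index(samples_a, limit_a, blanking_samples):
    """Return the sample index where OCP trips, or None if it never trips.

    Trips once the current has exceeded limit_a for more than
    blanking_samples consecutive samples.
    """
    run = 0
    for i, amps in enumerate(samples_a):
        run = run + 1 if amps > limit_a else 0
        if run > blanking_samples:
            return i
    return None


# Two-sample start-up spike at index 2, then a sustained overcurrent from index 5.
trace = [1, 1, 5, 5, 1, 5, 5, 5, 5, 5]
print(ocp_trip_index(trace, limit_a=3, blanking_samples=1))  # trips on the spike
print(ocp_trip_index(trace, limit_a=3, blanking_samples=3))  # rides through it
```

With the short window the spike causes a nuisance trip; with the longer window the spike is ignored, but the real overcurrent is detected several samples later, which is the delay the third bullet warns about.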
Evidence to capture (what proves the root cause)
- Differentiate load vs fault: high current with matching speed usually indicates heavy floor load (for example, carpet); high current with speed collapse indicates stall/jam.
- Correlate with power: VBAT droop that aligns with stall attempts can trigger system resets; link this to the power-path evidence in the battery pack and power-tree section.
- Use driver truth: fault pins and commutation flags reduce guesswork and prevent blind parameter changes.
Suction fan + brush/side-brush motors: high RPM, acoustics, and EMI reality
Audible whine, abnormal noise, overheating, and unexpected shutoffs frequently originate from the fan and brush motor domain. These actuators combine fast load transients with long leads and high dV/dt switching—making acoustics, thermal rise, and EMI coupling tightly linked.
High-RPM suction fan motor (often BLDC): where the noise comes from
- Commutation ripple at high RPM: torque ripple and switching edges can excite structural resonances and create a stable tonal peak.
- PWM frequency vs audible band: if PWM or its strong harmonics land inside the audible range, “whine” becomes load-dependent and repeatable.
- Bearing and aero load: bearing wear, dust, and airflow restrictions increase load, shifting both current draw and the dominant acoustic peak.
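A quick calculation helps decide whether a tone is likely commutation-related or PWM-related. The sketch assumes six-step (trapezoidal) commutation and uses the conventional 20 Hz to 20 kHz audible range; the 30,000 RPM speed, single pole pair, and PWM frequencies are illustrative assumptions.

```python
AUDIBLE_HZ = (20.0, 20_000.0)  # conventional audible range


def electrical_hz(rpm: float, pole_pairs: int) -> float:
    """Electrical frequency: mechanical revolutions per second times pole pairs."""
    return rpm / 60.0 * pole_pairs


def six_step_commutation_hz(rpm: float, pole_pairs: int) -> float:
    """Six-step drive commutates six times per electrical cycle."""
    return 6.0 * electrical_hz(rpm, pole_pairs)


def in_audible_band(freq_hz: float) -> bool:
    low, high = AUDIBLE_HZ
    return low <= freq_hz <= high


rpm, pole_pairs = 30_000, 1   # assumed fan operating point
print(six_step_commutation_hz(rpm, pole_pairs), "Hz commutation rate")
for pwm_hz in (16_000, 24_000):
    print(pwm_hz, "Hz PWM audible:", in_audible_band(pwm_hz))
```

This only checks fundamentals. Structural resonances and mixing products can still place energy in the audible band, so the measured acoustic spectrum remains the deciding evidence.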
Brush and side-brush motors: hair ingestion → stall profile → protection tuning
- Progressive drag vs sudden jam: hair wraps often create a rising-torque ramp before full stall; this demands different thresholds than an abrupt obstacle impact.
- Protection tuning tradeoff: aggressive OCP/timeout stops nuisance heating but can misclassify heavy carpet load as a fault.
- Observable recovery: an effective strategy logs stall attempts and recovery success/failure, so field data distinguishes “load” from “fault.”
EMI coupling: motor leads act like antennas (real-world constraints)
Motor phase/lead wiring forms a large loop area and couples energy into sensor interfaces and rails. The goal is not textbook perfection; it is predictable behavior across floors, dust states, and battery voltage.
| Coupling Path | Common Trigger | Hardware Lever (Practical) | Typical Symptom |
|---|---|---|---|
| Conducted (rail ripple) | fan ramp / brush stall | local LC/π filtering, rail partitioning, short return loops | sensor dropouts, resets, confidence dips |
| Radiated (lead antenna) | long leads, large loop area | twist/shorten leads, shield where feasible, routing separation | MIPI/ToF errors under motor load |
| Switching edge (dV/dt) | fast gate edges, ringing | snubbers, gate shaping, tight power stage layout | random intermittent errors, audible tone shifts |
Evidence to capture (make the problem measurable)
- Thermal slope: temperature rise rate separates friction/ingestion from normal load increase.
- Acoustic fingerprint: track the dominant peak as duty/RPM changes; stable tones are diagnostic.
- Correlation beats intuition: if sensor errors spike with fan PWM transitions, prioritize edge/EMI controls before firmware hypotheses.
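The thermal-slope evidence above reduces to a least-squares fit of temperature against time. This is a minimal sketch with invented sample data; which slope separates "friction/ingestion" from "normal load" is platform-specific and has to come from baselines.

```python
def slope_c_per_s(times_s, temps_c):
    """Least-squares slope of temperature vs time, in degrees C per second."""
    n = len(times_s)
    if n < 2 or n != len(temps_c):
        raise ValueError("need two or more paired samples")
    t_mean = sum(times_s) / n
    y_mean = sum(temps_c) / n
    num = sum((t - t_mean) * (y - y_mean) for t, y in zip(times_s, temps_c))
    den = sum((t - t_mean) ** 2 for t in times_s)
    if den == 0:
        raise ValueError("timestamps must not all be equal")
    return num / den


# Invented NTC readings: 1 degree C rise every 10 s.
print(slope_c_per_s([0, 10, 20, 30], [30.0, 31.0, 32.0, 33.0]))
```

Comparing this slope at the same duty cycle across runs is more robust than comparing absolute temperatures, because it is less sensitive to ambient and starting conditions.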
SLAM sensor stack hardware: ToF/LiDAR, IMU, camera, cliff & bumper sensors
This section stays hardware-specific: interfaces, noise susceptibility, calibration hooks, and failure signatures. The fastest debug path is to read sensor status and integrity counters, then correlate those signatures with power noise and EMI conditions.
ToF / LiDAR: supply noise sensitivity and optical reality
- Supply noise sensitivity: ripple on emitter/receiver rails can reduce confidence and increase frame errors during motor transients.
- Optical window contamination: dust and smears shift return strength and confidence without changing digital link health.
- Emitter aging: reduced optical power compresses usable range and increases low-confidence events under the same floor and lighting.
- Frame integrity flags: CRC/error counters and validity flags often reveal whether the issue is link/power or optics.
IMU: placement, vibration coupling, and temperature drift
- Placement matters: mounting near fan or wheel vibration paths increases noise and can trigger saturation events during impacts.
- Vibration coupling: mechanical resonances can look like “drift” unless saturation/overflow flags are checked.
- Temperature drift: bias shifts with temperature; calibration hooks should include temperature-aware offsets and logged health states.
- “Bad IMU” signatures: saturation flags, abnormal variance bursts, or repeated re-initialization events in logs.
Camera: MIPI integrity and flicker/rolling artifacts (hardware causes only)
- MIPI-CSI integrity: lane/frame errors or link resets often correlate with EMI bursts and ground/reference disturbances.
- Exposure flicker coupling: LED PWM or motor PWM can couple into rails and manifest as periodic exposure instability.
- Rolling artifacts: supply ripple, reference noise, or timing disturbances can create banding-like effects independent of SLAM processing.
Cliff and bumper sensors: false positives and hardware-level mitigation
- IR cross-talk: emitter/receiver geometry, shielding, and aperture control determine how much self-reflection leaks into measurements.
- Floor reflectivity sensitivity: gloss vs carpet changes return strength; hardware should preserve margin via clean rails and stable mechanical height.
- Ambient immunity: supply isolation and mechanical light blocking reduce false triggers before any algorithmic filtering.
Evidence to capture (turn “sensor weirdness” into measurable signatures)
- CRC up, confidence stable: suspect link integrity or EMI more than optics.
- Confidence down, CRC clean: suspect optical window, emitter aging, or mechanical occlusion before interface tuning.
- Saturation flags during impacts: prioritize mounting/vibration paths and rail cleanliness before calibration changes.
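The first two signatures above form a small decision table. The sketch below encodes it; the 0.8 confidence-drop ratio is an assumed threshold, and the "both degraded" branch (check the supply first) is an added interpretation based on the earlier note that rail ripple can hurt confidence and frame integrity together.

```python
def classify_tof_signature(crc_delta: int, conf_now: float,
                           conf_baseline: float,
                           conf_drop_ratio: float = 0.8) -> str:
    """Map CRC and confidence changes onto the first domain to investigate."""
    crc_up = crc_delta > 0
    conf_down = conf_now < conf_drop_ratio * conf_baseline
    if crc_up and not conf_down:
        return "link_or_emi"            # CRC up, confidence stable
    if conf_down and not crc_up:
        return "optics_or_mechanical"   # confidence down, CRC clean
    if crc_up and conf_down:
        return "check_supply_first"     # assumption: shared cause on the rail
    return "nominal"
```

Usage is a single call per observation window, for example `classify_tof_signature(crc_delta=12, conf_now=0.92, conf_baseline=0.95)`, with the counters sampled at the start and end of the window.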
Sensor timing, synchronization, and calibration in production
“Stable on bench but drifting in the field” is frequently explained by timing hygiene and calibration lifecycle control. The objective is deterministic timestamps, bounded latency, and calibration assets that are stored, versioned, and locked for traceability.
Timestamp sources and drift contributors (what breaks determinism)
| Timestamp Source | Strength | Main Drift / Error Contributor | Typical Failure Signature |
|---|---|---|---|
| SoC timer (host-side) | single global time base | interrupt latency, scheduling jitter, bus contention | timestamp jitter grows with motor load / logging |
| Sensor internal time | low local jitter for frames | clock drift vs host, reset/rollover handling | slow drift or step changes after power events |
| GPIO / interrupt edge | good for alignment markers | edge integrity, debounce, EMI spikes | sporadic outliers and false edges under EMI |
Calibration assets (what must be stored, versioned, and traceable)
- IMU temperature calibration: bias/scale compensation requires a temperature reference and a stable version tag.
- ToF offset and quality baselines: offsets and confidence baselines must be associated with the sensor module identity.
- Camera intrinsics (storage only): intrinsic parameters should be persisted as immutable assets for that module build.
- Asset packaging: store calibration blocks with CRC, schema ID, and version to prevent silent mismatches.
End-of-line (EOL) flow: calibrate, verify, then lock
- Calibrate: generate IMU temp points, ToF offset checks, and camera intrinsic bundle (if applicable).
- Verify: re-read sensors, check frame integrity counters, and validate that assets load and apply without errors.
- Lock: seal calibration blocks with CRC/version; record a build fingerprint so field logs can detect mismatches.
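The "seal with CRC, schema ID, and version" idea can be sketched with the Python standard library. The byte layout (little-endian schema ID, version, payload length, payload, CRC32 trailer) is a hypothetical format chosen for illustration, not one defined by this guide.

```python
import struct
import zlib

HEADER = struct.Struct("<HHI")   # schema_id, version, payload length (assumed layout)


def seal(schema_id: int, version: int, payload: bytes) -> bytes:
    """Wrap a calibration payload with a header and a CRC32 trailer."""
    body = HEADER.pack(schema_id, version, len(payload)) + payload
    return body + struct.pack("<I", zlib.crc32(body))


def load(blob: bytes, expect_schema: int, expect_version: int) -> bytes:
    """Return the payload, or raise ValueError on corruption or mismatch."""
    if len(blob) < HEADER.size + 4:
        raise ValueError("blob too short")
    body = blob[:-4]
    (crc,) = struct.unpack("<I", blob[-4:])
    if zlib.crc32(body) != crc:
        raise ValueError("CRC mismatch")
    schema_id, version, length = HEADER.unpack_from(body)
    if (schema_id, version) != (expect_schema, expect_version):
        raise ValueError(f"schema/version mismatch: {schema_id}/{version}")
    payload = body[HEADER.size:]
    if len(payload) != length:
        raise ValueError("length mismatch")
    return payload
```

Raising on a version mismatch, rather than silently applying the data, is what turns a wrong calibration blob into an explicit log entry instead of slow field drift.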
Evidence to capture (production + field debug signals)
- Jitter distribution: track p50/p95/p99 of timestamp jitter; rising tails indicate contention or EMI-driven ISR disruption.
- Integrity counters: frame CRC and drop counters separate “data quality loss” from “calibration drift.”
- Version mismatch: explicit mismatch logs prevent silent drift caused by wrong calibration blobs.
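Tracking jitter tails needs only interval deltas and a percentile. The sketch uses the nearest-rank percentile definition and assumes timestamps in microseconds from a sensor with a known nominal period; both are illustrative choices.

```python
def nearest_rank(sorted_vals, p: int):
    """Nearest-rank percentile (p in 1..100) of an already sorted list."""
    k = max(1, (p * len(sorted_vals) + 99) // 100)   # integer ceil(p * n / 100)
    return sorted_vals[k - 1]


def jitter_percentiles(timestamps_us, nominal_period_us):
    """Return {50, 95, 99: jitter} where jitter = |interval - nominal period|."""
    jitter = sorted(abs((b - a) - nominal_period_us)
                    for a, b in zip(timestamps_us, timestamps_us[1:]))
    if not jitter:
        raise ValueError("need at least two timestamps")
    return {p: nearest_rank(jitter, p) for p in (50, 95, 99)}
```

Logging these three numbers per motor state (idle, fan only, wheel drive) makes a rising p99 under load visible even when the median looks unchanged.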
Edge compute: SoC/MCU split, memory, thermals, and brownout resilience
Compute instability typically presents as reboot, freeze, or navigation jitter. The foundation is a clean partition between real-time control and higher-level compute, plus storage/logging and thermal/brownout behavior that remains predictable under motor transients.
Partitioning: real-time motor MCU vs application/AI SoC
- MCU domain: deterministic motor control, safety interlocks, and time-critical IO where bounded latency is required.
- SoC domain: sensor processing and higher-level compute that benefits from larger memory and throughput.
- Isolation benefit: if the SoC stalls or throttles, the MCU can keep actuators in a safe state and preserve minimal telemetry.
Memory & storage: DDR integrity basics and power-loss-safe logging
- DDR behavior: training and marginal signal integrity can fail at temperature corners, showing up as intermittent crashes or corrupted buffers.
- ECC vs non-ECC: if ECC is not available, error detection via watchdog resets and integrity counters becomes more critical.
- eMMC wear: sustained logging can accelerate wear; budget the write volume and store a compact “event ring.”
- Power-loss-safe strategy: use short atomic records and a journaled layout so a brownout does not destroy the last useful evidence.
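A compact event ring with short, individually checked records can be sketched as follows. A `bytearray` stands in for the storage region, and the 12-byte record layout (sequence number, event ID, value, CRC32) is an assumption for illustration; a real design also has to respect the erase and write granularity of the storage device.

```python
import struct
import zlib

RECORD = struct.Struct("<IHHI")   # seq, event_id, value, crc32 (assumed layout)


class EventRing:
    """Fixed-size ring of CRC-protected records; a bytearray models storage."""

    def __init__(self, slots: int):
        self.buf = bytearray(slots * RECORD.size)
        self.slots = slots
        self.seq = 0

    def append(self, event_id: int, value: int) -> None:
        self.seq += 1
        head = struct.pack("<IHH", self.seq, event_id, value)
        record = head + struct.pack("<I", zlib.crc32(head))
        offset = (self.seq % self.slots) * RECORD.size
        self.buf[offset:offset + RECORD.size] = record

    def valid_records(self):
        """Return (seq, event_id, value) for slots whose CRC verifies, oldest first."""
        out = []
        for i in range(self.slots):
            raw = bytes(self.buf[i * RECORD.size:(i + 1) * RECORD.size])
            seq, event_id, value, crc = RECORD.unpack(raw)
            if seq != 0 and zlib.crc32(raw[:8]) == crc:
                out.append((seq, event_id, value))
        return sorted(out)
```

Because each record carries its own CRC, a write torn by a brownout invalidates only that slot; older records remain readable, so the evidence leading up to the event survives.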
Thermal: hotspot mapping and load-aware derating
- Hotspot mapping: identify SoC, PMIC, and memory hotspots; temperature gradients matter more than one average reading.
- Derating policy: tie throttling to sensor and motor load so performance reduction is controlled rather than chaotic.
- Telemetry hooks: expose thermal throttle flags so “jitter” can be correlated to thermal events.
Brownout hardening: reset sequencing, supervisors, watchdogs, last-gasp logs
- Reset sequencing: rail order and reset release timing should prevent partial-rail states that corrupt memory or sensors.
- Supervisor + watchdog: use both, the supervisor for electrical truth and the watchdog for software liveness.
- Last-gasp logging: capture a minimal record (reset reason, rail snapshot, key counters) before rails collapse.
Evidence to capture (turn crashes into classified events)
- Electrical vs software: supervisor brownout flags separate rail collapse from firmware deadlock.
- Thermal correlation: throttle flags aligned with jitter/freeze prevent chasing phantom “navigation bugs.”
- Storage survivability: confirm the last-gasp record remains readable after repeated power events.
Docking & charging subsystem: contacts, inrush, charger IC, and safety
Most “won’t dock / docks but won’t charge / intermittent charge / overheating” complaints can be diagnosed by separating four layers: contact physics, hot-plug/inrush behavior, charger state gates, and safety interlocks. The goal is to turn each symptom into a measurable signature using voltage/current traces, contact resistance trends, and charger status bits.
Dock power chain: where energy is allowed to flow
- Dock contacts establish the supply path, but electrical continuity can be intermittent even when mechanical docking looks fine.
- Reverse protection prevents backfeed and wrong-polarity events; many “connected but no charge” cases are a protection gate not releasing.
- Soft-start / inrush control avoids collapsing the dock supply when the robot hot-plugs a large input capacitance.
- Charger IC applies temperature qualification, safety timers, and termination logic before it commits to fast charging.
Contact physics: pogo-pin wear, oxidation, and misalignment
- Wear / spring fatigue: reduces normal force; micro-open events appear as brief current dropouts and local heating.
- Oxidation / contamination: raises contact resistance (Rc), often causing “charges only if pressed” or “heats at the dock.”
- Misalignment: partial contact can pass “voltage present” checks but fails at higher charge current.
- Measurement method: focus on Rc trend over repeated dock cycles; trending Rc is more diagnostic than a single point.
Inrush & hot-plug: why the first 200 ms matters
- Soft-start target: limit current surge so dock voltage does not collapse during plug-in, especially when the robot’s system rail capacitance is large.
- Reverse protection realism: ideal diode / FET reverse blocks can add drop and gating behavior; validate their turn-on conditions and fault recovery.
- Transient suppression: the dock interface is an ESD/transient entry path; suppression must clamp with a short return path to be effective.
- Common pitfall: an overly slow ramp can trigger charger timeouts or detection windows; an overly fast ramp can trigger brownouts and false faults.
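The ramp tradeoff follows from I = C · dV/dt. The sketch below estimates the inrush current for a linear voltage ramp and the minimum ramp time for a given current limit; the 470 µF input capacitance, 20 V dock voltage, and 2 A limit are illustrative assumptions.

```python
def inrush_current_a(c_farad: float, delta_v: float, ramp_s: float) -> float:
    """Charging current of a capacitor under a linear voltage ramp (I = C * dV/dt)."""
    if ramp_s <= 0:
        raise ValueError("ramp time must be positive")
    return c_farad * delta_v / ramp_s


def min_ramp_s(c_farad: float, delta_v: float, i_limit_a: float) -> float:
    """Shortest linear ramp that keeps the capacitor current at or below i_limit_a."""
    if i_limit_a <= 0:
        raise ValueError("current limit must be positive")
    return c_farad * delta_v / i_limit_a


C_IN = 470e-6    # assumed robot input capacitance
V_DOCK = 20.0    # assumed dock supply voltage
print(f"1 ms ramp: {inrush_current_a(C_IN, V_DOCK, 1e-3):.1f} A")
print(f"ramp for 2 A limit: {min_ramp_s(C_IN, V_DOCK, 2.0) * 1e3:.1f} ms")
```

The computed minimum ramp should then be compared against the charger's detection or timeout window, since a ramp that satisfies the current limit can still be too slow for the charger.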
Charger behavior (conceptual gates): phases, NTC, termination, safety timers
| Gate / Phase | What It Protects | Common Misread Symptom | Useful Evidence |
|---|---|---|---|
| NTC qualification | over/under-temp charging safety | docked but never enters fast charge | NTC profile + charger temp status bit |
| Pre-charge / ramp | limits stress on depleted pack | slow charge that looks “broken” | dock current trace + phase indicator |
| Fast charge | primary energy transfer | intermittent charge under load changes | Idock stability + fault counter increments |
| Termination | prevents overcharge and heating | “full” too early, short runtime | termination reason + pack voltage trend |
| Safety timer | stops abnormal long sessions | stops at a fixed time, then retries | timer status flag + repeat pattern |
Safety interlocks: wet/dust signals and pack fault handling
- Wet/dust detection (if present): treat as a hard gate; log an explicit reason code so “won’t charge” is not ambiguous.
- Pack fault path: charger must honor BMS protection trips and recover cleanly without repeated hot-plug stress cycles.
- Thermal interlock: docking heat can be dominated by contact resistance rather than pack temperature—track both NTC and contact heating signature.
Evidence bundle: a minimal set that closes the loop
- Dock voltage/current: capture the first 200 ms (hot-plug transient) and the steady phase (charge stability).
- Rc trend: log contact resistance proxies (voltage drop at known current) across docking cycles.
- Status bits: record at least one snapshot at “docked, not charging,” and one during “charging steady.”
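The Rc proxy and its heating consequence are simple arithmetic. The sketch computes the proxy from a measured voltage drop at a known current, the resulting contact dissipation (P = I² · Rc), and a drift ratio across docking cycles; the sample values and the three-cycle averaging window are assumptions.

```python
def rc_proxy_ohm(v_drop_v: float, i_charge_a: float) -> float:
    """Contact resistance proxy: voltage drop across the contacts at a known current."""
    if i_charge_a <= 0:
        raise ValueError("charge current must be positive")
    return v_drop_v / i_charge_a


def contact_heat_w(rc_ohm: float, i_charge_a: float) -> float:
    """Power dissipated in the contact resistance (P = I^2 * R)."""
    return i_charge_a ** 2 * rc_ohm


def drift_ratio(rc_series, n: int = 3) -> float:
    """Mean of the last n Rc values divided by the mean of the first n."""
    if len(rc_series) < 2 * n:
        raise ValueError("series too short for the chosen window")
    first = sum(rc_series[:n]) / n
    last = sum(rc_series[-n:]) / n
    return last / first
```

A drift ratio well above 1 across cycles, together with rising contact dissipation at the same charge current, points at wear or contamination before the charger IC is suspected.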
EMC/ESD/reliability in a dusty, high-dI/dt moving platform
A robot vacuum can pass lab checks but still fail intermittently at home because the field environment combines more entry points (dock, buttons, bumper, exposed metal), stronger noise sources (motor PWM edges, DC/DC switch nodes, long wiring), and constantly changing coupling conditions (movement, dust, humidity, floor reflectivity, dock placement).
ESD entry points: where charge is injected in real homes
- Dock contacts: an exposed interface, frequent touch, and hot-plug events make them an ESD/transient gateway.
- Buttons & bumper: high-touch surfaces; a single hit can cause resets, bus lockups, or sensor hangs.
- Exposed metal / trims: offer a discharge point with a short path into the ground reference, where they disturb sensitive circuits.
EMI sources: who creates the disturbance
- Motor PWM edges: fast current steps and large loops radiate and inject noise back onto rails.
- DC/DC switch nodes: high dV/dt switching nodes couple into nearby sensitive traces and reference planes.
- Long sensor cables: act as antennas; routing and return paths determine whether they inject or receive noise.
Mitigations that survive reality: zoning, return paths, TVS placement, ferrites
- Layout zoning: keep a clear Noisy Zone (motors/DC-DC) separated from the Quiet Sensor Island.
- Return paths: short, continuous return loops often outperform adding components in the wrong location.
- TVS placement logic: clamp at the entry point with a short return path; a TVS far from the connector often fails to protect the victim.
- Ferrites (where they help): most useful on long cables or specific sensitive branches; treat them as targeted impedance, not a universal cure.
Evidence to capture: turn “random” into classified events
Failure reproduction matrix (minimal but effective)
| Motor state | Docking state | Sensor active | What to record |
|---|---|---|---|
| Off / idle | Undocked | ToF + IMU | baseline counters, noise floor, reset reason (should be none) |
| Fan only | Undocked | All sensors | frame drops, confidence dips, rail ripple correlation |
| Wheel drive | Undocked | Camera on | MIPI errors/resets, exposure flicker signature, counter spikes |
| Brush stall | Undocked | All sensors | worst-case EMI, bus lockups, watchdog events |
| Any | Docking contact | All sensors | ESD entry coupling via contacts, charge-status flips, resets |
| Any | Charging steady | All sensors | conducted noise via dock, charger status bits, counter stability |
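Tagging each field event with a matrix cell makes the "random" failures countable. The sketch encodes the six rows above; the cell IDs (`R1` to `R6`) and state labels are hypothetical names introduced here, not identifiers from the guide.

```python
from collections import Counter

# (cell_id, motor_state, dock_state); "any" matches every motor state.
ROWS = [
    ("R1", "off_idle", "undocked"),
    ("R2", "fan_only", "undocked"),
    ("R3", "wheel_drive", "undocked"),
    ("R4", "brush_stall", "undocked"),
    ("R5", "any", "docking_contact"),
    ("R6", "any", "charging_steady"),
]


def cell_id(motor_state: str, dock_state: str):
    """Return the matrix cell for a state pair, or None if it is not covered."""
    for rid, motor, dock in ROWS:
        if dock == dock_state and motor in ("any", motor_state):
            return rid
    return None


def tally(events):
    """Count events per matrix cell; events are (motor_state, dock_state) pairs."""
    return Counter(cell_id(m, d) for m, d in events)
```

A tally dominated by one cell (for example, the brush-stall row) narrows the search to that coupling condition; events that map to `None` show where the matrix needs another row.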
Validation & production test plan (bring-up → EOL → reliability)
This section converts the hardware architecture into an executable plan: a bring-up checklist, a stress/reliability suite that mimics home conditions, and an EOL gate that locks calibration assets and measurable accept windows.
Bring-up checklist (power → motors → sensors → thermal sanity)
1) Rails & sequencing
- Verify VBAT path and all critical rails (SoC, MCU, sensors, motor driver supply) at idle and during motor start.
- Capture: VBAT droop + compute rail dip during worst inrush (fan spin-up + wheel start).
- Typical debug enablers (MPN examples): TI TPS3890 (supervisor), TI TPS22965 (load switch), TI TPS25947 (eFuse / inrush control).
2) Motor no-load baselines
- Record no-load current signatures for wheel motors, fan, brush, and side brush.
- Capture: current waveform, fault pins/flags, tach/encoder consistency.
- Common sensing parts (MPN examples): TI INA240 (PWM current-sense amp), TI INA226 (power monitor), Vishay WSL series shunt resistors.
3) Sensor enumeration & link integrity
- Enumerate I²C/SPI sensors; validate MIPI CSI bring-up with stable frame counters.
- Capture: sensor status registers, CRC/error counters, frame drop counts.
- Protection/robustness parts (MPN examples): Littelfuse SP0502 (ESD array for low-speed lines), Semtech RClamp families (high-speed ESD), Murata BLM21 ferrite beads.
4) Thermal sanity
- Confirm airflow and hot-spot locations under controlled loads (fan only / wheel drive / brush stall).
- Capture: temperature rise curve and any throttling flags.
- Common temperature parts (MPN examples): Murata NCP series NTC thermistors, Vishay NTCLE series NTCs.
Stress & reliability tests (home conditions, made repeatable)
| Test | Setup (repeatable) | Evidence to capture | Typical failure signatures | Hardware knobs & MPN examples |
|---|---|---|---|---|
| Carpet stall | Defined carpet type + fixed obstacle height + fixed duration | Wheel phase current, VBAT sag, driver faults, stall recovery counters | OCP trips too early, brownouts, thermal derating | OCP/timing tuning + sensing |
| Hair ingestion simulation | Fixed hair mass + brush RPM profile + time window | Brush current signature drift, stall counts, temperature rise | Repeated stall-retry loop, brush driver overheating | Thermal + protection |
| Low-temp charge | Two temperature points (e.g., ~0–10°C and ~10–20°C), fixed dock supply | Vdock/Idock trace, charger status bits, NTC profile, pack flags | Never enters fast charge, timer aborts, intermittent charge | Charger/gauge options |
| Dock cycles | 500–1000 dock/undock cycles with alignment tolerance sweep | Docking success rate trend, Rc proxy trend, local heating at contacts | Charge starts less often, contact heating increases | Contact + protection |
| ESD hit points | Hit map: dock contacts / buttons / bumper / exposed metal | Reset reason, bus lock flags, sensor frame counters, reproduction matrix cell ID | Soft lockups, I²C stuck, frame drops, watchdog resets | Layout + clamps |
EOL (end-of-line) tests: calibrations, signatures, and gates
- Sensor calibration assets: store offset/version/CRC for IMU temperature calibration and ToF offset; lock these to a build ID.
- Motor current signature gate: check no-load current windows for wheel/fan/brush; deviations often indicate friction, bearing load, or harness resistance.
- Dock contact resistance gate: compute Rc proxy from known current and contact drop; trend it across multiple dock engages.
- Pack health snapshot: record pack voltage, temperature, protection flags, and gauge SoH estimate (if available).
Pass/fail thresholds (measurable accept windows)
| Metric | Suggested accept window (practical) | Fail signature | Primary evidence |
|---|---|---|---|
| VBAT sag at worst motor surge | Peak droop typically < ~8–12% of nominal VBAT (platform-dependent) | Resets / UVLO flags / repeatable brownouts | VBAT trace + reset reason |
| Compute rail dip | Momentary dip typically < ~3–5% under planned surges | DDR/SoC instability, watchdog bites | SoC rail trace + watchdog logs |
| Dock inrush transient | No prolonged Vdock collapse; charge state enters stable phase without repeated retries | Charge start/stop oscillation, timer aborts | Vdock/Idock + charger status bits |
| Sensor integrity | Frame drops and CRC/error counters remain near-baseline under normal loads | Counter spikes correlated with motor PWM states | Frame counters + motor state snapshot |
| Dock contact Rc proxy | Stable trend across cycles; no rapid upward drift | Heating at contacts, intermittent charge | Rc proxy trend + thermal spot |
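The two percentage windows in the table can be turned into an automated gate. The sketch takes the lower and upper bounds from the VBAT sag and compute rail rows; treating the range between them as "marginal" is an interpretation added here, and the metric names are hypothetical.

```python
# (pass below, fail above) in percent, from the accept-window table.
# The "marginal" band between the two bounds is an added interpretation.
WINDOWS_PCT = {
    "vbat_sag": (8.0, 12.0),
    "compute_rail_dip": (3.0, 5.0),
}


def dip_percent(v_nominal: float, v_min: float) -> float:
    """Worst-case dip as a percentage of the nominal rail voltage."""
    if v_nominal <= 0:
        raise ValueError("nominal voltage must be positive")
    return 100.0 * (v_nominal - v_min) / v_nominal


def grade(metric: str, v_nominal: float, v_min: float) -> str:
    """Return 'pass', 'marginal', or 'fail' for a captured rail minimum."""
    low, high = WINDOWS_PCT[metric]
    pct = dip_percent(v_nominal, v_min)
    if pct < low:
        return "pass"
    if pct > high:
        return "fail"
    return "marginal"
```

For example, `grade("vbat_sag", 14.4, 13.5)` evaluates a 6.25% droop on an assumed 14.4 V nominal pack. Since the table marks the windows as platform-dependent, the bounds should be replaced with values validated on the actual hardware.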
BOM/MPN starter list (common choices seen in robot vacuum subsystems)
MPNs are examples for orientation and debugging discussions; selection must match pack voltage, currents, thermal limits, and EMC constraints.
Field debug playbook: symptom → first 3 checks → evidence bundle
This playbook is designed for the first hour of troubleshooting: classify the symptom into the most likely hardware domains, run the first three measurements, and collect a minimal evidence bundle that supports a root-cause decision.
Minimum evidence bundle (reference IDs used in the tables)
| Evidence ID | What to capture | Typical tools / hooks | Relevant parts (MPN examples) |
|---|---|---|---|
| E1 | VBAT + critical rails during motor surge | Scope trigger on VBAT dip; supervisor reset flags | Supervisor: TI TPS3890; eFuse/inrush: TI TPS25947 |
| E2 | Vdock/Idock transient + steady charge segment | Dock probe points + charger status snapshot | Chargers: TI BQ24074/BQ25895/BQ25798 |
| E3 | Motor current waveforms + fault pins/flags | Current shunt + current-sense amp + driver faults | Sense amp: TI INA240; shunt: Vishay WSL |
| E4 | Sensor integrity counters (drops/CRC/confidence) | Sensor status registers + frame counters | ToF: ST VL53L1X/VL53L5CX; IMU: Bosch BMI270, TDK ICM-42688-P |
| E5 | Reset reason + watchdog + thermal throttling flags | System logs + watchdog counters | Supervisor: TI TPS3890 (reset capture); watchdog depends on MCU/SoC |
| E6 | Dock contact Rc proxy + docking success trend | Known current + contact drop + temperature spot | Pogo pins: Mill-Max spring pins; TVS: Littelfuse SMBJ series |
| E7 | ESD hit-point map + reproduction matrix cell ID | Record motor/dock/sensor states per event | ESD arrays: Littelfuse SP0502, Nexperia PESD5V; ferrites: Murata BLM21 |
Symptom-to-action table (repeatable format)
| Symptom | Most likely domains | First 3 measurements | What logs to pull | Likely root causes | Fix knobs & MPN examples |
|---|---|---|---|---|---|
| Random reboot under load | Power tree • inrush • thermal • motor surge | | Reset reason flags • watchdog counters • thermal throttling flags | Brownout from surge • sequencing margin too tight • protection trip and recovery oscillation | Soft-start/inrush + reset hygiene |
| Won’t charge / intermittent charge | Dock contacts Rc • inrush • charger gates • NTC | | Charger status/phase • NTC qualification • pack flags • charge abort reason | Oxidation/misalignment • Vdock collapse on hot-plug • NTC out-of-window • safety timer abort | Charge path + contact robustness |
| Can’t climb threshold / weak traction | Wheel drive • current limit • harness drop • thermal derating | | Driver fault bits • stall count • temperature / derating flags | OCP window too tight • phase current limit mis-tuned • mechanical friction shifts current signature | Stall survivability |
| Navigation drift / spins in place | IMU integrity • sensor timing • power noise into sensors | | Sensor status regs • frame counters • timestamp jitter proxy | Vibration coupling saturates IMU • sensor rail noise collapses confidence • intermittent link errors | Sensor robustness |
| False cliff detection / refuses dark room | Cliff IR sensors • optical contamination • supply noise | | Cliff sensor status • false-positive counter • motor state snapshot | IR cross-talk • window contamination • margin too small under noise | Typical parts |
| Motor overheat / noisy fan | Fan BLDC drive • PWM/acoustics • airflow load • EMI coupling | | Thermal throttling • fan fault bits • restart counters | Bearing/duct load increases current • PWM in audible band • protection retry loop | Driver + sensing |
| Sensor dropout (ToF/IMU/camera) | I²C/SPI integrity • MIPI stability • ESD/EMI coupling • rail glitches | | Sensor status regs • bus error flags • reset reason (if soft reset occurs) | Conducted noise via rails • radiated coupling via harness • ESD-induced soft hang | Protection + signal hygiene |
| Docking failure rate increases over time | Contact wear/oxidation • alignment tolerance drift • heating | | Dock attempt counts • charge start failure reasons • Rc trend log | Spring fatigue • oxidation/contamination • mechanical alignment drift | Contact ecosystem |
FAQs (hardware evidence based, mapped back to H2s)
Each answer is intentionally “first-hour practical”: the first 2–3 measurements, the minimum logs/status bits, and how to split the most common A/B root-cause paths without drifting into app/cloud or SLAM math.
- Reboot only when the fan ramps up
- Charging stops after 1–3 minutes
- Hard floor OK, stalls on carpet
- Navigation drifts after 10–20 minutes
- Cliff sensors false-trigger on dark/reflective floors
- One wheel motor runs hotter than the other
- Camera occasionally drops frames
- Docking success rate falls over weeks
- ESD causes “sensor not found” until reboot
- Battery runtime decays quickly
1) Why does it reboot only when the fan ramps up? What two waveforms prove the cause? (→H2-3/H2-8)
The fastest proof is a time-aligned capture of VBAT (or PACK+) and the 5V/SoC rail during the fan ramp. If VBAT collapses first, the pack path (IR drop, protection FETs, wiring) is the prime suspect; if VBAT is stable but 5V/SoC dips, the DC/DC transient response or UVLO threshold is limiting. Confirm with reset-reason flags.
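The "which rail collapses first" split can be illustrated with a small sketch over two time-aligned sample lists. The thresholds and sample values are hypothetical, and the case where the rail dips before VBAT is treated here as a DC/DC suspect, which is an assumption rather than a rule from the text.

```python
def first_crossing_index(samples, threshold):
    """Index of the first sample below threshold, or None if never crossed."""
    for i, v in enumerate(samples):
        if v < threshold:
            return i
    return None


def classify_reboot(vbat, rail, vbat_min, rail_min):
    """Split pack-path vs DC/DC suspects from time-aligned captures."""
    vbat_i = first_crossing_index(vbat, vbat_min)
    rail_i = first_crossing_index(rail, rail_min)
    if vbat_i is None and rail_i is None:
        return "no dip captured"
    if vbat_i is None:
        return "dc/dc transient or UVLO threshold"
    if rail_i is None or vbat_i <= rail_i:
        return "pack path (IR drop, protection FETs, wiring)"
    return "dc/dc transient or UVLO threshold"  # assumed: rail dipped first


# Hypothetical captures during a fan ramp (volts, same sample clock)
vbat = [14.4, 14.2, 12.9, 12.5, 13.8]
rail = [5.00, 5.00, 4.98, 4.60, 4.90]
print(classify_reboot(vbat, rail, vbat_min=13.0, rail_min=4.75))
```

The result is only a pointer to the first suspect; the reset-reason flags still provide the confirmation.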
2) It docks successfully but charging stops after 1–3 minutes—what three status bits/logs matter most? (→H2-9)
Focus on the three gates that commonly terminate charge early: (1) NTC qualification (temperature window), (2) input validity/DPM or UVLO (dock supply collapse or contact bounce), and (3) safety-timer or fault termination. Pair those logs with a Vdock/Idock trace to see whether charging stops because the input disappears, the pack is disqualified, or a protection timer/fault triggers.
3) It can move on hard floor but stalls on carpet—how to tell traction limit vs current-limit mis-tune? (→H2-4/H2-11)
Use a three-signal split: wheel current, wheel speed/encoder, and driver OCP/fault flags. If current is “flat-topped” or clamps early while speed drops, the current limit is set too low or the OCP blanking is too aggressive. If current rises normally yet speed still collapses and temperatures climb, mechanical load (hair, bearing friction, brush drag) is dominating. Validate against the no-load current signature baseline from production tests.
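A minimal sketch of the three-signal split, assuming time-aligned current and speed samples captured during the stall. The 95% "near limit" band, the 50% speed-drop criterion, and the half-of-samples rule for "flat-topped" are illustrative assumptions, not tuned values.

```python
def classify_stall(current_a, speed_rpm, current_limit_a, ocp_flag):
    """Separate a clamped drive from a mechanically overloaded one."""
    # Assumed criterion: speed fell below half of its starting value
    speed_dropped = speed_rpm[-1] < 0.5 * speed_rpm[0]
    # "Flat-topped": at least half of the samples sit near the configured limit
    near_limit = sum(1 for i in current_a if i >= 0.95 * current_limit_a)
    clamped = ocp_flag or near_limit >= len(current_a) // 2
    if speed_dropped and clamped:
        return "current limit / OCP too aggressive"
    if speed_dropped:
        return "mechanical load dominates"
    return "no stall captured"


# Hypothetical capture: current clamps at a 2.0 A limit while speed collapses
print(classify_stall([0.8, 1.5, 2.0, 2.0, 2.0, 2.0],
                     [120, 100, 70, 50, 40, 30],
                     current_limit_a=2.0, ocp_flag=False))
```

Temperature is left out of the sketch; in practice a rising motor or driver temperature strengthens the mechanical-load verdict.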
4) Navigation drifts after 10–20 minutes—what evidence separates timing drift vs IMU thermal drift? (→H2-7/H2-6)
Separate “timebase problems” from “sensor physics” using two trends: timestamp/frame interval stability and IMU temperature vs bias indicators. If frame intervals jitter, drop counts rise, or sensor timestamps desynchronize under motor activity, it points to timing/interrupt/rail-noise coupling. If timing is stable but IMU readings drift monotonically with temperature (no saturation events), the root cause is thermal drift or calibration asset mismatch rather than synchronization.
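The two trends can be reduced to two numbers: frame-interval jitter and the correlation between IMU temperature and a bias indicator. The sketch below assumes logged timestamps, temperatures, and gyro bias estimates are available; the 2 ms jitter limit and 0.9 correlation limit are hypothetical.

```python
from statistics import mean, pstdev


def interval_jitter_ms(timestamps_ms):
    """Standard deviation of successive frame intervals."""
    intervals = [b - a for a, b in zip(timestamps_ms, timestamps_ms[1:])]
    return pstdev(intervals)


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either series is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0


def classify_drift(timestamps_ms, temp_c, gyro_bias_dps,
                   jitter_limit_ms=2.0, corr_limit=0.9):
    """Check timing first, then look for a temperature-correlated bias."""
    if interval_jitter_ms(timestamps_ms) > jitter_limit_ms:
        return "timing / rail-noise coupling"
    if abs(pearson(temp_c, gyro_bias_dps)) >= corr_limit:
        return "IMU thermal drift or calibration mismatch"
    return "inconclusive"


# Hypothetical log: steady 10 ms frames, bias rising with temperature
print(classify_drift([0, 10, 20, 30, 40, 50],
                     [25, 28, 31, 34, 37],
                     [0.01, 0.02, 0.03, 0.04, 0.05]))
```

Timing is tested first because jittery timestamps make any bias-versus-temperature trend unreliable.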
5) Cliff sensors false-trigger on dark/reflective floors—what hardware checks come before algorithm tweaks? (→H2-6/H2-10)
Before touching thresholds, confirm the hardware margin: (1) emitter current stability (no droop during motor PWM edges), (2) raw receiver level vs ambient across representative floor materials, and (3) sensor-rail ripple correlation with false triggers. If false events align with rail ripple or motor switching states, focus on return paths, filtering, and ESD/EMI entry points. Also check optical window contamination and sensor alignment tolerance.
6) One wheel motor runs hotter than the other—what’s the fastest way to prove mechanical load vs driver loss? (→H2-4/H2-12)
The fastest split test is a channel swap: swap left/right motor connections (or driver channels) and see whether the heat follows the motor/mechanics or stays with the driver channel. Then compare no-load current signatures at the same speed and measure driver-case temperature. If current rises and heat follows the motor, friction or bearing load dominates; if heat stays with the channel, driver losses or layout/thermal path are suspect.
7) Camera occasionally drops frames—what to inspect on MIPI/power first? (→H2-6/H2-8)
Start with MIPI integrity counters (ECC/CRC or link error indicators) and correlate them with camera rail glitches. If MIPI errors spike without rail movement, suspect signal integrity (connector, routing, ESD device capacitance, return path). If errors align with rail dips or resets, suspect PMIC transient response or power sequencing. A quick A/B is to repeat the test with motors disabled and then enabled to expose coupling paths.
8) Docking success rate falls over weeks—how to confirm contact resistance and alignment drift? (→H2-9/H2-11)
Confirm degradation with two trends: a contact resistance proxy (known current and measured contact drop) and a tolerance sweep (success rate vs small positional offsets). If Rc drifts upward and contact heating rises during charge, oxidation/wear or spring fatigue is dominant. If Rc is stable but the success curve shifts, mechanical alignment or dock geometry drift is likely. Always pair with Vdock/Idock captures to spot micro-dropouts.
9) ESD events cause “sensor not found” until reboot—what reset/isolation strategy prevents permanent latch-up? (→H2-10/H2-8)
Prevent “stuck until reboot” by ensuring the sensor island can be independently recovered: provide a load switch to power-cycle the sensor rail, expose a dedicated reset line, and keep ESD return paths short and low-inductance. If the issue is I²C/SPI bus lock, add bus-recovery support (clock pulses and re-init) plus targeted ESD clamps near the entry. Confirm with ESD hit-point mapping and post-hit rail/bus status snapshots.
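The "clock pulses and re-init" recovery for a stuck I²C bus can be sketched against a simulated bus. `SimulatedStuckBus` is a hypothetical stand-in for real GPIO access, and the sketch omits the edge-to-edge delays and open-drain pin configuration that real hardware needs.

```python
class SimulatedStuckBus:
    """Hypothetical GPIO stand-in: a slave holds SDA low for N SCL pulses."""

    def __init__(self, stuck_clocks):
        self.remaining = stuck_clocks
        self.scl = 1
        self.master_sda_low = False

    def set_scl(self, level):
        # The stuck slave shifts out one pending bit per SCL falling edge
        if self.scl == 1 and level == 0 and self.remaining > 0:
            self.remaining -= 1
        self.scl = level

    def set_sda(self, level):
        self.master_sda_low = (level == 0)

    def read_sda(self):
        return 0 if (self.remaining > 0 or self.master_sda_low) else 1


def recover_i2c_bus(bus, max_pulses=9):
    """Clock SCL until SDA is released, then issue a STOP.

    Returns the number of pulses used, or None if SDA is still held low.
    """
    bus.set_sda(1)  # release SDA so only the slave can hold it low
    pulses = 0
    while bus.read_sda() == 0:
        if pulses >= max_pulses:
            return None  # still stuck: fall back to power-cycling the sensor rail
        bus.set_scl(0)
        bus.set_scl(1)
        pulses += 1
    # STOP condition: SDA rises while SCL is high
    bus.set_scl(0)
    bus.set_sda(0)
    bus.set_scl(1)
    bus.set_sda(1)
    return pulses


print(recover_i2c_bus(SimulatedStuckBus(stuck_clocks=4)))
```

A `None` result is the cue to escalate to the load switch or dedicated reset line, followed by a full sensor re-init.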
10) Battery runtime decays quickly—how to separate cell aging vs fuel-gauge mis-learning? (→H2-3)
Use a controlled load step and compare VBAT sag + recovery against the reported capacity/SoC behavior. True cell aging shows larger sag and slower recovery at the same load (higher effective internal resistance), especially at low temperature. Gauge mis-learning typically shows inconsistent SoC jumps or capacity estimates that do not match coulomb-counted discharge. Capture pack temperature, protection flags, and a repeatable discharge profile before changing any learning parameters.
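The aging-versus-gauge split comes down to two computed quantities: effective internal resistance from the load step, and coulomb-counted capacity compared with what the gauge reports. The sketch below uses hypothetical numbers and assumes a fixed sample period for the current log.

```python
def effective_resistance_ohms(v_before, v_during, i_before, i_during):
    """R_eff = dV / dI across a controlled load step."""
    return (v_before - v_during) / (i_during - i_before)


def coulomb_counted_mah(current_a, dt_s):
    """Integrate discharge current samples (fixed period dt_s) into mAh."""
    return sum(current_a) * dt_s / 3600.0 * 1000.0


def gauge_mismatch_fraction(reported_mah, counted_mah):
    """Relative gap between gauge-reported and coulomb-counted capacity."""
    return abs(reported_mah - counted_mah) / counted_mah


# Hypothetical load step: 0.5 A -> 3.5 A pulls VBAT from 14.8 V to 14.2 V
print(round(effective_resistance_ohms(14.8, 14.2, 0.5, 3.5), 3))

# Hypothetical discharge log: 2.0 A sampled once per second for one hour
counted = coulomb_counted_mah([2.0] * 3600, dt_s=1.0)
print(counted, gauge_mismatch_fraction(2600.0, counted))
```

A resistance that has grown against the pack's own baseline points toward cell aging, while a large mismatch fraction with normal resistance points toward gauge learning. Both comparisons should be made at a similar pack temperature.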
Note: MPNs are representative examples used to anchor evidence-oriented discussion. Final selection must match pack voltage, current, thermal limits, EMC constraints, and availability.