Sanding & Anti-Slip Control for Rail Rolling Stock
This article covers the design, testing, and maintenance strategies for sanding and anti-slip systems in rail transit: sensor integration, actuator control, slip detection logic, and evidence-based diagnostics. The aim is to give engineers actionable guidelines for reliable system performance, fewer false alarms, and long-term operational integrity.
H2-1. Scope & Boundary: What this page covers and does not
This page focuses on Sanding & Anti-Slip as an onboard adhesion-assist function: detect slip/slide, command sanding, drive the valve/motor safely, and produce auditable evidence (events, counters, and trigger context). The goal is not to “describe rail systems” but to define an implementable, testable boundary with clear interfaces and evidence outputs.
Primary deliverables of this page: (1) a minimal interface definition, (2) a trustworthy sensor-to-trigger path, (3) actuator drive + diagnostics, and (4) an evidence packet structure that survives EMC and power transients.
Upstream inputs (interface level only):
- Wheelset sensing: axle speed sources such as encoder/VR/Hall (handled here only as conditioned inputs and quality flags).
- Motion confirmation: acceleration channel(s) to suppress false positives and to validate rapid changes.
- Commands: driver/TCMS sanding enable/disable and mode selection (no TCMS internal architecture here).
- Traction/brake requests: “request/state lines” only (no traction inverter or brake controller internal algorithms).
System outputs (what this page owns):
- Actuation: sand valve drive and/or auger/feeder motor control, including safe defaults and diagnostics.
- Dosing primitives: commanded duty/level with speed-aware limiting and anti-chatter behavior.
- Evidence: event triggers, fault codes, counters, and a compact record suitable for post-incident review.
Environmental constraints (rail-specific):
- EN 50155: wide input variations, undervoltage dips, temperature extremes, and restart behavior that must not corrupt evidence.
- EN 50121: a harsh EMC environment in which false triggers often originate from common-mode coupling and command-line glitches.
Explicit non-scope (to prevent overlap):
- Traction inverter power stage design, IGBT/SiC gate-driver deep design, or DC-link control internals.
- Brake control unit internal closed-loop algorithms and pneumatic/hydraulic control details (only interface-level signals are referenced).
- CBTC/ETCS signaling, passenger systems, wayside/station systems, and traction substations.
H2-2. User Intent & Failure Narrative: What engineers actually search for
Sanding and anti-slip functions are rarely “tuned for comfort.” They are deployed to control risk under low adhesion (rain, snow, leaf film, oil contamination) while keeping the system measurable and defensible. Real-world searches cluster around a small set of failure narratives. Each narrative should map to: mechanism → evidence fields → first corrective action.
Narrative A — Slip detection is unreliable (false positives or missed events).
- Likely mechanism: speed input quality collapses (jitter, dropout, saturation), or detection lacks stable hysteresis/gating.
- Evidence to capture: speed_quality_flag, jitter_est, dropout_count, enter_reason, exit_reason.
- First fix: enforce quality gating (do not confirm slip when the input is not trustworthy), then widen hysteresis to prevent chatter.
Narrative B — Sanding “works,” but slip persists.
- Likely mechanism: actuator response is slow or dosing is not matched to speed and conditions (insufficient delivery, nozzle clog, valve stick).
- Evidence to capture: cmd_duty, response_ms, actuator_current_peak, jam_flag, sand_rate_est.
- First fix: verify response time and current signature on a confirmed command; if the response is late or abnormal, treat it as a delivery failure before retuning thresholds.
Narrative C — Sanding triggers, but causes secondary issues (overuse, contamination, nuisance).
- Likely mechanism: triggers are driven by EMC/power glitches or overly permissive gating; the state machine is being pushed by spurious edges.
- Evidence to capture: input_glitch_count, cm_noise_flag, brownout_counter, reset_reason, threshold_version.
- First fix: tighten gating to require persistence and context (speed band + traction/brake state + quality flag), then address EMC coupling paths.
Narrative D — Audit/maintenance requires proof (post-incident defensibility).
- Likely mechanism: the system cannot prove “what happened” because triggers lack context, timestamps are untrusted, or pre/post buffers are missing.
- Evidence to capture: timestamp_source, event_id, pre_window_s, post_window_s, plus the detection state and actuator feedback.
- First fix: implement a minimal evidence packet with trigger context and timebase health; tuning is secondary if proof is not reproducible.
A practical rule: treat “mechanism guessing” as a last resort. A reliable anti-slip system should narrow each narrative to a short list of evidence checks and a single first corrective action.
H2-3. Architecture Decomposition: Modules to implement
A sanding and anti-slip function becomes reliable only when it is decomposed into explicit modules with interface contracts. Each module below lists: inputs, outputs, isolation boundary, and diagnostic/evidence fields. Traction, brake, and TCMS are referenced only as interface-level signals.
Module A — Sensor front-end (speed + acceleration)
Module B — Slip detection core (score + gating + hysteresis)
Module C — Actuator control (valve / motor drive + sensing)
Module D — Safety & interlocks (inhibit + fail-safe)
Module E — Event logging (trigger rules + evidence packet)
Module F — Communications (diagnostic interface only)
A practical implementation pattern: treat evidence generation as a first-class module. When slip detection, actuation, and logging share a clear contract, false triggers become diagnosable rather than “mysterious.”
H2-4. Speed & Acceleration AFE Design: Make slip detection trustworthy
Slip/slide decisions are only as trustworthy as the speed and acceleration inputs. This section focuses on interface-level signal conditioning, estimator selection, and the generation of quality flags that gate detection. When input quality is unknown, detection should downgrade to “suspect” rather than confirm and trigger sanding.
4.1 Speed input families (interface behavior)
- VR (variable reluctance): amplitude changes with speed; low-speed edges are fragile and often require adaptive thresholds.
- Hall/MR: digital-like edges; sensitive to wiring reference and EMC spikes that can look like valid pulses.
- Encoder: high pulse density; robust at medium/high speed, but low-speed bounce and missing edges can create false micro-speed.
4.2 Conditioning primitives (noise immunity)
- Schmitt / hysteresis shaping: suppresses edge chatter and short spikes that would inflate speed.
- Adaptive thresholding (VR): tracks amplitude changes but must avoid “following noise” under strong EMI.
- Debounce windows: enforce a minimum edge interval to reject impossible pulse rates.
4.3 Speed estimation choice: period capture vs edge count
Two estimators dominate rail speed capture: period capture performs better at low speed where few edges exist, while edge counting stabilizes at high speed. A stable system defines an estimator_mode and uses a switch hysteresis band to avoid mode-chatter.
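The sketch below shows both estimators and a mode switch with a hysteresis band. The wheel circumference, pulses per revolution, and switch speeds are hypothetical placeholders. The names EstimatorConfig, speed_from_period, speed_from_count, and next_estimator_mode are illustrative and not taken from any specific product.

```python
from dataclasses import dataclass

PERIOD_CAPTURE = "period_capture"
EDGE_COUNT = "edge_count"


@dataclass
class EstimatorConfig:
    # Hypothetical values for illustration; real numbers depend on wheel and sensor.
    wheel_circ_m: float = 2.83        # wheel circumference in metres
    pulses_per_rev: int = 100         # sensor pulses per wheel revolution
    up_switch_kmh: float = 12.0       # period capture -> edge count above this speed
    down_switch_kmh: float = 8.0      # edge count -> period capture below this speed


def speed_from_period(period_s: float, cfg: EstimatorConfig) -> float:
    """Period capture: speed (km/h) from the time between two consecutive edges."""
    metres_per_pulse = cfg.wheel_circ_m / cfg.pulses_per_rev
    return metres_per_pulse / period_s * 3.6


def speed_from_count(edge_count: int, window_s: float, cfg: EstimatorConfig) -> float:
    """Edge counting: speed (km/h) from the number of edges seen in a fixed window."""
    metres_per_pulse = cfg.wheel_circ_m / cfg.pulses_per_rev
    return edge_count * metres_per_pulse / window_s * 3.6


def next_estimator_mode(mode: str, speed_kmh: float, cfg: EstimatorConfig) -> str:
    """Switch estimator only when speed leaves the hysteresis band (no mode-chatter)."""
    if mode == PERIOD_CAPTURE and speed_kmh > cfg.up_switch_kmh:
        return EDGE_COUNT
    if mode == EDGE_COUNT and speed_kmh < cfg.down_switch_kmh:
        return PERIOD_CAPTURE
    return mode
```

At 10 km/h, which lies inside the 8 to 12 km/h band, the estimator keeps whichever mode it already had. That retained state is what prevents chatter. The active mode should be logged as estimator_mode.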
4.4 Low/zero speed reliability
- Signal-present gating: explicitly track
signal_presentanddropout_countrather than assuming missing edges imply zero speed. - Zero-speed hold: define
zero_speed_hold_msto avoid rapid flip-flopping when edges vanish at very low speed. - Minimum edge interval: reject physically impossible pulses using
min_edge_intervalas a hard filter.
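The three rules above can be combined into a small per-channel tracker. This is a minimal sketch. LowSpeedGate and its parameter defaults are hypothetical. The dropout policy is also an assumption: silence counts as a dropout only while other evidence says the vehicle is moving.

```python
class LowSpeedGate:
    """Tracks signal_present / dropout_count for one speed channel (illustrative sketch).

    Defaults are hypothetical; min_edge_interval_s should come from the highest
    physically possible pulse rate of the installed sensor.
    """

    def __init__(self, min_edge_interval_s: float = 0.0005, zero_speed_hold_ms: float = 500.0):
        self.min_edge_interval_s = min_edge_interval_s
        self.zero_speed_hold_s = zero_speed_hold_ms / 1000.0
        self.last_edge_t = None
        self.signal_present = False
        self.dropout_count = 0
        self.rejected_edges = 0

    def on_edge(self, t: float) -> bool:
        """Accept an edge timestamp (s); reject impossibly short intervals.
        Rejected edges do not move last_edge_t, so glitch bursts cannot shift the baseline."""
        if self.last_edge_t is not None and t - self.last_edge_t < self.min_edge_interval_s:
            self.rejected_edges += 1
            return False
        self.last_edge_t = t
        self.signal_present = True
        return True

    def on_tick(self, now: float, motion_expected: bool) -> bool:
        """Declare the channel silent only after the full zero-speed hold.
        Silence while motion is expected (e.g. from acceleration or neighbour axles)
        is counted as a dropout; this is an assumed policy."""
        if self.signal_present and now - self.last_edge_t > self.zero_speed_hold_s:
            self.signal_present = False
            if motion_expected:
                self.dropout_count += 1
        return self.signal_present
```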
4.5 Acceleration as a truth-check
Acceleration is used as a consistency check: if speed-derived changes are not consistent with acceleration trends, the input quality is downgraded. This reduces nuisance confirmations caused by track vibration, wiring motion, or EMC spikes on speed edges.
Minimum evidence fields produced by AFE: speed_raw, speed_filtered, accel_raw, quality_flag, signal_present, jitter_est, dropout_count, estimator_mode.
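A minimal sketch of the acceleration truth-check follows. It compares speed-derived dv/dt with the measured acceleration and downgrades quality_flag when the residual is implausible. The function name and the 3.0 m/s² bound are hypothetical. The bound has to be wide enough to pass genuine wheel slip dynamics, so that it catches only spike-like jumps.

```python
def accel_truth_check(v_prev_mps: float, v_now_mps: float, dt_s: float,
                      accel_meas_mps2: float, max_residual_mps2: float = 3.0):
    """Return (quality_flag, residual).

    max_residual_mps2 is a hypothetical bound. It must admit real slip transients
    and only flag physically implausible jumps, such as one spurious edge
    inflating a single speed sample.
    """
    dvdt = (v_now_mps - v_prev_mps) / dt_s
    residual = abs(dvdt - accel_meas_mps2)
    quality_flag = "LOW" if residual > max_residual_mps2 else "OK"
    return quality_flag, residual
```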
H2-5. Slip/Slide Detection Logic: Thresholds, hysteresis, and gating
Slip/slide triggering should not be treated as a single threshold. A reliable design uses state progression plus evidence fields: input quality gating, context gating (speed band and traction/brake state), hysteresis, and multi-axle consistency checks. This produces a defensible record of why the system entered and why it exited.
5.1 Definitions (what “slip/slide” means at interface level)
- Slip score: derived from Δv and/or dv/dt between wheel speed and a reference speed.
- Reference speed: selected from trusted wheel inputs (e.g., neighbor/median) under quality constraints; output ref_source_id.
- Direction awareness: traction state and brake state are used as context (interface-only) to interpret sign and severity.
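The definitions above can be sketched as a median reference over trusted axles plus a relative slip score. The convention for ref_source_id (the trusted axle closest to the median) and the min_ref guard are assumptions made for illustration.

```python
from statistics import median
from typing import List, Optional, Tuple


def reference_speed(speeds: List[float],
                    quality_ok: List[bool]) -> Tuple[Optional[float], Optional[int]]:
    """Median of trusted axle speeds, plus ref_source_id.

    ref_source_id is the index of the trusted axle closest to the median
    (an assumed convention).
    """
    trusted = [(i, v) for i, (v, ok) in enumerate(zip(speeds, quality_ok)) if ok]
    if not trusted:
        return None, None          # no trustworthy reference: stay at SUSPECT at most
    ref = median(v for _, v in trusted)
    ref_source_id = min(trusted, key=lambda iv: abs(iv[1] - ref))[0]
    return ref, ref_source_id


def slip_score(wheel_speed: float, ref_speed: float, min_ref: float = 1.0) -> float:
    """Relative speed difference: positive = wheel faster than reference (slip under
    traction), negative = slower (slide under braking). min_ref avoids division
    blow-up near standstill, where speed-band gating should block confirmation."""
    return (wheel_speed - ref_speed) / max(ref_speed, min_ref)
```

For example, with axle speeds [20.0, 20.2, 23.0, 19.9] and the last axle untrusted, the reference is 20.2 from axle 1. Axle 2 then scores about +0.14.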
5.2 Gating (when detection is allowed to confirm)
Gating prevents nuisance confirmations during low-speed ambiguity, sensor quality collapse, and rail EMC/power disturbances.
A practical approach is to compute a gating_mask and a gating_block_reason that can be logged.
- Speed band gating: disable confirmation below a defined speed floor; allow “suspect” only.
- Context gating: require consistent traction/brake state lines (signal only) before confirmation.
- Quality gating: block confirmation when
quality_flagis LOW; keep state at SUSPECT. - Health gating: block confirmation on brownout/overtemp conditions; record
health_okandreset_reason.
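One way to produce a loggable gating_mask and gating_block_reason from the four gates is shown below. Two details are assumptions: the bit layout, and the priority used to pick a single block reason when several gates block at once.

```python
from enum import IntFlag


class Gate(IntFlag):
    """gating_mask bits: a set bit means that gate is currently BLOCKING confirmation."""
    SPEED_BAND = 0x1
    CONTEXT = 0x2
    QUALITY = 0x4
    HEALTH = 0x8


# Assumed reporting priority for gating_block_reason.
_BLOCK_PRIORITY = (Gate.HEALTH, Gate.QUALITY, Gate.CONTEXT, Gate.SPEED_BAND)


def compute_gating(speed_kmh: float, speed_floor_kmh: float, context_ok: bool,
                   quality_flag: str, health_ok: bool):
    """Return (gating_mask, gating_block_reason). CONFIRMED is allowed only
    when the mask is empty; otherwise the detector may report SUSPECT at most."""
    mask = Gate(0)
    if speed_kmh < speed_floor_kmh:
        mask |= Gate.SPEED_BAND
    if not context_ok:
        mask |= Gate.CONTEXT
    if quality_flag == "LOW":
        mask |= Gate.QUALITY
    if not health_ok:
        mask |= Gate.HEALTH
    reason = next((g.name for g in _BLOCK_PRIORITY if g in mask), "NONE")
    return mask, reason
```

Logging int(mask) records every blocking gate. The reason string names the most safety-relevant one.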
5.3 Hysteresis (avoid chatter and edge-driven triggers)
- Enter/exit separation: use distinct enter and exit thresholds to avoid boundary oscillation.
- Minimum dwell time: require persistence (enter_dwell_ms) to enter CONFIRMED and persistence (exit_dwell_ms) to leave it.
- Reason codes: log enter_reason and exit_reason as first-class evidence, not debug text.
5.4 Multi-axle consistency (reduce false positives)
A single axle reporting a severe slip score while neighbor axles remain stable and high-quality often indicates sensor corruption rather than true adhesion loss.
Consistency checks should influence confidence and can be encoded as neighbor_consistency_ok.
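A minimal consistency check is sketched below. It assumes "neighbor" means the adjacent axle indices, and its severe/calm thresholds are hypothetical. As the text says, the result should lower confidence rather than veto confirmation, because a genuine single-axle slip is possible.

```python
from typing import Sequence


def neighbor_consistency_ok(scores: Sequence[float], quality_ok: Sequence[bool], axle: int,
                            severe: float = 0.15, calm: float = 0.03) -> bool:
    """False when `axle` shows a severe slip score while every trusted adjacent
    axle is calm. Intended as a confidence input, not a veto."""
    if abs(scores[axle]) < severe:
        return True
    neighbours = [i for i in (axle - 1, axle + 1)
                  if 0 <= i < len(scores) and quality_ok[i]]
    if not neighbours:
        return True   # nothing trustworthy to compare against (assumed policy)
    return not all(abs(scores[i]) < calm for i in neighbours)
```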
Recommended logging rule: every state transition must produce a small record containing slip_score + confidence + gating_mask + enter/exit_reason. Without these fields, tuning becomes guesswork and post-incident review becomes inconclusive.
H2-6. Actuator Control: Valve/motor drive, dosing, and jam detection
Actuator control must convert a detection outcome into a measurable delivery action: command level, response timing, and current signature. This section covers valve and small motor actuation at the interface level, including protection, dosing curves, and jam/open detection without expanding into onboard PDU or brake controller internals.
6.1 Actuator types (implementation primitives)
- Solenoid valve: PWM with “pull-in” and “hold” behavior; current signature is the primary evidence of motion.
- Feeder motor / auger: duty-based control with current-based stall detection; advanced motor control is out of scope.
6.2 Drive protection and diagnosability
- Overcurrent/short: trip fast, latch per policy, record overcurrent_trip_reason.
- Open/line break: command present but current signature missing; raise open_short_flag.
- Overtemperature: inhibit or derate; record inhibit_reason and temp_ok.
- Back-EMF / harness stress: clamp and protect; evidence should not be lost during transient handling.
6.3 Dosing control (speed-aware curve + limits)
A dosing strategy should be defined as a curve (or table) tied to operating context rather than a fixed duty. Typical rail constraints include low-speed compensation, high-speed limiting, and sand-rate budgeting to prevent overuse.
- Low-speed compensation: stabilize delivery when speed estimation is sparse and adhesion changes rapidly.
- High-speed limiting: cap sand rate to reduce contamination and excessive consumption.
- Rate limiting / cooldown: prevent repeated triggers from exhausting sand; log rate_limit_active.
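The dosing primitives can be sketched as a speed-indexed table with linear interpolation, a hard duty cap, and an on-time budget that drives rate_limit_active. The table values and cap are placeholders. Using a token bucket for "sand-rate budgeting" is an assumed choice of technique, not a stated requirement.

```python
import bisect

# Hypothetical dosing table: (speed km/h, duty %). Shape and values are placeholders;
# a real curve comes from delivery tests on the actual sand system.
DOSING_CURVE = [(0.0, 40.0), (30.0, 60.0), (80.0, 70.0)]


def dosing_duty(speed_kmh: float, curve=DOSING_CURVE, duty_cap: float = 65.0) -> float:
    """Linear interpolation in a speed-indexed table, then a hard cap."""
    speeds = [s for s, _ in curve]
    if speed_kmh <= speeds[0]:
        duty = curve[0][1]
    elif speed_kmh >= speeds[-1]:
        duty = curve[-1][1]
    else:
        i = bisect.bisect_right(speeds, speed_kmh)
        (s0, d0), (s1, d1) = curve[i - 1], curve[i]
        duty = d0 + (d1 - d0) * (speed_kmh - s0) / (s1 - s0)
    return min(duty, duty_cap)


class SandRateLimiter:
    """Token-bucket budget on sanding on-time (assumed policy).

    budget_s: maximum on-time available in a burst.
    refill_per_s: on-time seconds regained per elapsed second.
    """

    def __init__(self, budget_s: float = 30.0, refill_per_s: float = 0.1):
        self.capacity = budget_s
        self.tokens = budget_s
        self.refill_per_s = refill_per_s
        self.rate_limit_active = False

    def step(self, requested: bool, dt_s: float) -> bool:
        """Return True if sanding may run for this dt_s slice."""
        self.tokens = min(self.capacity, self.tokens + self.refill_per_s * dt_s)
        if requested and self.tokens >= dt_s:
            self.tokens -= dt_s
            self.rate_limit_active = False
            return True
        # Flag rate limiting only when it is actually suppressing a request.
        self.rate_limit_active = requested
        return False
```

Keeping the curve in a table with an identifier (curve_id) lets each sanding_cmd event record exactly which dosing policy was in force.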
6.4 Jam / clog detection (current signature + response time)
Jam detection is most robust when it uses two independent cues: current signature classification and response time window. This separates normal motion, mechanical stall, and open-circuit behavior.
Logging rule: on every trigger, record cmd_duty, drv_current_peak, and response_ms. If sanding “was commanded” but these fields do not confirm action, the system should treat it as a delivery failure rather than tuning a detection threshold.
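The two-cue jam detection can be sketched as a classifier over drv_current_peak and response_ms. Two things are assumptions: the interpretation of response_ms (delay from command to the expected motion signature) and all current/time limits. delivery_verified is a hypothetical helper that backs a "sanding really happened" decision with a physical response.

```python
def classify_actuation(cmd_on: bool, drv_current_peak_a: float, response_ms: float,
                       i_open_a: float = 0.05, i_stall_a: float = 2.5,
                       response_max_ms: float = 150.0) -> str:
    """Classify one actuation from peak current and response time.

    response_ms is assumed to be the delay from command to the expected motion
    signature (e.g. solenoid current dip at plunger travel, or feeder speed
    feedback). Limits are hypothetical; characterise the real actuator.
    """
    if not cmd_on:
        return "IDLE"
    if drv_current_peak_a < i_open_a:
        return "OPEN"                 # command present, current signature missing
    late = response_ms > response_max_ms
    if late and drv_current_peak_a >= i_stall_a:
        return "JAM"                  # high current without timely motion
    if late:
        return "SLOW"                 # late motion at normal current, e.g. a restriction
    return "NORMAL"


def delivery_verified(cmd_on: bool, drv_current_peak_a: float, response_ms: float) -> bool:
    """True only when the command is backed by a normal physical response."""
    return classify_actuation(cmd_on, drv_current_peak_a, response_ms) == "NORMAL"
```

Because the classifier needs both cues, an open circuit, a mechanical stall, and a merely sluggish valve each produce a distinct result. None of them can be mistaken for a detection-threshold problem.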
H2-7. Coordination Interfaces: How sanding interacts with traction/brake (signals only)
This section defines the coordination contract between the sanding/anti-slip function and upstream traction/brake systems using signals only. It avoids traction inverter and brake unit internal algorithms and focuses on interface reliability: priority rules, debounce under EN 50121 disturbance, and evidence-friendly outputs.
7.1 Interface I/O (contract-level signals)
7.2 Priority model (safety inhibition first)
Coordination is safest when inhibition is explicit and enumerable. The output inhibit_reason should be treated as a primary interface signal, not an internal detail, and it should be logged with event_id when it changes.
- Safety inhibition: maintenance mode, diagnostic failure, supply/temperature unhealthy, or system-wide inhibit lines.
- Interlock inhibition: upstream indicates “do not sand” for the current context (signal only).
- Resource inhibition: cooldown/limit strategy active to prevent excessive consumption; report as a reason code.
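The priority model maps naturally onto an ordered enumeration in which the highest active reason wins, with a log entry written only on change. The code values and the ordering inside the safety tier are assumptions. The three tiers follow the list above.

```python
from enum import IntEnum
from typing import Iterable, Optional


class InhibitReason(IntEnum):
    """Enumerable inhibit_reason codes; higher value wins (safety > interlock > resource).
    Code values and the order inside the safety tier are illustrative assumptions."""
    NONE = 0
    RESOURCE_RATE_LIMIT = 10
    INTERLOCK_UPSTREAM = 20
    SAFETY_SUPPLY_OR_TEMP = 30
    SAFETY_DIAG_FAILURE = 31
    SAFETY_MAINTENANCE_MODE = 32
    SAFETY_SYSTEM_INHIBIT = 33


def resolve_inhibit(active: Iterable[InhibitReason]) -> InhibitReason:
    """Report the highest-priority active reason, or NONE."""
    return max(active, default=InhibitReason.NONE)


class InhibitTracker:
    """Logs inhibit_reason with event_id whenever the resolved value changes."""

    def __init__(self):
        self.current = InhibitReason.NONE
        self.log = []

    def update(self, active: Iterable[InhibitReason], event_id: int) -> Optional[dict]:
        new = resolve_inhibit(active)
        if new == self.current:
            return None
        entry = {"event_id": event_id, "inhibit_reason": new.name,
                 "previous": self.current.name}
        self.current = new
        self.log.append(entry)
        return entry
```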
7.3 Debounce and consistency (EN 50121 anti-glitch)
Long harnesses and strong common-mode noise can create short pulses that appear as valid status transitions. Interface protection should therefore combine debounce windows and consistency checks before state acceptance.
- Debounce window: accept traction/brake state changes only after stability for if_debounce_ms.
- Glitch counters: record if_glitch_count and optional if_inconsistency_flag when conflicting states occur.
- Consistency gating: if traction_active and brake_active are simultaneously asserted in an invalid context, block confirmation and log a reason.
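A minimal debounce for one status line is sketched below. It counts a change that reverts before the window elapses as a glitch. if_inconsistency_flag assumes that simultaneous traction and brake assertion is always invalid. A real interface contract must list the valid combinations explicitly, for example for blended braking.

```python
class DebouncedInput:
    """Accept a traction/brake status change only after it is stable for if_debounce_ms.
    A change that reverts before the window elapses increments if_glitch_count."""

    def __init__(self, if_debounce_ms: int = 50, initial: bool = False):
        self.if_debounce_ms = if_debounce_ms
        self.stable = initial
        self.if_glitch_count = 0
        self._held_ms = 0

    def sample(self, raw: bool, dt_ms: int) -> bool:
        if raw != self.stable:
            self._held_ms += dt_ms
            if self._held_ms >= self.if_debounce_ms:
                self.stable = raw
                self._held_ms = 0
        elif self._held_ms > 0:
            self.if_glitch_count += 1   # short pulse vanished: possible CM/EMC glitch
            self._held_ms = 0
        return self.stable


def if_inconsistency_flag(traction_active: bool, brake_active: bool) -> bool:
    """Assumes both-asserted is invalid for this vehicle (illustrative rule)."""
    return traction_active and brake_active
```

With a 30 ms window and 10 ms sampling, a single-sample pulse is rejected and counted as a glitch. Three consecutive samples of the new state are needed before it is accepted.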
Recommended contract rule: sanding_active should represent the real “action-active” state (confirmed command + verified response), not just a software request. This prevents upstream systems from assuming delivery when actuation failed.
H2-8. Event Triggers & Black-Box Logging: Evidence that survives audits
Black-box logging is the page’s differentiator: it turns detection and actuation into auditable evidence. A practical design uses enumerable trigger types, fixed pre/post windows, and a compact evidence packet that survives resets. Cryptographic integrity is referenced at the interface level only (hash/signature presence) without expanding into security architecture.
8.1 Trigger taxonomy (enumerable events)
- slip_confirmed — detection reached CONFIRMED (include slip_score, confidence, gating_mask).
- sanding_cmd — command issued (include cmd_duty, curve_id, cooldown/rate-limit status).
- actuator_fault — jam/open/overcurrent/overtemp (include trip_reason, jam_flag, open_short_flag).
- sensor_quality_drop — quality gating degraded (include quality_flag, dropout_count, jitter_est).
- brownout_reset — reset/power dip (include supply_v summary, reset_reason, counters).
8.2 Pre/post windows (why they exist)
Pre/post windows provide context: pre-window reveals whether input quality or supply conditions degraded before the trigger, and post-window confirms whether actuation produced a measurable response and whether recovery occurred. Windows should be configurable and recorded as configuration evidence.
8.3 Minimum evidence set (audit-friendly layers)
8.4 Integrity hooks (interface-level only)
If a security module is present, the event packet can include a hash and a signature status field. This provides tamper-evidence without describing key management or remote attestation flows on this page.
Implementation rule: event commit should be designed to survive resets. If a brownout occurs during commit, the system should record commit_status and preserve the last valid event pointer rather than silently dropping evidence.
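A sketch of an evidence recorder follows. It keeps a continuously running pre-window ring, closes the post-window after a fixed number of samples, and uses a two-phase commit so that an interrupted write stays visible. The in-memory store stands in for non-volatile storage. The SHA-256 digest is only a hash-presence hook, not a security design. The class, field names beyond those listed on this page, and window sizes are hypothetical.

```python
import hashlib
import json
from collections import deque


class EventRecorder:
    """Pre/post-window evidence packets with a two-phase commit (illustrative sketch).

    A packet is written as PENDING first and marked COMMITTED only after body and
    hash are complete, so a brownout mid-commit leaves a visible record.
    Windows are in samples here; a real system would also log pre/post_window_s.
    """

    def __init__(self, pre_samples: int = 4, post_samples: int = 2):
        self.pre = deque(maxlen=pre_samples)     # continuously running pre-trigger ring
        self.post_samples = post_samples
        self.pending = None
        self.store = []                          # stands in for non-volatile memory
        self.last_valid_event_id = None
        self.commit_fail_count = 0
        self._next_id = 1

    def trigger(self, event_type: str, context: dict):
        if self.pending is not None:
            return None                          # one open event at a time (simplification)
        self.pending = {
            "event_id": self._next_id, "event_type": event_type, "context": context,
            "pre_window_n": self.pre.maxlen, "post_window_n": self.post_samples,
            "pre": list(self.pre), "post": [],
        }
        self._next_id += 1
        return self.pending["event_id"]

    def sample(self, value, power_ok: bool = True) -> None:
        if self.pending is not None:
            self.pending["post"].append(value)
            if len(self.pending["post"]) >= self.post_samples:
                self._commit(power_ok)
        self.pre.append(value)

    def _commit(self, power_ok: bool) -> None:
        packet, self.pending = self.pending, None
        packet["commit_status"] = "PENDING"
        self.store.append(packet)                # phase 1: record written
        if not power_ok:                         # simulated brownout before phase 2
            return
        body = {k: v for k, v in packet.items() if k != "commit_status"}
        packet["hash"] = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
        packet["commit_status"] = "COMMITTED"    # phase 2: finalize
        self.last_valid_event_id = packet["event_id"]

    def recover_after_reset(self) -> None:
        """On restart, expose interrupted commits instead of hiding them."""
        for packet in self.store:
            if packet["commit_status"] == "PENDING":
                packet["commit_status"] = "INTERRUPTED"
                self.commit_fail_count += 1
```

After a simulated brownout, last_valid_event_id still points at the previous complete packet. The interrupted one is reported as INTERRUPTED rather than silently dropped.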
H2-9. Power Integrity & EMC in Rail: Why false triggers happen
False sanding triggers typically come from a repeatable cause chain: noise source → DM/CM coupling path → victim point (speed AFE / command input / MCU/logger) → observable fields. Rail environments amplify these effects through long harnesses, high-energy switching, and strong common-mode disturbances.
9.1 Voltage dips and transients (how they break detection)
- Threshold drift: references and comparators shift under dips, producing artificial spikes and saturation at the sensor interface.
- Reset and timing rupture: MCU resets or clock instability corrupt dwell time, debounce windows, and timestamps.
- Record integrity risk: event commit can be interrupted, creating missing post-windows unless commit status is recorded.
9.2 DM vs CM coupling (where noise travels)
DM coupling follows a signal-return loop (input pair + return), while CM coupling rides the harness relative to chassis and seeks a return through shield termination and parasitic capacitance. Both can corrupt slip confirmation and interface debouncing.
- DM victims: speed AFE thresholds, edge capture, and quality gating can be displaced by return-path disturbance (ground bounce).
- CM victims: command/status inputs and isolated interfaces can see short pulses as valid edges unless common-mode currents are controlled.
9.3 Suppression strategies (placement + partitioning)
- Filter placement: apply input conditioning at the receiver entry (speed AFE and command input) before decision logic.
- Common-mode control: provide a defined CM return to chassis via correct shield termination; avoid routing CM through AFE reference.
- Partitioning: separate sensitive AFE reference and high dI/dt actuator return; reduce ground bounce injection into detection.
- Clamping & saturation handling: detect and flag saturated inputs rather than letting them masquerade as valid slip evidence.
Recommended “false-trigger suspicion” rule: if slip_confirmed aligns with reset_reason, rising if_glitch_count, or input_saturation_flag, treat the episode as an EMC/power integrity incident and log it explicitly instead of tuning detection thresholds.
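The suspicion rule can be sketched as a function that returns the EMC/power signatures found in a slip_confirmed event. s_since_reset is a hypothetical field (seconds since the last logged reset), and the 2 s alignment window is a placeholder. A non-empty result would set false_trigger_suspected.

```python
def false_trigger_reasons(event: dict, prev_if_glitch_count: int,
                          reset_align_s: float = 2.0) -> list:
    """Reasons to file a slip_confirmed event as an EMC/power incident.

    Assumed fields: s_since_reset (hypothetical), if_glitch_count,
    input_saturation_flag. An empty list means no such signature was found.
    """
    reasons = []
    since = event.get("s_since_reset")
    if since is not None and since < reset_align_s:
        reasons.append("RESET_ALIGNED")
    if event.get("if_glitch_count", 0) > prev_if_glitch_count:
        reasons.append("IF_GLITCH_RISING")
    if event.get("input_saturation_flag", False):
        reasons.append("INPUT_SATURATION")
    return reasons
```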
H2-10. Diagnostics & Health Monitoring: Counters that enable maintenance
Maintenance value comes from counters and trends, not one-time functionality. A clear health model groups signals by subsystem, defines “redline” concepts (without disclosing proprietary numbers), and ties every counter to an actionable inspection step. Reporting is described as a minimal diagnostic set rather than a full TCMS stack.
10.1 Counter groups (dashboard-ready)
10.2 “Redline” concepts (thresholds without numbers)
- Rising dropout density suggests harness/shield/connector issues rather than detection tuning.
- Jam and overcurrent frequency suggests mechanical resistance, nozzle clogging, or insulation degradation.
- Sand usage vs slip rate mismatch suggests dosing curve issues or false triggering under EMC conditions.
- Commit failures and brownouts suggest power integrity and hold-up weaknesses that jeopardize audit evidence.
10.3 Trend trust (version & calibration IDs)
Trends are only comparable when the system reports which configuration produced them. Maintenance reviews should therefore capture: detection threshold version, calibration/configuration ID, and confidence trend summaries.
10.4 Minimal reporting interface (no full-stack expansion)
- Read-only summary: counters + trust fields via registers/diagnostic frames.
- Event index readout: fetch event summaries by event_id without pulling full raw streams.
- Controlled reset/clear: counters cleared only in maintenance mode, and the clear action is logged as an event.
Recommended maintenance linkage: when false_trigger_suspected crosses a redline concept, review the last N events for input_saturation_flag, if_glitch_count, and reset_reason before altering thresholds.
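In keeping with "redlines without numbers", the sketch below maps period counter deltas to the inspection steps from 10.2 but takes every redline as an input. The names km and slip_confirmed_count, and the dictionary keys for redlines, are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class PeriodCounters:
    """Counter deltas over one review period (km and slip_confirmed_count are assumed)."""
    km: float
    dropout_count: int
    jam_count: int
    overcurrent_count: int
    sand_usage_counter: int
    slip_confirmed_count: int
    brownout_counter: int
    commit_fail_count: int


def maintenance_actions(c: PeriodCounters, redlines: dict) -> list:
    """Map counter trends to inspection steps. Redline values are supplied by the
    maintainer; none are hard-coded here."""
    actions = []
    if c.km > 0 and c.dropout_count / c.km > redlines["dropout_per_km"]:
        actions.append("inspect speed sensor harness/shield/connectors")
    if c.jam_count + c.overcurrent_count > redlines["jam_plus_overcurrent"]:
        actions.append("inspect feeder/nozzle resistance and actuator insulation")
    usage_per_slip = c.sand_usage_counter / max(c.slip_confirmed_count, 1)
    if usage_per_slip > redlines["usage_per_slip"]:
        actions.append("review dosing curve and false-trigger evidence")
    if c.brownout_counter + c.commit_fail_count > redlines["power_events"]:
        actions.append("review power integrity and hold-up")
    return actions
```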
H2-11. Validation Playbook: What to measure and how to prove it works
This playbook is an executable checklist. Every test item must produce traceable evidence fields and/or an event packet linked by event_id.
The goal is not only “it works,” but “it can be proven during audits and maintenance reviews.”
11.1 Bring-up checklist (inputs + safe actuation)
Bring-up validates full-range sensor ingestion and actuator protection before any slip logic tuning.
Each case should record test_case_id, a short configuration fingerprint, and the resulting evidence fields.
- Speed input coverage: low-speed boundary, high-speed boundary, missing tooth/dropouts, jitter/glitch pulses, reduced amplitude, saturation/clamp.
- Acceleration sanity: vibration-like bursts must not be misread as slip evidence when gating is active.
- Actuator safety: open/short detection, overcurrent, overtemperature, jam signature, and response timing window.
Example MPNs (bring-up instrumentation & interfaces)
11.2 Slip simulation (controlled injection + state stability)
Slip validation requires controlled stimuli: bench injection, simulation, or replay of captured traces. The objective is stable state transitions with hysteresis and gating—no chattering between states.
- Injection levels: mild / medium / strong slip patterns (relative bins are sufficient).
- State stability: enter_dwell_ms enforced; enter/exit thresholds separated; recovery behavior consistent.
- Gating correctness: poor sensor quality or invalid traction/brake context must block CONFIRMED.
Example MPNs (data acquisition / replay / logging blocks)
11.3 EMC & power transient (no false trigger + no evidence loss)
Under rail EMC and supply disturbances, validation must prove two outcomes simultaneously: no false triggers and logs that remain interpretable (even across resets).
- Transient immunity: command inputs and sensor paths must not produce spurious CONFIRMED transitions.
- Reset transparency: resets must be explainable via reset_reason and brownout_counter.
- Commit durability: event commit interruptions must be visible via commit_status/commit_fail_count.
Example MPNs (surge/ESD protection & CM control)
11.4 Field regression (route conditions + audit alignment)
Field regression converts real routes into repeatable test libraries. Each scenario should map to an event packet that supports post-incident questions: when slip occurred, whether sanding actuated, whether inputs were trustworthy, and whether the power/EMC environment was healthy.
- Low adhesion: rain/snow/leaves/oil; ensure sanding_cmd correlates with confirmed slip evidence.
- Vibration hotspots: joints/switches; acceleration bursts must not bypass gating.
- Post-maintenance: nozzle changes; detect jam trends and abnormal sand usage rates.
H2-12. Field Feedback Loop: Update thresholds, models, and triggers from returns
The system should be managed as a continuously updated model: field returns are converted into structured evidence, parameter updates are versioned, validated against the matrix, deployed in subsets, and continuously monitored. Logging is not an accessory; it is the core of controlled improvement.
12.1 Return triage (from cases → evidence gaps)
Every return should start from event_id and the evidence packet, not from assumptions.
The first goal is to identify whether the packet is sufficient; if not, the priority is to close evidence gaps before tuning thresholds.
12.2 Threshold strategy (versioning + subset rollout + rollback)
- Versioning: every rule change increments threshold_version and is recorded in events.
- Subset deployment: enable new versions on a limited fleet/route subset before broad rollout.
- Rollback triggers: rising false_trigger_suspected, abnormal sand_usage_counter, or increased event density without supporting slip evidence.
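The rollback triggers can be sketched as a comparison between the fleet baseline and the subset running the new threshold_version. Inputs are assumed to be rates already normalized to the same distance or time base. The 1.5x tolerance and the event_count/slip_evidence_count keys are hypothetical.

```python
def rollback_reasons(baseline: dict, candidate: dict, factor: float = 1.5) -> list:
    """Compare normalized rates between baseline and the new-version subset.

    Expected keys: false_trigger_suspected, sand_usage_counter, event_count,
    slip_evidence_count. `factor` is a placeholder tolerance.
    """
    reasons = []
    for key in ("false_trigger_suspected", "sand_usage_counter"):
        if candidate[key] > factor * baseline[key]:
            reasons.append(f"{key}_rising")
    # Events not backed by slip evidence approximate "density without support".
    unsupported_base = baseline["event_count"] - baseline["slip_evidence_count"]
    unsupported_cand = candidate["event_count"] - candidate["slip_evidence_count"]
    if unsupported_cand > factor * max(unsupported_base, 0):
        reasons.append("unsupported_event_density_rising")
    return reasons
```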
12.3 Preventive maintenance windows (trend-driven)
Counters should drive maintenance scheduling. Trend-based triggers reduce unplanned downtime and avoid “threshold chasing” when the root cause is mechanical resistance or power/EMC disturbances.
- Clog/jam trend: increasing jam_count + rising drv_current_peak trend → clean nozzle/feeder earlier.
- Overuse mismatch: high sand_usage_counter but low slip evidence density → investigate false-trigger mechanisms.
- Evidence risk: rising brownout_counter or commit_fail_count → prioritize power integrity improvements.
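For the clog/jam rule, "rising drv_current_peak trend" can be estimated as a least-squares slope over recent actuator cycles. The sketch requires both the slope condition and new jam events. That combination, the function names, and the caller-supplied slope redline are assumptions.

```python
from typing import Sequence


def slope(values: Sequence[float]) -> float:
    """Least-squares slope of values against their index (units per actuator cycle)."""
    n = len(values)
    if n < 2:
        return 0.0
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    num = sum((i - x_mean) * (y - y_mean) for i, y in enumerate(values))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den


def clean_earlier(drv_current_peak_hist: Sequence[float], jam_count_delta: int,
                  slope_redline: float) -> bool:
    """Rising peak current AND new jam events over the review window."""
    return jam_count_delta > 0 and slope(drv_current_peak_hist) > slope_redline
```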
Example MPNs (durable storage + integrity hooks)
Operational rule: do not tune detection thresholds until evidence packets can separate sensor quality, interface glitches, and power/EMC incidents. Otherwise, “fixes” can reduce safety margin and create new failure modes.
H2-13. FAQs
Each answer follows the same proof pattern: 1 conclusion + 2 evidence checks + 1 first fix. Chapter mapping is shown under each question.
Conclusion: Most “sanding-but-still-slip” cases are actuator-response limited, not threshold-limited.
Evidence: Compare sanding_cmd with drv_current_peak and response_ms; slow or clipped current suggests a valve/feed restriction (e.g., DRV110 drive never reaches hold profile).
Evidence: Check whether slip_score decays during RECOVERY after sanding; if it stays high, dosing is not reaching the rail.
First fix: Validate valve/feeder response under cold/contamination and tighten jam detection using current signatures (INA240-style sensing).
cmd_duty, drv_current_peak, response_ms, slip_score, state
Conclusion: This pattern usually indicates low-speed estimator/conditioning weakness rather than “no slip.”
Evidence: At low speed, inspect signal_present, quality_flag, and dropout_count; edge-capture can fail when pulse amplitude is near threshold.
Evidence: Compare jitter_est and period-capture stability; excessive jitter suggests poor hysteresis/Schmitt conditioning or noise pickup.
First fix: Improve low-speed conditioning and estimator path (threshold + debounce) before adjusting detection thresholds.
signal_present, quality_flag, dropout_count, jitter_est
Conclusion: EMC false alarms are more often coupling/glitch driven than “too sensitive thresholds.”
Evidence: Correlate alarms with if_glitch_count and input_saturation_flag; spikes implicate CM currents and poor entry filtering/termination.
Evidence: Review enter_reason and state dwell time; if transitions occur without sustained evidence, hysteresis/dwell is not being enforced.
First fix: Add/verify interface debouncing and CM return strategy (e.g., ISO7721-isolated inputs plus proper shield-to-chassis termination).
if_glitch_count, input_saturation_flag, enter_reason, state
Conclusion: Field jams are best identified by current signatures and trend counters, not by command logs alone.
Evidence: Compare drv_current_peak waveforms: “debris/icing” shows rising current and slow response, unlike open/short faults (DRV8876 protections can mask symptoms if not logged).
Evidence: Confirm jam_count trending with actuator_cycle_count; repeated near-jam cycles indicate mechanical resistance build-up.
First fix: Tighten jam detection windows and schedule preventive cleaning when current trend crosses a redline.
drv_current_peak, response_ms, jam_count, actuator_cycle_count
Conclusion: Misaligned timestamps usually come from trigger ordering and reset/time-sync state, not from “bad clocks” alone.
Evidence: Check time_sync_status alongside event timestamp; drift often appears after resets or loss-of-lock.
Evidence: Validate ordering: slip_confirmed should precede sanding_cmd and actuator feedback in the packet; if not, triggers are racing.
First fix: Enforce a single timebase and deterministic trigger ordering before increasing time-sync complexity.
timestamp, time_sync_status, event_type, trigger_order
Conclusion: Resets during slip episodes are a power-integrity problem until proven otherwise.
Evidence: Correlate reset_reason with brownout_counter and supply summaries; a supervisor (TPS3839-class) helps make resets explainable.
Evidence: Inspect commit_status/commit_fail_count; silent commit loss breaks auditability even if detection was correct.
First fix: Increase brownout margin/hold-up and harden commit policy so events remain interpretable across resets.
reset_reason, brownout_counter, supply_v_summary, commit_status
Conclusion: Oscillation is almost always hysteresis/dwell or gating instability, not “wrong threshold.”
Evidence: Compare enter_reason/exit_reason patterns; rapid toggling indicates missing enter/exit separation or too-short minimum dwell.
Evidence: Inspect gating_mask changes near transitions; if sensor-quality or context bits flap, the state machine will chatter.
First fix: Add enter/exit hysteresis plus minimum duration and stabilize gating inputs before retuning thresholds.
enter_reason, exit_reason, gating_mask, state
Conclusion: Excess usage is either over-triggering or missing speed-dependent dose limiting.
Evidence: Compare sand_usage_counter against slip_confirmed density; a high ratio suggests false triggers rather than genuine low adhesion.
Evidence: Verify whether cmd_duty clamps as speed increases; lack of limiting causes runaway consumption even on marginal slips.
First fix: Implement a speed-indexed cap (dosing curve limit) before lowering detection sensitivity.
sand_usage_counter, slip_confirmed, cmd_duty, false_trigger_suspected
Conclusion: Intermittent quality loss is either wiring/reference instability or saturation from coupled noise.
Evidence: If quality_flag drops with input_saturation_flag spikes, suspect EMC/CM coupling rather than a pure wiring open.
Evidence: If dropouts increase with brownout_counter or resets, suspect supply/reference disturbance in the AFE path.
First fix: Improve entry filtering and reference/return routing; then revalidate low-speed conditioning.
quality_flag, input_saturation_flag, dropout_count, brownout_counter
Conclusion: Proof requires command + actuator response + integrity context, not just a “sanding_active” bit.
Evidence: Mandatory chain: event_id, sanding_cmd, cmd_duty, and a physical response indicator like drv_current_peak (INA240-class sensing).
Evidence: Integrity fields like commit_status and threshold_version must accompany the packet to show it is complete and version-traceable.
First fix: Lock a “minimum evidence packet template” and fail-safe log when any mandatory field is missing.
event_id, cmd_duty, drv_current_peak, commit_status, threshold_version
Conclusion: Post-update drift is usually version/migration related until evidence shows a real physics change.
Evidence: Compare threshold_version and configuration fingerprints across events; unexpected jumps indicate migration or default resets.
Evidence: Check monitoring counters (false_trigger_suspected, sand_usage_counter) for step changes aligned with the update window.
First fix: Roll back or subset-disable the new version, then re-run the H2-11 validation matrix before broad redeploy.
threshold_version, calibration_id, false_trigger_suspected, sand_usage_counter
Conclusion: Disagreement is a sensor-quality asymmetry first, and a consistency-rule issue second.
Evidence: Compare axle-wise quality_flag, dropout_count, and jitter_est; the weak axle typically drives false inconsistency.
Evidence: Inspect the consistency decision output (multi_axle_consistency_flag) and which gating bits suppressed CONFIRMED.
First fix: Fix the degraded axle input (wiring/reference/conditioning) before relaxing multi-axle rules.
quality_flag, dropout_count, jitter_est, multi_axle_consistency_flag