diff --git a/docs/clearpilot/park-mode-power-savings.md b/docs/clearpilot/park-mode-power-savings.md new file mode 100644 index 0000000..cb50907 --- /dev/null +++ b/docs/clearpilot/park-mode-power-savings.md @@ -0,0 +1,358 @@ +# Park-mode power savings — design notes & retro + +## Goal + +While ignition is on but the car is in **park**, reduce CPU/power draw by +shutting down processes that aren't needed for "watching the gear lever." +The constraint: the car's steering ECU must keep seeing the openpilot +heartbeat (LFA / LKAS / LKAS_ALT CAN-FD messages, every controlsd cycle) +or it drops out of tester mode and throws a steering fault on the next +shift to drive. + +Behavior should approximate "ignition off" except: +- controlsd stays alive (it's the heartbeat source and the gear watcher) +- pandad / boardd / camerad / dashcamd / gpsd / ui / thermald keep running +- everything else (modeld, locationd, paramsd, torqued, calibrationd, + plannerd, radard, dmonitoringmodeld, dmonitoringd, soundd, sensord, + speed_logicd, frogpilot_process) gets paused + +When the car shifts back to drive, all the paused processes spin up cold +and openpilot resumes normal operation. + +## Status: reverted + +We attempted this on **2026-05-04** (commits `5d18ad1` → `1e4e95c`, +force-pushed away). It built and launched fine on the bench but produced +a cascade of false alerts during the park→drive transition in the car — +"controlsd unresponsive", commIssue with longitudinalPlan/frogpilotPlan, +locationd temporaryError, sensorDataInvalid. Reverted to tag +`stable_5_4_26` (commit `0d1cedd`). + +This document captures the design we tried and the problems we ran into +so the next attempt doesn't relearn the same mistakes. + +## Design we tried + +### 1. The flag + +A new memory param `ParkMode` (registered in `common/params.cc`, +defaulted to `"0"` in `manager_init`'s memory-params loop). Lives at +`/dev/shm/params/d/ParkMode`. Written by controlsd, read by +process_config gating callbacks. + +### 2. controlsd writes the flag and runs a minimal cycle in park + +In `__init__`: +```python +self.park_mode = False +self.park_exit_frame = -1 +self.startup_complete_frame = -1 +self.PARK_GRACE_MAX_FRAMES = int(15.0 / DT_CTRL) # see "Park-grace cap" below +self.PARK_STARTUP_DELAY_FRAMES = int(10.0 / DT_CTRL) # see "Startup delay" below +self.last_engaged_frame = -1 +self.last_engage_attempt_frame = -1 +self.POST_ENGAGE_LAG_GRACE_FRAMES = int(3.0 / DT_CTRL) +``` + +In `step()` after `data_sample()`: +```python +self._update_park_mode(CS) +if self.park_mode or self._in_park_exit_grace(): + self._park_mode_tick(CS) + self.CS_prev = CS + return +# ... normal flow +``` + +Helpers: +```python +def _park_mode_allowed(self): + # Don't allow park mode until init has completed AND we've run at + # least PARK_STARTUP_DELAY_FRAMES of normal step() after init. + if not self.initialized: + return False + if self.startup_complete_frame < 0: + self.startup_complete_frame = self.sm.frame + return (self.sm.frame - self.startup_complete_frame) >= self.PARK_STARTUP_DELAY_FRAMES + +def _update_park_mode(self, CS): + if not self._park_mode_allowed(): + return + in_park = CS.gearShifter == GearShifter.park + if in_park != self.park_mode: + self.park_mode = in_park + self.params_memory.put_bool("ParkMode", in_park) + if not in_park: + self.park_exit_frame = self.sm.frame + +def _in_park_exit_grace(self): + if self.park_exit_frame < 0: + return False + if (self.sm.frame - self.park_exit_frame) >= self.PARK_GRACE_MAX_FRAMES: + return False + return not (self.sm.all_checks() and not self.card.can_rcv_timeout) + +def _park_mode_tick(self, CS): + # Build a do-nothing CarControl; card.controls_update -> CI.apply -> + # CarController.update appends create_steering_messages() + # unconditionally, so the LFA/LKAS heartbeat keeps flowing. + CC = car.CarControl.new_message() + self.clearpilot_state_control(CC, CS) + self.card.controls_update(CC, self.frogpilot_variables) +``` + +Why every piece matters: +- **`_park_mode_allowed` (10 s startup delay)**: `card.initialize()` is + only called inside `data_sample`'s `if not self.initialized` branch. + That call wires `CarInterface` up so `controls_update` actually + produces CAN sends. If we entered park-mode before init completed, + the heartbeat was a silent no-op. The 10 s buffer also lets all + `only_onroad_active` processes complete their cold-spawn the very + first time, before manager starts pausing them. +- **`_in_park_exit_grace`**: when ParkMode flips off, the cold-spawn + chain takes time. Stay in keepalive-tick mode until SubMaster reports + everything healthy, with a hard cap as a safety net (we used 15 s, + saw it still wasn't enough — see "What broke" below). +- **`_park_mode_tick` calls `card.controls_update`**: don't call + `state_update` or `data_sample` — `step()` already did that. Just + publish the empty CC to push the heartbeat CAN messages. + +### 3. Manager gating + +In `selfdrive/manager/process_config.py`: + +```python +def _park_mode() -> bool: + return Params("/dev/shm/params").get_bool("ParkMode") + +def only_onroad_active(started, params, CP): + return started and not _park_mode() + +def driverview_active(started, params, CP): + return driverview(started, params, CP) and not _park_mode() + +def always_run_unless_parked(started, params, CP): + # Same as always_run, but pauses while ignition is on and parked. + # Preserves offroad behavior. + return not (started and _park_mode()) +``` + +Re-gated: +- `only_onroad` → `only_onroad_active`: modeld, sensord, soundd, + locationd, calibrationd, torqued, paramsd, plannerd, radard, + speed_logicd +- `driverview` → `driverview_active`: dmonitoringmodeld, dmonitoringd +- `always_run` → `always_run_unless_parked`: frogpilot_process + +`controlsd` stays plain `only_onroad` — it's the writer + heartbeat source. +`camerad`, `pandad`, `thermald`, `ui`, `dashcamd`, `gpsd`, `deleter`, +`fleet_manager`, `tombstoned`, `timed`, manager internals stay +`always_run` (or whatever they were). + +### 4. Engage-attempt grace (separate from park-grace) + +The state transition into engaged briefly bumps the controlsd loop time +over budget. `self.rk.lagging` flips True for a cycle, `update_events` +adds `EventName.controlsdLagging`, which has an `ET.NO_ENTRY` alert +("Controls Process Lagging: Reboot Your Device"). That alert fires +right as the user is taking their hands off the wheel after pressing +SET. Same goes for `commIssue` if any service is briefly stale. + +Solution: track two engagement edges — the actual transition into +ENABLED (post-success) and the cruise.enabled rising edge (pre-success, +covers blocked attempts): + +```python +# In __init__: +self.last_engaged_frame = -1 +self.last_engage_attempt_frame = -1 + +# In update_events, BEFORE the lag/comm checks: +if CS.cruiseState.enabled and not self.CS_prev.cruiseState.enabled: + self.last_engage_attempt_frame = self.sm.frame + +# At the engaged-state transition (state_transition): +enabled_prev = self.enabled +self.enabled = self.state in ENABLED_STATES +if self.enabled and not enabled_prev: + self.last_engaged_frame = self.sm.frame + +# Gate logic in update_events: +in_engage_grace = ( + (self.last_engaged_frame >= 0 + and (self.sm.frame - self.last_engaged_frame) < self.POST_ENGAGE_LAG_GRACE_FRAMES) + or + (self.last_engage_attempt_frame >= 0 + and (self.sm.frame - self.last_engage_attempt_frame) < self.POST_ENGAGE_LAG_GRACE_FRAMES) +) +if not REPLAY and self.rk.lagging and not in_engage_grace: + self.events.add(EventName.controlsdLagging) +# ... and similarly suppress commIssue when in_engage_grace +``` + +### 5. Fan range rules (port from broken tree) + +Independent of park-mode plumbing. Widen the fan PID range when the car +is parked so the device can fully cool while the user can't hear road +noise. + +`selfdrive/thermald/fan_controller.py`: +```python +def update(self, cur_temp, ignition, standstill=False, is_parked=True, cruise_engaged=False): + # neg_limit = -max_fan_pct, pos_limit = -min_fan_pct + if not ignition: + self.controller.neg_limit = -30 + self.controller.pos_limit = 0 + elif is_parked: + self.controller.neg_limit = -100 + self.controller.pos_limit = 0 # 0-100% (full range) + elif cruise_engaged: + self.controller.neg_limit = -100 + self.controller.pos_limit = -30 # 30-100% + elif standstill: + self.controller.neg_limit = -100 + self.controller.pos_limit = -10 # 10-100% + else: + self.controller.neg_limit = -100 + self.controller.pos_limit = -30 # 30-100% +``` + +`selfdrive/thermald/thermald.py`: +```python +# Add carState to the SubMaster service list +sm = messaging.SubMaster([..., "carState"], poll="pandaStates") + +# At the fan_controller.update call site: +if sm.seen['carState']: + cs = sm['carState'] + standstill = cs.standstill + is_parked = cs.gearShifter == car.CarState.GearShifter.park +else: + standstill = False + is_parked = True # default safe: assume parked, no fan floor +cruise_engaged = sm['controlsState'].enabled if sm.seen['controlsState'] else False +msg.deviceState.fanSpeedPercentDesired = fan_controller.update( + all_comp_temp, onroad_conditions["ignition"], standstill, + is_parked=is_parked, cruise_engaged=cruise_engaged) +``` + +This piece worked fine standalone — it's safe to land on its own without +the rest of the park-mode plumbing. (It does require the thermald +carState subscription to actually receive carState — which is fine in +normal operation.) + +## What broke + +### A. card.initialize() not called → silent CAN + +First attempt entered park-mode immediately whenever `gearShifter == +PARK`, including before controlsd had ever passed through its +`if not self.initialized` branch. Symptoms: +- UI stuck on splash even after shift to drive +- carState never published to cereal (tested by subscribing externally) +- presumably the steering ECU was getting no heartbeat, though we + hadn't yet tried to engage + +Fix: `_park_mode_allowed` gate, requires `self.initialized` AND a 10 s +post-init buffer. **Critical.** Don't skip this in v2. + +### B. Park-grace cap (8 s) too short + +Cold-spawn chain length: +1. modeld load thneed weights, init GPU (~3-5 s) +2. modeld publishes modelV2 +3. plannerd consumes modelV2, publishes longitudinalPlan +4. frogpilot_process consumes modelV2, publishes frogpilotPlan +5. paramsd, torqued, calibrationd estimators accumulate enough samples + to set valid=True + +In testing this cumulatively exceeded 8 s. We bumped the cap to 15 s. +**Even 15 s wasn't enough in some test runs** — got commIssue alerts +with longitudinalPlan / frogpilotPlan still invalid past the cap. + +Possible fixes for v2: +- Even longer cap (30 s) +- Or condition-only with no cap (live with the risk that a genuinely + broken service strands controlsd in keepalive forever) +- Or **a different approach**: don't actually pause modeld; throttle it + to 1 fps when stopped (this is what the broken tree's CLAUDE.md + pending-features list described as "modeld throttled to 1fps when + stopped"). Avoids the cold-spawn entirely. + +### C. NO_ENTRY alerts fire on engage attempt, not engage success + +`controlsdLagging` and `commIssue` both have `ET.NO_ENTRY`. NO_ENTRY +alerts fire when the user *tries* to engage but the event blocks them. +At that moment `self.enabled` hasn't flipped yet, so a grace keyed on +`last_engaged_frame` is inert. + +Fix: also track `CS.cruiseState.enabled` rising edge as +`last_engage_attempt_frame`. See "Engage-attempt grace" above. + +### D. Cascade of related alerts + +After all the above were patched, in-car testing still produced +"locationd temporaryError" and "sensorDataInvalid" alerts after the +post-engage grace window expired. We didn't have time to chase these +down before reverting. Hypotheses: +- locationd's kalman filter needs more than ~15 s to converge after + cold spawn, especially without GPS feeding it (we explicitly skip + GPS in locationd). +- sensord might lose its IMU sample lock and need re-init when + killed/restarted. + +These are an argument for the "throttle, don't kill" approach in v2. + +### E. Sensor data invalid + +`sensorDataInvalid` likely stems from sensord being killed and +restarted. The IMU init handshake takes time, and during that window +`accelerometer` / `gyroscope` services are alive but their data hasn't +stabilized — locationd reports invalid, controlsd alerts. + +Same fix family as (D). + +## Recommended approach for v2 + +The "kill processes" approach is architecturally clean but creates a +big cold-spawn cliff every time the user shifts to drive. The v2 plan +should probably look more like: + +1. **Throttle, don't kill.** Send a "standby" memory param that modeld + reads and reduces its tick rate to 1 fps. dmonitoringmodeld likewise. + Plannerd/radard naturally adapt to slower modelV2 input. paramsd / + torqued / calibrationd / locationd keep accumulating slowly. +2. **Keep controlsd's main loop running normally** — no park_mode_tick. + The state machine already handles "not engaged" → no actuators → + passive. The heartbeat flows naturally because `state_control` / + `publish_logs` run every cycle. +3. **Apply the fan-range widening (Section 5 of "Design")** — that + piece is independently valuable and doesn't depend on the rest. +4. **Skip dashcamd touchpoints.** dashcamd is already gear-aware via + its own state machine and pauses recording in park naturally. +5. **Keep the `_park_mode_allowed` startup-delay gate concept** if any + form of conditional-shutdown is reintroduced — this guards against + `card.initialize()` being skipped. + +### Files involved (for reference) + +| File | Why | +|---|---| +| `common/params.cc` | register `ParkMode` (or `ModelStandby` etc.) | +| `selfdrive/manager/manager.py` | `manager_init` memory-param defaults | +| `selfdrive/manager/process_config.py` | gating helpers | +| `selfdrive/controls/controlsd.py` | the park branch and engage-grace | +| `selfdrive/thermald/fan_controller.py` | fan range rules | +| `selfdrive/thermald/thermald.py` | carState sub for fan rules | +| `selfdrive/modeld/modeld.py` (v2) | throttle on standby flag | +| `selfdrive/modeld/dmonitoringmodeld.py` (v2) | throttle on standby flag | + +### Reference commits (force-pushed away from origin/clearpilot, but the +text of this document captures the substance) + +- park-mode initial: bring up flag + gating + controlsd keepalive +- park-mode startup gate: 10 s post-init delay before allowing park +- engage-grace: 3 s suppression on engage edge for controlsdLagging +- engage-attempt grace + grace cap bump to 15 s + +These were all on top of `stable_5_4_26` (`0d1cedd`).