Compare commits
1 Commits
0d1ceddad2
...
clearpilot
| Author | SHA1 | Date | |
|---|---|---|---|
| 119d412c24 |
@@ -0,0 +1,358 @@
# Park-mode power savings: design notes & retro

## Goal

While ignition is on but the car is in **park**, reduce CPU/power draw by
shutting down processes that aren't needed for "watching the gear lever."
The constraint: the car's steering ECU must keep seeing the openpilot
heartbeat (LFA / LKAS / LKAS_ALT CAN-FD messages, every controlsd cycle)
or it drops out of tester mode and throws a steering fault on the next
shift to drive.

Behavior should approximate "ignition off" except:
- controlsd stays alive (it's the heartbeat source and the gear watcher)
- pandad / boardd / camerad / dashcamd / gpsd / ui / thermald keep running
- everything else (modeld, locationd, paramsd, torqued, calibrationd,
  plannerd, radard, dmonitoringmodeld, dmonitoringd, soundd, sensord,
  speed_logicd, frogpilot_process) gets paused

When the car shifts back to drive, all the paused processes spin up cold
and openpilot resumes normal operation.

## Status: reverted

We attempted this on **2026-05-04** (commits `5d18ad1` → `1e4e95c`,
force-pushed away). It built and launched fine on the bench but produced
a cascade of false alerts during the park→drive transition in the car:
"controlsd unresponsive", commIssue with longitudinalPlan/frogpilotPlan,
locationd temporaryError, sensorDataInvalid. We reverted to tag
`stable_5_4_26` (commit `0d1cedd`).

This document captures the design we tried and the problems we ran into
so the next attempt doesn't relearn the same mistakes.

## Design we tried

### 1. The flag

A new memory param `ParkMode` is registered in `common/params.cc` and
defaulted to `"0"` in `manager_init`'s memory-params loop. It lives at
`/dev/shm/params/d/ParkMode`, is written by controlsd, and is read by the
process_config gating callbacks.

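To make the flag's contract concrete (one writer, several readers, a boolean stored as `"0"`/`"1"` in a file), here is a minimal sketch using plain file I/O in a temporary directory. It is not the real `Params` API: `FlagStore` and its `put_bool` / `get_bool` methods are hypothetical stand-ins, and the temp-file-plus-rename write is an assumption about how a reader could be kept from seeing a partial value.

```python
import os
import tempfile


class FlagStore:
  """Hypothetical stand-in for a memory-params directory (not the real Params API)."""

  def __init__(self, root):
    self.dir = os.path.join(root, "d")
    os.makedirs(self.dir, exist_ok=True)

  def put_bool(self, key, value):
    # Write to a temp file, then rename, so a reader never sees a partial write.
    final_path = os.path.join(self.dir, key)
    tmp_path = final_path + ".tmp"
    with open(tmp_path, "w") as f:
      f.write("1" if value else "0")
    os.replace(tmp_path, final_path)

  def get_bool(self, key):
    # A missing key reads as False, matching the "0" default described above.
    try:
      with open(os.path.join(self.dir, key)) as f:
        return f.read().strip() == "1"
    except FileNotFoundError:
      return False


root = tempfile.mkdtemp()          # stands in for /dev/shm/params
store = FlagStore(root)
default_value = store.get_bool("ParkMode")
store.put_bool("ParkMode", True)   # what the writer would do on entering park
parked_value = store.get_bool("ParkMode")
store.put_bool("ParkMode", False)  # ...and on leaving park
cleared_value = store.get_bool("ParkMode")
```

The point of the sketch is only that readers poll a cheap file-backed boolean and never block on the writer.
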
### 2. controlsd writes the flag and runs a minimal cycle in park

In `__init__`:
```python
self.park_mode = False
self.park_exit_frame = -1
self.startup_complete_frame = -1
self.PARK_GRACE_MAX_FRAMES = int(15.0 / DT_CTRL)  # see "Park-grace cap" below
self.PARK_STARTUP_DELAY_FRAMES = int(10.0 / DT_CTRL)  # see "Startup delay" below
self.last_engaged_frame = -1
self.last_engage_attempt_frame = -1
self.POST_ENGAGE_LAG_GRACE_FRAMES = int(3.0 / DT_CTRL)
```

In `step()` after `data_sample()`:
```python
self._update_park_mode(CS)
if self.park_mode or self._in_park_exit_grace():
  self._park_mode_tick(CS)
  self.CS_prev = CS
  return
# ... normal flow
```

Helpers:
```python
def _park_mode_allowed(self):
  # Don't allow park mode until init has completed AND we've run at
  # least PARK_STARTUP_DELAY_FRAMES of normal step() after init.
  if not self.initialized:
    return False
  if self.startup_complete_frame < 0:
    self.startup_complete_frame = self.sm.frame
  return (self.sm.frame - self.startup_complete_frame) >= self.PARK_STARTUP_DELAY_FRAMES

def _update_park_mode(self, CS):
  if not self._park_mode_allowed():
    return
  in_park = CS.gearShifter == GearShifter.park
  if in_park != self.park_mode:
    self.park_mode = in_park
    self.params_memory.put_bool("ParkMode", in_park)
    if not in_park:
      self.park_exit_frame = self.sm.frame

def _in_park_exit_grace(self):
  if self.park_exit_frame < 0:
    return False
  if (self.sm.frame - self.park_exit_frame) >= self.PARK_GRACE_MAX_FRAMES:
    return False
  return not (self.sm.all_checks() and not self.card.can_rcv_timeout)

def _park_mode_tick(self, CS):
  # Build a do-nothing CarControl; card.controls_update -> CI.apply ->
  # CarController.update appends create_steering_messages()
  # unconditionally, so the LFA/LKAS heartbeat keeps flowing.
  CC = car.CarControl.new_message()
  self.clearpilot_state_control(CC, CS)
  self.card.controls_update(CC, self.frogpilot_variables)
```

Why every piece matters:
- **`_park_mode_allowed` (10 s startup delay)**: `card.initialize()` is
  only called inside `data_sample`'s `if not self.initialized` branch.
  That call wires `CarInterface` up so `controls_update` actually
  produces CAN sends. If we entered park-mode before init completed,
  the heartbeat was a silent no-op. The 10 s buffer also lets all
  `only_onroad_active` processes complete their cold-spawn the very
  first time, before manager starts pausing them.
- **`_in_park_exit_grace`**: when ParkMode flips off, the cold-spawn
  chain takes time. Stay in keepalive-tick mode until SubMaster reports
  everything healthy, with a hard cap as a safety net (we used 15 s and
  saw it still wasn't enough; see "What broke" below).
- **`_park_mode_tick` calls `card.controls_update`**: don't call
  `state_update` or `data_sample`, because `step()` already did that. Just
  publish the empty CC to push the heartbeat CAN messages.

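The interaction between the startup delay, the park flag, and the capped exit grace is easier to follow as a frame-by-frame simulation. The sketch below is a toy model, not controlsd code: `ParkModeSim` is a hypothetical class, the `healthy` argument stands in for `sm.all_checks() and not card.can_rcv_timeout`, and `DT_CTRL = 0.01` (a 100 Hz loop) is an assumption.

```python
DT_CTRL = 0.01  # assumed 100 Hz control loop


class ParkModeSim:
  """Toy model of the park-mode state; `healthy` replaces the SubMaster/CAN checks."""

  def __init__(self, startup_delay_s=10.0, grace_cap_s=15.0):
    self.frame = 0
    self.initialized = False
    self.park_mode = False
    self.park_exit_frame = -1
    self.startup_complete_frame = -1
    self.startup_delay_frames = round(startup_delay_s / DT_CTRL)
    self.grace_cap_frames = round(grace_cap_s / DT_CTRL)

  def _allowed(self):
    if not self.initialized:
      return False
    if self.startup_complete_frame < 0:
      self.startup_complete_frame = self.frame
    return (self.frame - self.startup_complete_frame) >= self.startup_delay_frames

  def step(self, in_park, healthy):
    """Advance one frame; return 'keepalive' or 'normal' for this cycle."""
    if self._allowed() and in_park != self.park_mode:
      self.park_mode = in_park
      if not in_park:
        self.park_exit_frame = self.frame
    in_grace = (self.park_exit_frame >= 0
                and (self.frame - self.park_exit_frame) < self.grace_cap_frames
                and not healthy)
    mode = "keepalive" if (self.park_mode or in_grace) else "normal"
    self.frame += 1
    return mode


sim = ParkModeSim()
sim.initialized = True
# 10 s in park right after init: the startup delay keeps the normal loop running.
startup = {sim.step(in_park=True, healthy=True) for _ in range(1000)}
# The delay has elapsed, so the next parked frame switches to keepalive.
parked = sim.step(in_park=True, healthy=True)
# Shift out of park with services still unhealthy: grace holds keepalive...
grace = sim.step(in_park=False, healthy=False)
# ...until the 15 s cap expires, after which the normal loop runs regardless.
for _ in range(1499):
  sim.step(in_park=False, healthy=False)
after_cap = sim.step(in_park=False, healthy=False)
```

The last step is the failure mode described later in this document: once the cap expires, the normal loop resumes even though the services are still unhealthy, which is when the alerts appear.
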
### 3. Manager gating

In `selfdrive/manager/process_config.py`:

```python
def _park_mode() -> bool:
  return Params("/dev/shm/params").get_bool("ParkMode")

def only_onroad_active(started, params, CP):
  return started and not _park_mode()

def driverview_active(started, params, CP):
  return driverview(started, params, CP) and not _park_mode()

def always_run_unless_parked(started, params, CP):
  # Same as always_run, but pauses while ignition is on and parked.
  # Preserves offroad behavior.
  return not (started and _park_mode())
```

Re-gated:
- `only_onroad` → `only_onroad_active`: modeld, sensord, soundd,
  locationd, calibrationd, torqued, paramsd, plannerd, radard,
  speed_logicd
- `driverview` → `driverview_active`: dmonitoringmodeld, dmonitoringd
- `always_run` → `always_run_unless_parked`: frogpilot_process

`controlsd` stays plain `only_onroad` since it's the writer and heartbeat source.
`camerad`, `pandad`, `thermald`, `ui`, `dashcamd`, `gpsd`, `deleter`,
`fleet_manager`, `tombstoned`, `timed`, and manager internals stay
`always_run` (or whatever they were).

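A truth table makes the difference between the gating predicates explicit. The sketch below re-declares simplified versions of the helpers with the param read replaced by a module-level dict (`park_flag`, a hypothetical stand-in for the `ParkMode` memory param); `only_onroad` is assumed to be simply `started`.

```python
from itertools import product

park_flag = {"value": False}  # stand-in for the ParkMode memory param


def _park_mode():
  return park_flag["value"]


def only_onroad(started, params, CP):
  # Assumed definition of the stock predicate.
  return started


def only_onroad_active(started, params, CP):
  return started and not _park_mode()


def always_run_unless_parked(started, params, CP):
  return not (started and _park_mode())


rows = {}
for started, parked in product([False, True], repeat=2):
  park_flag["value"] = parked
  rows[(started, parked)] = {
    "only_onroad": only_onroad(started, None, None),
    "only_onroad_active": only_onroad_active(started, None, None),
    "always_run_unless_parked": always_run_unless_parked(started, None, None),
  }

for (started, parked), row in rows.items():
  print(f"started={started!s:5} parked={parked!s:5} {row}")
```

Note the `started=False, parked=True` row: a stale flag left set while offroad does not stop `always_run_unless_parked` processes, which is what "preserves offroad behavior" means in the helper's comment.
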
### 4. Engage-attempt grace (separate from park-grace)

The state transition into engaged briefly bumps the controlsd loop time
over budget. `self.rk.lagging` flips True for a cycle, `update_events`
adds `EventName.controlsdLagging`, which has an `ET.NO_ENTRY` alert
("Controls Process Lagging: Reboot Your Device"). That alert fires
right as the user is taking their hands off the wheel after pressing
SET. Same goes for `commIssue` if any service is briefly stale.

Solution: track two engagement edges: the actual transition into
ENABLED (post-success) and the cruise.enabled rising edge (pre-success,
covers blocked attempts):

```python
# In __init__:
self.last_engaged_frame = -1
self.last_engage_attempt_frame = -1

# In update_events, BEFORE the lag/comm checks:
if CS.cruiseState.enabled and not self.CS_prev.cruiseState.enabled:
  self.last_engage_attempt_frame = self.sm.frame

# At the engaged-state transition (state_transition):
enabled_prev = self.enabled
self.enabled = self.state in ENABLED_STATES
if self.enabled and not enabled_prev:
  self.last_engaged_frame = self.sm.frame

# Gate logic in update_events:
in_engage_grace = (
  (self.last_engaged_frame >= 0
   and (self.sm.frame - self.last_engaged_frame) < self.POST_ENGAGE_LAG_GRACE_FRAMES)
  or
  (self.last_engage_attempt_frame >= 0
   and (self.sm.frame - self.last_engage_attempt_frame) < self.POST_ENGAGE_LAG_GRACE_FRAMES)
)
if not REPLAY and self.rk.lagging and not in_engage_grace:
  self.events.add(EventName.controlsdLagging)
# ... and similarly suppress commIssue when in_engage_grace
```

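The gate above reduces to a single "is this frame within N frames of a recorded edge" test applied to two edges. Below is a standalone sketch of that test; `in_window` and `should_add_lagging` are hypothetical helper names, and `DT_CTRL = 0.01` is an assumption that makes the 3 s window 300 frames.

```python
DT_CTRL = 0.01  # assumed 100 Hz control loop
POST_ENGAGE_LAG_GRACE_FRAMES = round(3.0 / DT_CTRL)


def in_window(frame, last_frame, window=POST_ENGAGE_LAG_GRACE_FRAMES):
  """True while `frame` is within `window` frames of a recorded edge (-1 = never)."""
  return last_frame >= 0 and (frame - last_frame) < window


def should_add_lagging(frame, lagging, last_engaged_frame, last_engage_attempt_frame):
  in_engage_grace = (in_window(frame, last_engaged_frame)
                     or in_window(frame, last_engage_attempt_frame))
  return lagging and not in_engage_grace


# A blocked attempt at frame 5000: the engaged edge never fires (-1),
# but the attempt edge still opens the grace window.
suppressed = should_add_lagging(5001, True, -1, 5000)
# Exactly 300 frames later the window has closed and the event is raised again.
expired = should_add_lagging(5300, True, -1, 5000)
# With no edge ever recorded, lag is reported as usual.
no_edges = should_add_lagging(100, True, -1, -1)
```

The first case is the one a grace keyed only on `last_engaged_frame` would miss, since the engaged edge is still `-1` when the NO_ENTRY alert fires.
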
### 5. Fan range rules (port from broken tree)

Independent of park-mode plumbing. Widen the fan PID range when the car
is parked (0-100%, no minimum floor) so the device can fully cool while
the user can't hear road noise.

`selfdrive/thermald/fan_controller.py`:
```python
def update(self, cur_temp, ignition, standstill=False, is_parked=True, cruise_engaged=False):
  # neg_limit = -max_fan_pct, pos_limit = -min_fan_pct
  if not ignition:
    self.controller.neg_limit = -30
    self.controller.pos_limit = 0      # 0-30%
  elif is_parked:
    self.controller.neg_limit = -100
    self.controller.pos_limit = 0      # 0-100% (full range)
  elif cruise_engaged:
    self.controller.neg_limit = -100
    self.controller.pos_limit = -30    # 30-100%
  elif standstill:
    self.controller.neg_limit = -100
    self.controller.pos_limit = -10    # 10-100%
  else:
    self.controller.neg_limit = -100
    self.controller.pos_limit = -30    # 30-100%
```

`selfdrive/thermald/thermald.py`:
```python
# Add carState to the SubMaster service list
sm = messaging.SubMaster([..., "carState"], poll="pandaStates")

# At the fan_controller.update call site:
if sm.seen['carState']:
  cs = sm['carState']
  standstill = cs.standstill
  is_parked = cs.gearShifter == car.CarState.GearShifter.park
else:
  standstill = False
  is_parked = True  # default safe: assume parked, no fan floor
cruise_engaged = sm['controlsState'].enabled if sm.seen['controlsState'] else False
msg.deviceState.fanSpeedPercentDesired = fan_controller.update(
  all_comp_temp, onroad_conditions["ignition"], standstill,
  is_parked=is_parked, cruise_engaged=cruise_engaged)
```

This piece worked fine standalone, so it's safe to land on its own without
the rest of the park-mode plumbing. (It does require the thermald
carState subscription to actually receive carState, which is fine in
normal operation.)

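The sign convention (`neg_limit = -max`, `pos_limit = -min`) is easy to misread, so here is the same rule table expressed as plain percentage ranges and then converted. `fan_range_pct` and `to_pid_limits` are hypothetical helper names used only for illustration; the branch order mirrors the `update` snippet above.

```python
def fan_range_pct(ignition, is_parked=True, standstill=False, cruise_engaged=False):
  """Return (min_pct, max_pct) for the fan, mirroring the branch order above."""
  if not ignition:
    return (0, 30)
  if is_parked:
    return (0, 100)
  if cruise_engaged:
    return (30, 100)
  if standstill:
    return (10, 100)
  return (30, 100)


def to_pid_limits(min_pct, max_pct):
  # Sign convention from the snippet above: neg_limit = -max, pos_limit = -min.
  return {"neg_limit": -max_pct, "pos_limit": -min_pct}


offroad = to_pid_limits(*fan_range_pct(ignition=False))
parked = to_pid_limits(*fan_range_pct(ignition=True, is_parked=True))
red_light = to_pid_limits(*fan_range_pct(ignition=True, is_parked=False, standstill=True))
engaged_stopped = to_pid_limits(*fan_range_pct(
  ignition=True, is_parked=False, standstill=True, cruise_engaged=True))
```

Because `cruise_engaged` is checked before `standstill`, an engaged car stopped in traffic keeps the 30% floor; only a disengaged standstill drops to 10%.
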
## What broke

### A. card.initialize() not called → silent CAN

The first attempt entered park-mode immediately whenever
`gearShifter == PARK`, including before controlsd had ever passed through
its `if not self.initialized` branch. Symptoms:
- UI stuck on splash even after shift to drive
- carState never published to cereal (tested by subscribing externally)
- presumably the steering ECU was getting no heartbeat, though we
  hadn't yet tried to engage

Fix: the `_park_mode_allowed` gate, which requires `self.initialized` AND
a 10 s post-init buffer. **Critical.** Don't skip this in v2.

### B. Park-grace cap (8 s) too short

Cold-spawn chain length:
1. modeld loads thneed weights, inits GPU (~3-5 s)
2. modeld publishes modelV2
3. plannerd consumes modelV2, publishes longitudinalPlan
4. frogpilot_process consumes modelV2, publishes frogpilotPlan
5. paramsd, torqued, calibrationd estimators accumulate enough samples
   to set valid=True

In testing this cumulatively exceeded 8 s. We bumped the cap to 15 s.
**Even 15 s wasn't enough in some test runs**: we got commIssue alerts
with longitudinalPlan / frogpilotPlan still invalid past the cap.

Possible fixes for v2:
- Even longer cap (30 s)
- Or condition-only with no cap (live with the risk that a genuinely
  broken service strands controlsd in keepalive forever)
- Or **a different approach**: don't actually pause modeld; throttle it
  to 1 fps when stopped (this is what the broken tree's CLAUDE.md
  pending-features list described as "modeld throttled to 1fps when
  stopped"). Avoids the cold-spawn entirely.

### C. NO_ENTRY alerts fire on engage attempt, not engage success

`controlsdLagging` and `commIssue` both have `ET.NO_ENTRY`. NO_ENTRY
alerts fire when the user *tries* to engage but the event blocks them.
At that moment `self.enabled` hasn't flipped yet, so a grace keyed on
`last_engaged_frame` is inert.

Fix: also track `CS.cruiseState.enabled` rising edge as
`last_engage_attempt_frame`. See "Engage-attempt grace" above.

### D. Cascade of related alerts

After all the above were patched, in-car testing still produced
"locationd temporaryError" and "sensorDataInvalid" alerts after the
post-engage grace window expired. We didn't have time to chase these
down before reverting. Hypotheses:
- locationd's Kalman filter needs more than ~15 s to converge after
  cold spawn, especially without GPS feeding it (we explicitly skip
  GPS in locationd).
- sensord might lose its IMU sample lock and need re-init when
  killed/restarted.

These are an argument for the "throttle, don't kill" approach in v2.

### E. Sensor data invalid

`sensorDataInvalid` likely stems from sensord being killed and
restarted. The IMU init handshake takes time, and during that window
`accelerometer` / `gyroscope` services are alive but their data hasn't
stabilized, so locationd reports invalid and controlsd alerts.

Same fix family as (D).

## Recommended approach for v2

The "kill processes" approach is architecturally clean but creates a
big cold-spawn cliff every time the user shifts to drive. The v2 plan
should probably look more like:

1. **Throttle, don't kill.** Set a "standby" memory param that modeld
   reads to reduce its tick rate to 1 fps. dmonitoringmodeld likewise.
   Plannerd/radard naturally adapt to slower modelV2 input. paramsd /
   torqued / calibrationd / locationd keep accumulating slowly.
2. **Keep controlsd's main loop running normally**, with no park_mode_tick.
   The state machine already handles "not engaged" → no actuators →
   passive. The heartbeat flows naturally because `state_control` /
   `publish_logs` run every cycle.
3. **Apply the fan-range widening (section 5 of "Design we tried").** That
   piece is independently valuable and doesn't depend on the rest.
4. **Skip dashcamd touchpoints.** dashcamd is already gear-aware via
   its own state machine and pauses recording in park naturally.
5. **Keep the `_park_mode_allowed` startup-delay gate concept** if any
   form of conditional-shutdown is reintroduced; it guards against
   `card.initialize()` being skipped.

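As a rough sketch of what "throttle, don't kill" could mean inside a model loop, the function below decides per camera frame whether to run inference. Everything here is hypothetical: `should_run_model`, the `standby` flag, and the 20 fps camera rate are illustrative assumptions, not taken from modeld, and whether the model's temporal state tolerates skipped frames is not something these notes establish.

```python
def should_run_model(frame_idx, standby, camera_fps=20, standby_fps=1):
  """Decide whether to run inference on this camera frame.

  camera_fps and standby_fps are illustrative assumptions, not measured values.
  """
  if not standby:
    return True
  # In standby, run on every `stride`-th frame only.
  stride = max(1, camera_fps // standby_fps)
  return frame_idx % stride == 0


# Over 100 camera frames (5 s at the assumed 20 fps):
normal_runs = sum(should_run_model(i, standby=False) for i in range(100))
standby_runs = sum(should_run_model(i, standby=True) for i in range(100))
```

The process stays resident with weights loaded, so leaving standby costs at most one stride of latency instead of a cold spawn.
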
### Files involved (for reference)

| File | Why |
|---|---|
| `common/params.cc` | register `ParkMode` (or `ModelStandby` etc.) |
| `selfdrive/manager/manager.py` | `manager_init` memory-param defaults |
| `selfdrive/manager/process_config.py` | gating helpers |
| `selfdrive/controls/controlsd.py` | the park branch and engage-grace |
| `selfdrive/thermald/fan_controller.py` | fan range rules |
| `selfdrive/thermald/thermald.py` | carState sub for fan rules |
| `selfdrive/modeld/modeld.py` (v2) | throttle on standby flag |
| `selfdrive/modeld/dmonitoringmodeld.py` (v2) | throttle on standby flag |

### Reference commits

Force-pushed away from origin/clearpilot, but the text of this document
captures the substance:

- park-mode initial: bring up flag + gating + controlsd keepalive
- park-mode startup gate: 10 s post-init delay before allowing park
- engage-grace: 3 s suppression on engage edge for controlsdLagging
- engage-attempt grace + grace cap bump to 15 s

These were all on top of `stable_5_4_26` (`0d1cedd`).