Files
clearpilot/sessions/2026-04-26-0914-baseline-revert-and-parked-mode/README.md
T
brianhansonxyz 27cad05cd9 sessions: document baseline revert + parked-controlsd architecture
Briefing for the on-device agent covering commits 47321e3, f7e602c,
and 887b9c9: what was reverted vs kept, the UI re-wire, the parked
controlsd swap mechanism, build prerequisites, runtime verification
commands, and a troubleshooting playbook.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 09:18:34 -05:00

356 lines
16 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# Session: 2026-04-26 — Baseline Revert + Parked-Controlsd Mode
## Context
This session was driven by a regression: the steering wheel "feels like
it pulls right" during normal driving, with no clear smoking gun. The
suspicion was that one of the variable-FPS / standstill-throttling
changes (added to reduce parked-state fan noise and CPU load) bled into
on-road driving behavior in a hard-to-isolate way.
Strategy: revert all driving-relevant logic to a known-good baseline
captured in `/projects/openpilot/archive/clearpilot` (HEAD `980f0aa`,
July 2024), keep all of the ClearPilot UI/dashcam/telemetry/bench-mode
infrastructure intact on top, then attack the parked-fan-noise problem
fresh from a different angle that doesn't touch driving logic at all.
Three commits landed in this order on branch `clearpilot`:
| SHA | Title |
|---|---|
| `47321e3` | restore driving logic to pre-variable-fps baseline |
| `f7e602c` | controlsd: re-wire UI hooks on top of restored baseline |
| `887b9c9` | parked-controlsd mode: shut down heavy stack while ignition+park |
Pre-revert tip was `62a403d`. The on-device agent should treat
**`62a403d` as "the broken version"** when looking at history.
---
## Commit 1: `47321e3` — Baseline restore
Reverted the following files wholesale to their `980f0aa` archive copy:
- `selfdrive/controls/controlsd.py`
- `selfdrive/controls/lib/events.py`
- `selfdrive/controls/lib/longitudinal_planner.py`
- `selfdrive/modeld/modeld.py`
- `selfdrive/modeld/dmonitoringmodeld.py`
- `selfdrive/locationd/calibrationd.py`
- `selfdrive/locationd/paramsd.py`
- `selfdrive/locationd/torqued.py`
- `selfdrive/car/interfaces.py`
- `selfdrive/car/hyundai/carstate.py` (CAN-FD telemetry preserved as a
commented block at the bottom of `update_canfd` — re-enable by
uncommenting; the `tlog` import is also commented out).
- `selfdrive/monitoring/dmonitoringd.py`
- `selfdrive/frogpilot/controls/frogpilot_planner.py`
- `common/realtime.py`
### Intentionally NOT restored (kept as the post-`62a403d` version)
- `selfdrive/thermald/*` — fan/power tuning kept as-is.
- `selfdrive/car/hyundai/carcontroller.py` and `hyundaicanfd.py`
reviewed; the only delta vs baseline is hoisting the
`no_lat_lane_change` Params read out of the 100Hz hot path
(~5% carcontroller CPU) by passing the bit as an argument. This is
a perf-only change with no behavioral effect outside lane changes.
- `cereal/services.py`, `cereal/custom.capnp` — additive only.
`custom.capnp` adds `latRequested @3` and `noLatLaneChange @4` fields
to `FrogPilotCarControl`; capnp tag numbers are append-only so this is
safe to leave in even though baseline code doesn't write to them.
- `selfdrive/manager/*`, `common/params.cc` — heavy ClearPilot
infrastructure (bench mode, log dir, dashcamd, gpsd, ClearPilot params).
- All `selfdrive/ui/`, `selfdrive/clearpilot/`, `system/clearpilot/`.
### Things removed by the restore (no longer in the tree)
- Standstill frame skipping in modeld (was: skip GPU inference 19/20
frames at standstill, report 0 dropped frames to fool controlsd).
- Standstill frame skipping in dmonitoringmodeld.
- Model standby logic + `ModelStandby`/`ModelStandbyTs` reads in
controlsd's comm-issue suppression path.
- Parked-cycle skip in `state_control()` (10Hz vs 100Hz when in Park).
- Calibrationd validity decoupling from `sm.all_checks()`.
- Post-engage 2s commIssue/location/params suppression window.
- Per-cycle carstate write-gating for `CarSpeedLimit`/`CarIsMetric` in
carstate.py.
- The diff-based carstate telemetry calls (preserved commented out).
These were the candidates for the steering-pull regression. The on-device
agent should **not re-introduce any of these** without a deliberate plan
and a test session. The user's intent is to get baseline driving feel
confirmed first, then re-introduce optimizations one at a time and drive
each.
---
## Commit 2: `f7e602c` — UI hooks on top of baseline
Baseline `controlsd.py` doesn't have the UI plumbing the ClearPilot UI
expects. This commit re-adds only the things needed for the existing UI
to keep working — pure params-write plumbing, no actuator effect.
### Changes (all in `selfdrive/controls/controlsd.py`)
1. **Import `SpeedState`** from `openpilot.selfdrive.clearpilot.speed_logic`.
2. **`Controls.__init__`**:
- `params_memory.put_bool("ScreenDisplayMode", 0)`
`params_memory.put_int("ScreenDisplayMode", 0)` (UI reads it as int).
- Added `self.speed_state = SpeedState()`, `self.speed_state_frame = 0`,
`self.was_driving_gear = False`.
3. **SubMaster** — added `gpsLocation` to the subscriber list, with
`ignore_alive` / `ignore_avg_freq` / `ignore_valid` so missing GPS
doesn't trigger commIssue.
4. **`clearpilot_state_control(...)`** rewritten from a simple 3-state
cycle into the documented 5-state ScreenDisplayMode machine:
- Auto-wake on park→drive edge if currently in screen-off (state 3).
- LFA button transitions in drive: `0→4, 1→2, 2→3, 3→4, 4→2`.
- LFA button transitions outside drive: any except 3 → 3, state 3 → 0.
- Speed/cruise-warning overlay tick at ~2Hz (`speed_state.update(...)`)
reading `gpsLocation`, `CarSpeedLimit` param, `self.is_metric`,
`CS.cruiseState`. This is what writes
`ClearpilotSpeedDisplay`/`ClearpilotSpeedLimitDisplay`/
`ClearpilotCruiseWarning` for the UI overlay.
5. **Lane-change suppression sync** — at the existing baseline
`clearpilot_disable_lat_on_lane_change` block (around line 687), in
addition to the existing `params_memory.put_bool("no_lat_lane_change", ...)`,
also set `self.frogpilot_variables.no_lat_lane_change = True/False`.
This is required because the kept (post-`62a403d`) carcontroller
reads off `frogpilot_variables`, not Params.
### What does NOT change
No edits to lateral or longitudinal control paths. No new actuator-side
behavior. The UI features (nightrider/screen-off/auto day-night, speed
overlay, cruise warning chime) get wired back up purely through param
writes.
---
## Commit 3: `887b9c9` — Parked-controlsd mode
Architectural fix for the original problem this whole session was
chasing: while ignition is on but the car is in Park, the entire onroad
stack (modeld, planner, control, locationd, calibrationd, paramsd,
torqued, dmonitoring*, soundd, loggerd) is running and burning CPU/fan
even though none of it is needed.
Solution: redefine "onroad" as **ignition AND not parked** instead of
just **ignition**. Reuse the existing `started`-based process gating in
manager. Add a tiny second controlsd variant that runs while parked,
just to keep CAN flowing so thermald can see when gear leaves Park.
### Files
#### NEW: `selfdrive/controls/controlsd_parked.py`
Minimal entry point. Roughly:
```python
def main():
config_realtime_process(4, Priority.CTRL_HIGH)
card = CarD() # blocks until first CAN, fingerprints car
card.initialize()
fv = _make_default_frogpilot_variables() # safe False/0 SimpleNamespace
while True:
card.state_update(fv) # publishes carState/carOutput/carParams
```
`CarD.state_update` blocks via `drain_sock_raw(wait_for_one=True)`, so
the loop is paced by CAN traffic — no extra sleep, no CPU spin.
The default `frogpilot_variables` sets these to safe values so
`CarInterfaceBase.update` doesn't `AttributeError`:
`conditional_experimental_mode`, `experimental_mode_via_distance`,
`traffic_mode`, `sport_plus`, `long_pitch`, `no_lat_lane_change` — all
False.
#### `selfdrive/thermald/thermald.py`
- `onroad_conditions` now also has `"not_parked"`. Initialized to
`False` (assume parked at boot) so the heavy stack waits for carState
to confirm gear has left Park before spinning up.
- New module-level loop variables: `is_parked = True`,
`parked_since: float | None = None`, `PARKED_HYSTERESIS_S = 1.5`,
`ignition_param_prev: bool | None = None`.
- New block right after the panda-disconnect check (around line 258 in
current state) reads `sm['carState'].gearShifter`:
- Gear == Park: latch `parked_since`, flip `is_parked = True` after
1.5s of continuous Park (hysteresis).
- Gear != Park: clear `parked_since`, `is_parked = False` immediately
(no hysteresis going out).
- Reverse is treated as **not parked** — driver is moving.
- `onroad_conditions["not_parked"] = not is_parked` every tick.
- New `IgnitionOn` Params write, edge-driven (only on change of
`onroad_conditions["ignition"]`) so we don't hammer the persistent
filesystem 2x/sec.
#### `selfdrive/manager/process_config.py`
- New predicate:
```python
def parked_only(started, params, CP):
return params.get_bool("IgnitionOn") and not started
```
- New process entry directly after the existing `controlsd` entry:
```python
PythonProcess("controlsd_parked", "selfdrive.controls.controlsd_parked", parked_only),
```
- `controlsd` is unchanged (still `only_onroad`). Mutually exclusive
with `parked_only` because `started` is the negation of the relevant
condition.
#### `selfdrive/manager/manager.py`
- Single line in `manager_init()` seeding `IgnitionOn=False` so the
predicate evaluates correctly before thermald's first tick.
#### `common/params.cc`
- New entry in the alphabetical I-block:
```cpp
{"IgnitionOn", CLEAR_ON_MANAGER_START},
```
- Manager predicates can only see persistent Params (not pandaStates
or `/dev/shm/params`), which is why thermald has to expose ignition
this way.
### State machine summary
| Ignition | Gear | Predicates true | What runs |
|----------|-------------|-----------------------------|-------------------------------|
| off | any | none | always_run only (ui, thermald, pandad, deleter, …) |
| on | Park (>1.5s)| parked_only | always_run + controlsd_parked |
| on | not Park | only_onroad (=> all `started`) | full onroad stack |
The transition between rows 2 and 3 is purely manager process
swapping driven by predicate flips — no IPC handshake, no
self-termination. Thermald sees gear change, flips `not_parked`,
`should_start = all(onroad_conditions.values())` flips, manager kills
the wrong variant and spawns the right one on its next tick.
---
## What the on-device agent needs to know / debug
### Build prerequisites
A new param key was added (`IgnitionOn`), so the C++ params whitelist
needs a fresh build. Per `CLAUDE.md` "Adding New Params":
```bash
chown -R comma:comma /data/openpilot
rm -f /data/openpilot/prebuilt /data/openpilot/common/params.o /data/openpilot/common/libcommon.a
su - comma -c "bash /data/openpilot/build_only.sh"
```
`build_only.sh` already deletes `prebuilt` but **does not** delete
`params.o` / `libcommon.a` — verify those are gone before building or
the new key won't be picked up and `Params().put_bool("IgnitionOn", …)`
will throw `UnknownKeyName` in manager_init or thermald.
### How to verify the swap is working
```bash
# 1. Watch which controlsd variant is running
watch -n 1 'ps -ef | grep -E "controlsd(_parked)?" | grep -v grep'
# 2. Watch the gating signals
watch -n 1 'echo "IgnitionOn:"; cat /data/params/d/IgnitionOn 2>/dev/null; \
echo; echo "deviceState.started:" ; \
python3 -c "import cereal.messaging as m; \
s=m.sub_sock(\"deviceState\", timeout=1000); \
print(m.recv_one(s).deviceState.started)"'
# 3. Watch gear via carState
python3 -c "
import cereal.messaging as m
s = m.sub_sock('carState', timeout=2000)
while True:
msg = m.recv_one(s)
if msg: print(msg.carState.gearShifter)
"
```
Expected behavior:
- Ignition off: neither variant in `ps`. `IgnitionOn` is `0`/missing.
- Ignition on, in Park: `controlsd_parked` in `ps`, `controlsd` is not.
`started` is False. After ~1.5s of confirmed Park, modeld and friends
should have stopped.
- Shift to Drive: `controlsd_parked` disappears within ~500ms, full
`controlsd` appears, all the onroad processes spin up. There will be
a brief carState gap during the swap (~0.52s).
### What to suspect first if something breaks
1. **Manager crash on startup** with `UnknownKeyName: IgnitionOn`. Means
`params.o`/`libcommon.a` weren't rebuilt. Delete them and rebuild.
2. **`controlsd_parked` keeps respawning / dying.** Check
`/data/log2/current/controlsd_parked.log`. Most likely either:
- `CarD.__init__` is hanging on `get_one_can` because pandad isn't
up yet — should only matter on the very first boot.
- A `frogpilot_variables` attribute we missed defaulting; add it to
`_make_default_frogpilot_variables` in `controlsd_parked.py`.
3. **Full controlsd never spawns after shift to Drive.** Check
`IgnitionOn` (should be `1`), check carState.gearShifter (should not
be `park`/`unknown`), check thermald.log for `should_start` logic.
Also check that `carState` is actually being published by
`controlsd_parked`.
4. **Steering still pulls right.** The whole point of commit 1 is to
rule out the variable-FPS work as the cause. If the symptom persists
on baseline-restored driving logic, the suspect list shifts to:
- Anything still on the kept side (carcontroller's
`no_lat_lane_change` plumbing, thermald-related changes,
custom.capnp additions).
- Hardware: a calibration that drifted, an alignment issue, panda
CAN bus issue, or torque tuning that was modified outside of these
files.
- The custom driving model selected (`Params("Model")`). Confirm
which `.thneed`/`.onnx` is loaded — none of the model files
themselves were changed in this session.
5. **Panda safety alerts during park transition.** If panda logs a
"lost heartbeat" or drops to NOOUTPUT mode in the swap window, we
need controlsd_parked to issue a no-op carcontrol heartbeat to keep
panda happy. Not implemented in this session — flag for follow-up.
### Open follow-up items (NOT done in this session)
- **Cold-start latency on shift-from-Park.** Modeld load + calibration
warmup may produce a noticeable gap before lateral/long are ready.
Anticipatory wake on `CS.brakePressed && in_park` is the planned
mitigation if needed.
- **Methodically reintroduce optimizations.** Once baseline driving
feels right, the standstill optimizations (modeld 1fps, fan clamps,
etc.) can come back one at a time, each with a drive test.
- **The CAN-FD telemetry block in `selfdrive/car/hyundai/carstate.py`**
is preserved as a commented block at the bottom of `update_canfd()`.
Re-enabling requires uncommenting + restoring the `tlog` import at
the top of the file.
---
## Key file index for fast navigation
```
selfdrive/controls/controlsd.py # full controlsd, restored to baseline + UI hooks re-added
selfdrive/controls/controlsd_parked.py # NEW: parked-only CAN listener
selfdrive/controls/clearpilot_state_control # in controlsd.py, ~line 1255 — 5-state ScreenDisplayMode + speed_state tick
selfdrive/thermald/thermald.py # gear-aware not_parked + IgnitionOn writer (~line 258)
selfdrive/manager/process_config.py # parked_only predicate + new entry
selfdrive/manager/manager.py # IgnitionOn seed in manager_init
common/params.cc # IgnitionOn registered (CLEAR_ON_MANAGER_START)
selfdrive/car/card.py # CarD class — used by both controlsd variants, unchanged
selfdrive/clearpilot/speed_logic.py # SpeedState class — unchanged, called from controlsd.clearpilot_state_control
selfdrive/car/hyundai/carstate.py # restored baseline + commented telemetry block at end of update_canfd
```
## Reproducing the diff per commit
```bash
git show 47321e3 # baseline restore
git show f7e602c # UI hooks
git show 887b9c9 # parked mode
git diff 62a403d..887b9c9 # full session delta
```