diff --git a/sessions/2026-04-26-0914-baseline-revert-and-parked-mode/README.md b/sessions/2026-04-26-0914-baseline-revert-and-parked-mode/README.md new file mode 100644 index 0000000..c2e677a --- /dev/null +++ b/sessions/2026-04-26-0914-baseline-revert-and-parked-mode/README.md @@ -0,0 +1,355 @@ +# Session: 2026-04-26 — Baseline Revert + Parked-Controlsd Mode + +## Context + +This session was driven by a regression: the steering wheel "feels like +it pulls right" during normal driving, with no clear smoking gun. The +suspicion was that one of the variable-FPS / standstill-throttling +changes (added to reduce parked-state fan noise and CPU load) bled into +on-road driving behavior in a hard-to-isolate way. + +Strategy: revert all driving-relevant logic to a known-good baseline +captured in `/projects/openpilot/archive/clearpilot` (HEAD `980f0aa`, +July 2024), keep all of the ClearPilot UI/dashcam/telemetry/bench-mode +infrastructure intact on top, then attack the parked-fan-noise problem +fresh from a different angle that doesn't touch driving logic at all. + +Three commits landed in this order on branch `clearpilot`: + +| SHA | Title | +|---|---| +| `47321e3` | restore driving logic to pre-variable-fps baseline | +| `f7e602c` | controlsd: re-wire UI hooks on top of restored baseline | +| `887b9c9` | parked-controlsd mode: shut down heavy stack while ignition+park | + +Pre-revert tip was `62a403d`. The on-device agent should treat +**`62a403d` as "the broken version"** when looking at history. + +--- + +## Commit 1: `47321e3` — Baseline restore + +Reverted the following files wholesale to their `980f0aa` archive copy: + +- `selfdrive/controls/controlsd.py` +- `selfdrive/controls/lib/events.py` +- `selfdrive/controls/lib/longitudinal_planner.py` +- `selfdrive/modeld/modeld.py` +- `selfdrive/modeld/dmonitoringmodeld.py` +- `selfdrive/locationd/calibrationd.py` +- `selfdrive/locationd/paramsd.py` +- `selfdrive/locationd/torqued.py` +- `selfdrive/car/interfaces.py` +- `selfdrive/car/hyundai/carstate.py` (CAN-FD telemetry preserved as a + commented block at the bottom of `update_canfd` — re-enable by + uncommenting; the `tlog` import is also commented out). +- `selfdrive/monitoring/dmonitoringd.py` +- `selfdrive/frogpilot/controls/frogpilot_planner.py` +- `common/realtime.py` + +### Intentionally NOT restored (kept as the post-`62a403d` version) + +- `selfdrive/thermald/*` — fan/power tuning kept as-is. +- `selfdrive/car/hyundai/carcontroller.py` and `hyundaicanfd.py` — + reviewed; the only delta vs baseline is hoisting the + `no_lat_lane_change` Params read out of the 100Hz hot path + (~5% carcontroller CPU) by passing the bit as an argument. This is + a perf-only change with no behavioral effect outside lane changes. +- `cereal/services.py`, `cereal/custom.capnp` — additive only. + `custom.capnp` adds `latRequested @3` and `noLatLaneChange @4` fields + to `FrogPilotCarControl`; capnp tag numbers are append-only so this is + safe to leave in even though baseline code doesn't write to them. +- `selfdrive/manager/*`, `common/params.cc` — heavy ClearPilot + infrastructure (bench mode, log dir, dashcamd, gpsd, ClearPilot params). +- All `selfdrive/ui/`, `selfdrive/clearpilot/`, `system/clearpilot/`. + +### Things removed by the restore (no longer in the tree) + +- Standstill frame skipping in modeld (was: skip GPU inference 19/20 + frames at standstill, report 0 dropped frames to fool controlsd). +- Standstill frame skipping in dmonitoringmodeld. +- Model standby logic + `ModelStandby`/`ModelStandbyTs` reads in + controlsd's comm-issue suppression path. +- Parked-cycle skip in `state_control()` (10Hz vs 100Hz when in Park). +- Calibrationd validity decoupling from `sm.all_checks()`. +- Post-engage 2s commIssue/location/params suppression window. +- Per-cycle carstate write-gating for `CarSpeedLimit`/`CarIsMetric` in + carstate.py. +- The diff-based carstate telemetry calls (preserved commented out). + +These were the candidates for the steering-pull regression. The on-device +agent should **not re-introduce any of these** without a deliberate plan +and a test session. The user's intent is to get baseline driving feel +confirmed first, then re-introduce optimizations one at a time and drive +each. + +--- + +## Commit 2: `f7e602c` — UI hooks on top of baseline + +Baseline `controlsd.py` doesn't have the UI plumbing the ClearPilot UI +expects. This commit re-adds only the things needed for the existing UI +to keep working — pure params-write plumbing, no actuator effect. + +### Changes (all in `selfdrive/controls/controlsd.py`) + +1. **Import `SpeedState`** from `openpilot.selfdrive.clearpilot.speed_logic`. +2. **`Controls.__init__`**: + - `params_memory.put_bool("ScreenDisplayMode", 0)` → + `params_memory.put_int("ScreenDisplayMode", 0)` (UI reads it as int). + - Added `self.speed_state = SpeedState()`, `self.speed_state_frame = 0`, + `self.was_driving_gear = False`. +3. **SubMaster** — added `gpsLocation` to the subscriber list, with + `ignore_alive` / `ignore_avg_freq` / `ignore_valid` so missing GPS + doesn't trigger commIssue. +4. **`clearpilot_state_control(...)`** rewritten from a simple 3-state + cycle into the documented 5-state ScreenDisplayMode machine: + - Auto-wake on park→drive edge if currently in screen-off (state 3). + - LFA button transitions in drive: `0→4, 1→2, 2→3, 3→4, 4→2`. + - LFA button transitions outside drive: any except 3 → 3, state 3 → 0. + - Speed/cruise-warning overlay tick at ~2Hz (`speed_state.update(...)`) + reading `gpsLocation`, `CarSpeedLimit` param, `self.is_metric`, + `CS.cruiseState`. This is what writes + `ClearpilotSpeedDisplay`/`ClearpilotSpeedLimitDisplay`/ + `ClearpilotCruiseWarning` for the UI overlay. +5. **Lane-change suppression sync** — at the existing baseline + `clearpilot_disable_lat_on_lane_change` block (around line 687), in + addition to the existing `params_memory.put_bool("no_lat_lane_change", ...)`, + also set `self.frogpilot_variables.no_lat_lane_change = True/False`. + This is required because the kept (post-`62a403d`) carcontroller + reads off `frogpilot_variables`, not Params. + +### What does NOT change + +No edits to lateral or longitudinal control paths. No new actuator-side +behavior. The UI features (nightrider/screen-off/auto day-night, speed +overlay, cruise warning chime) get wired back up purely through param +writes. + +--- + +## Commit 3: `887b9c9` — Parked-controlsd mode + +Architectural fix for the original problem this whole session was +chasing: while ignition is on but the car is in Park, the entire onroad +stack (modeld, planner, control, locationd, calibrationd, paramsd, +torqued, dmonitoring*, soundd, loggerd) is running and burning CPU/fan +even though none of it is needed. + +Solution: redefine "onroad" as **ignition AND not parked** instead of +just **ignition**. Reuse the existing `started`-based process gating in +manager. Add a tiny second controlsd variant that runs while parked, +just to keep CAN flowing so thermald can see when gear leaves Park. + +### Files + +#### NEW: `selfdrive/controls/controlsd_parked.py` + +Minimal entry point. Roughly: + +```python +def main(): + config_realtime_process(4, Priority.CTRL_HIGH) + card = CarD() # blocks until first CAN, fingerprints car + card.initialize() + fv = _make_default_frogpilot_variables() # safe False/0 SimpleNamespace + while True: + card.state_update(fv) # publishes carState/carOutput/carParams +``` + +`CarD.state_update` blocks via `drain_sock_raw(wait_for_one=True)`, so +the loop is paced by CAN traffic — no extra sleep, no CPU spin. + +The default `frogpilot_variables` sets these to safe values so +`CarInterfaceBase.update` doesn't `AttributeError`: +`conditional_experimental_mode`, `experimental_mode_via_distance`, +`traffic_mode`, `sport_plus`, `long_pitch`, `no_lat_lane_change` — all +False. + +#### `selfdrive/thermald/thermald.py` + +- `onroad_conditions` now also has `"not_parked"`. Initialized to + `False` (assume parked at boot) so the heavy stack waits for carState + to confirm gear has left Park before spinning up. +- New module-level loop variables: `is_parked = True`, + `parked_since: float | None = None`, `PARKED_HYSTERESIS_S = 1.5`, + `ignition_param_prev: bool | None = None`. +- New block right after the panda-disconnect check (around line 258 in + current state) reads `sm['carState'].gearShifter`: + - Gear == Park: latch `parked_since`, flip `is_parked = True` after + 1.5s of continuous Park (hysteresis). + - Gear != Park: clear `parked_since`, `is_parked = False` immediately + (no hysteresis going out). + - Reverse is treated as **not parked** — driver is moving. +- `onroad_conditions["not_parked"] = not is_parked` every tick. +- New `IgnitionOn` Params write, edge-driven (only on change of + `onroad_conditions["ignition"]`) so we don't hammer the persistent + filesystem 2x/sec. + +#### `selfdrive/manager/process_config.py` + +- New predicate: + ```python + def parked_only(started, params, CP): + return params.get_bool("IgnitionOn") and not started + ``` +- New process entry directly after the existing `controlsd` entry: + ```python + PythonProcess("controlsd_parked", "selfdrive.controls.controlsd_parked", parked_only), + ``` +- `controlsd` is unchanged (still `only_onroad`). Mutually exclusive + with `parked_only` because `started` is the negation of the relevant + condition. + +#### `selfdrive/manager/manager.py` + +- Single line in `manager_init()` seeding `IgnitionOn=False` so the + predicate evaluates correctly before thermald's first tick. + +#### `common/params.cc` + +- New entry in the alphabetical I-block: + ```cpp + {"IgnitionOn", CLEAR_ON_MANAGER_START}, + ``` +- Manager predicates can only see persistent Params (not pandaStates + or `/dev/shm/params`), which is why thermald has to expose ignition + this way. + +### State machine summary + +| Ignition | Gear | Predicates true | What runs | +|----------|-------------|-----------------------------|-------------------------------| +| off | any | none | always_run only (ui, thermald, pandad, deleter, …) | +| on | Park (>1.5s)| parked_only | always_run + controlsd_parked | +| on | not Park | only_onroad (=> all `started`) | full onroad stack | + +The transition between rows 2 and 3 is purely manager process +swapping driven by predicate flips — no IPC handshake, no +self-termination. Thermald sees gear change, flips `not_parked`, +`should_start = all(onroad_conditions.values())` flips, manager kills +the wrong variant and spawns the right one on its next tick. + +--- + +## What the on-device agent needs to know / debug + +### Build prerequisites + +A new param key was added (`IgnitionOn`), so the C++ params whitelist +needs a fresh build. Per `CLAUDE.md` "Adding New Params": + +```bash +chown -R comma:comma /data/openpilot +rm -f /data/openpilot/prebuilt /data/openpilot/common/params.o /data/openpilot/common/libcommon.a +su - comma -c "bash /data/openpilot/build_only.sh" +``` + +`build_only.sh` already deletes `prebuilt` but **does not** delete +`params.o` / `libcommon.a` — verify those are gone before building or +the new key won't be picked up and `Params().put_bool("IgnitionOn", …)` +will throw `UnknownKeyName` in manager_init or thermald. + +### How to verify the swap is working + +```bash +# 1. Watch which controlsd variant is running +watch -n 1 'ps -ef | grep -E "controlsd(_parked)?" | grep -v grep' + +# 2. Watch the gating signals +watch -n 1 'echo "IgnitionOn:"; cat /data/params/d/IgnitionOn 2>/dev/null; \ + echo; echo "deviceState.started:" ; \ + python3 -c "import cereal.messaging as m; \ + s=m.sub_sock(\"deviceState\", timeout=1000); \ + print(m.recv_one(s).deviceState.started)"' + +# 3. Watch gear via carState +python3 -c " +import cereal.messaging as m +s = m.sub_sock('carState', timeout=2000) +while True: + msg = m.recv_one(s) + if msg: print(msg.carState.gearShifter) +" +``` + +Expected behavior: +- Ignition off: neither variant in `ps`. `IgnitionOn` is `0`/missing. +- Ignition on, in Park: `controlsd_parked` in `ps`, `controlsd` is not. + `started` is False. After ~1.5s of confirmed Park, modeld and friends + should have stopped. +- Shift to Drive: `controlsd_parked` disappears within ~500ms, full + `controlsd` appears, all the onroad processes spin up. There will be + a brief carState gap during the swap (~0.5–2s). + +### What to suspect first if something breaks + +1. **Manager crash on startup** with `UnknownKeyName: IgnitionOn`. Means + `params.o`/`libcommon.a` weren't rebuilt. Delete them and rebuild. +2. **`controlsd_parked` keeps respawning / dying.** Check + `/data/log2/current/controlsd_parked.log`. Most likely either: + - `CarD.__init__` is hanging on `get_one_can` because pandad isn't + up yet — should only matter on the very first boot. + - A `frogpilot_variables` attribute we missed defaulting; add it to + `_make_default_frogpilot_variables` in `controlsd_parked.py`. +3. **Full controlsd never spawns after shift to Drive.** Check + `IgnitionOn` (should be `1`), check carState.gearShifter (should not + be `park`/`unknown`), check thermald.log for `should_start` logic. + Also check that `carState` is actually being published by + `controlsd_parked`. +4. **Steering still pulls right.** The whole point of commit 1 is to + rule out the variable-FPS work as the cause. If the symptom persists + on baseline-restored driving logic, the suspect list shifts to: + - Anything still on the kept side (carcontroller's + `no_lat_lane_change` plumbing, thermald-related changes, + custom.capnp additions). + - Hardware: a calibration that drifted, an alignment issue, panda + CAN bus issue, or torque tuning that was modified outside of these + files. + - The custom driving model selected (`Params("Model")`). Confirm + which `.thneed`/`.onnx` is loaded — none of the model files + themselves were changed in this session. +5. **Panda safety alerts during park transition.** If panda logs a + "lost heartbeat" or drops to NOOUTPUT mode in the swap window, we + need controlsd_parked to issue a no-op carcontrol heartbeat to keep + panda happy. Not implemented in this session — flag for follow-up. + +### Open follow-up items (NOT done in this session) + +- **Cold-start latency on shift-from-Park.** Modeld load + calibration + warmup may produce a noticeable gap before lateral/long are ready. + Anticipatory wake on `CS.brakePressed && in_park` is the planned + mitigation if needed. +- **Methodically reintroduce optimizations.** Once baseline driving + feels right, the standstill optimizations (modeld 1fps, fan clamps, + etc.) can come back one at a time, each with a drive test. +- **The CAN-FD telemetry block in `selfdrive/car/hyundai/carstate.py`** + is preserved as a commented block at the bottom of `update_canfd()`. + Re-enabling requires uncommenting + restoring the `tlog` import at + the top of the file. + +--- + +## Key file index for fast navigation + +``` +selfdrive/controls/controlsd.py # full controlsd, restored to baseline + UI hooks re-added +selfdrive/controls/controlsd_parked.py # NEW: parked-only CAN listener +selfdrive/controls/clearpilot_state_control # in controlsd.py, ~line 1255 — 5-state ScreenDisplayMode + speed_state tick +selfdrive/thermald/thermald.py # gear-aware not_parked + IgnitionOn writer (~line 258) +selfdrive/manager/process_config.py # parked_only predicate + new entry +selfdrive/manager/manager.py # IgnitionOn seed in manager_init +common/params.cc # IgnitionOn registered (CLEAR_ON_MANAGER_START) +selfdrive/car/card.py # CarD class — used by both controlsd variants, unchanged +selfdrive/clearpilot/speed_logic.py # SpeedState class — unchanged, called from controlsd.clearpilot_state_control +selfdrive/car/hyundai/carstate.py # restored baseline + commented telemetry block at end of update_canfd +``` + +## Reproducing the diff per commit + +```bash +git show 47321e3 # baseline restore +git show f7e602c # UI hooks +git show 887b9c9 # parked mode +git diff 62a403d..887b9c9 # full session delta +```