sessions: document baseline revert + parked-controlsd architecture
Briefing for the on-device agent covering commits47321e3,f7e602c, and887b9c9: what was reverted vs kept, the UI re-wire, the parked controlsd swap mechanism, build prerequisites, runtime verification commands, and a troubleshooting playbook. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
@@ -0,0 +1,355 @@
|
|||||||
|
# Session: 2026-04-26 — Baseline Revert + Parked-Controlsd Mode
|
||||||
|
|
||||||
|
## Context
|
||||||
|
|
||||||
|
This session was driven by a regression: the steering wheel "feels like
|
||||||
|
it pulls right" during normal driving, with no clear smoking gun. The
|
||||||
|
suspicion was that one of the variable-FPS / standstill-throttling
|
||||||
|
changes (added to reduce parked-state fan noise and CPU load) bled into
|
||||||
|
on-road driving behavior in a hard-to-isolate way.
|
||||||
|
|
||||||
|
Strategy: revert all driving-relevant logic to a known-good baseline
|
||||||
|
captured in `/projects/openpilot/archive/clearpilot` (HEAD `980f0aa`,
|
||||||
|
July 2024), keep all of the ClearPilot UI/dashcam/telemetry/bench-mode
|
||||||
|
infrastructure intact on top, then attack the parked-fan-noise problem
|
||||||
|
fresh from a different angle that doesn't touch driving logic at all.
|
||||||
|
|
||||||
|
Three commits landed in this order on branch `clearpilot`:
|
||||||
|
|
||||||
|
| SHA | Title |
|
||||||
|
|---|---|
|
||||||
|
| `47321e3` | restore driving logic to pre-variable-fps baseline |
|
||||||
|
| `f7e602c` | controlsd: re-wire UI hooks on top of restored baseline |
|
||||||
|
| `887b9c9` | parked-controlsd mode: shut down heavy stack while ignition+park |
|
||||||
|
|
||||||
|
Pre-revert tip was `62a403d`. The on-device agent should treat
|
||||||
|
**`62a403d` as "the broken version"** when looking at history.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Commit 1: `47321e3` — Baseline restore
|
||||||
|
|
||||||
|
Reverted the following files wholesale to their `980f0aa` archive copy:
|
||||||
|
|
||||||
|
- `selfdrive/controls/controlsd.py`
|
||||||
|
- `selfdrive/controls/lib/events.py`
|
||||||
|
- `selfdrive/controls/lib/longitudinal_planner.py`
|
||||||
|
- `selfdrive/modeld/modeld.py`
|
||||||
|
- `selfdrive/modeld/dmonitoringmodeld.py`
|
||||||
|
- `selfdrive/locationd/calibrationd.py`
|
||||||
|
- `selfdrive/locationd/paramsd.py`
|
||||||
|
- `selfdrive/locationd/torqued.py`
|
||||||
|
- `selfdrive/car/interfaces.py`
|
||||||
|
- `selfdrive/car/hyundai/carstate.py` (CAN-FD telemetry preserved as a
|
||||||
|
commented block at the bottom of `update_canfd` — re-enable by
|
||||||
|
uncommenting; the `tlog` import is also commented out).
|
||||||
|
- `selfdrive/monitoring/dmonitoringd.py`
|
||||||
|
- `selfdrive/frogpilot/controls/frogpilot_planner.py`
|
||||||
|
- `common/realtime.py`
|
||||||
|
|
||||||
|
### Intentionally NOT restored (kept as the post-`62a403d` version)
|
||||||
|
|
||||||
|
- `selfdrive/thermald/*` — fan/power tuning kept as-is.
|
||||||
|
- `selfdrive/car/hyundai/carcontroller.py` and `hyundaicanfd.py` —
|
||||||
|
reviewed; the only delta vs baseline is hoisting the
|
||||||
|
`no_lat_lane_change` Params read out of the 100Hz hot path
|
||||||
|
(~5% carcontroller CPU) by passing the bit as an argument. This is
|
||||||
|
a perf-only change with no behavioral effect outside lane changes.
|
||||||
|
- `cereal/services.py`, `cereal/custom.capnp` — additive only.
|
||||||
|
`custom.capnp` adds `latRequested @3` and `noLatLaneChange @4` fields
|
||||||
|
to `FrogPilotCarControl`; capnp tag numbers are append-only so this is
|
||||||
|
safe to leave in even though baseline code doesn't write to them.
|
||||||
|
- `selfdrive/manager/*`, `common/params.cc` — heavy ClearPilot
|
||||||
|
infrastructure (bench mode, log dir, dashcamd, gpsd, ClearPilot params).
|
||||||
|
- All `selfdrive/ui/`, `selfdrive/clearpilot/`, `system/clearpilot/`.
|
||||||
|
|
||||||
|
### Things removed by the restore (no longer in the tree)
|
||||||
|
|
||||||
|
- Standstill frame skipping in modeld (was: skip GPU inference 19/20
|
||||||
|
frames at standstill, report 0 dropped frames to fool controlsd).
|
||||||
|
- Standstill frame skipping in dmonitoringmodeld.
|
||||||
|
- Model standby logic + `ModelStandby`/`ModelStandbyTs` reads in
|
||||||
|
controlsd's comm-issue suppression path.
|
||||||
|
- Parked-cycle skip in `state_control()` (10Hz vs 100Hz when in Park).
|
||||||
|
- Calibrationd validity decoupling from `sm.all_checks()`.
|
||||||
|
- Post-engage 2s commIssue/location/params suppression window.
|
||||||
|
- Per-cycle carstate write-gating for `CarSpeedLimit`/`CarIsMetric` in
|
||||||
|
carstate.py.
|
||||||
|
- The diff-based carstate telemetry calls (preserved commented out).
|
||||||
|
|
||||||
|
These were the candidates for the steering-pull regression. The on-device
|
||||||
|
agent should **not re-introduce any of these** without a deliberate plan
|
||||||
|
and a test session. The user's intent is to get baseline driving feel
|
||||||
|
confirmed first, then re-introduce optimizations one at a time and drive
|
||||||
|
each.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Commit 2: `f7e602c` — UI hooks on top of baseline
|
||||||
|
|
||||||
|
Baseline `controlsd.py` doesn't have the UI plumbing the ClearPilot UI
|
||||||
|
expects. This commit re-adds only the things needed for the existing UI
|
||||||
|
to keep working — pure params-write plumbing, no actuator effect.
|
||||||
|
|
||||||
|
### Changes (all in `selfdrive/controls/controlsd.py`)
|
||||||
|
|
||||||
|
1. **Import `SpeedState`** from `openpilot.selfdrive.clearpilot.speed_logic`.
|
||||||
|
2. **`Controls.__init__`**:
|
||||||
|
- `params_memory.put_bool("ScreenDisplayMode", 0)` →
|
||||||
|
`params_memory.put_int("ScreenDisplayMode", 0)` (UI reads it as int).
|
||||||
|
- Added `self.speed_state = SpeedState()`, `self.speed_state_frame = 0`,
|
||||||
|
`self.was_driving_gear = False`.
|
||||||
|
3. **SubMaster** — added `gpsLocation` to the subscriber list, with
|
||||||
|
`ignore_alive` / `ignore_avg_freq` / `ignore_valid` so missing GPS
|
||||||
|
doesn't trigger commIssue.
|
||||||
|
4. **`clearpilot_state_control(...)`** rewritten from a simple 3-state
|
||||||
|
cycle into the documented 5-state ScreenDisplayMode machine:
|
||||||
|
- Auto-wake on park→drive edge if currently in screen-off (state 3).
|
||||||
|
- LFA button transitions in drive: `0→4, 1→2, 2→3, 3→4, 4→2`.
|
||||||
|
- LFA button transitions outside drive: any except 3 → 3, state 3 → 0.
|
||||||
|
- Speed/cruise-warning overlay tick at ~2Hz (`speed_state.update(...)`)
|
||||||
|
reading `gpsLocation`, `CarSpeedLimit` param, `self.is_metric`,
|
||||||
|
`CS.cruiseState`. This is what writes
|
||||||
|
`ClearpilotSpeedDisplay`/`ClearpilotSpeedLimitDisplay`/
|
||||||
|
`ClearpilotCruiseWarning` for the UI overlay.
|
||||||
|
5. **Lane-change suppression sync** — at the existing baseline
|
||||||
|
`clearpilot_disable_lat_on_lane_change` block (around line 687), in
|
||||||
|
addition to the existing `params_memory.put_bool("no_lat_lane_change", ...)`,
|
||||||
|
also set `self.frogpilot_variables.no_lat_lane_change = True/False`.
|
||||||
|
This is required because the kept (post-`62a403d`) carcontroller
|
||||||
|
reads off `frogpilot_variables`, not Params.
|
||||||
|
|
||||||
|
### What does NOT change
|
||||||
|
|
||||||
|
No edits to lateral or longitudinal control paths. No new actuator-side
|
||||||
|
behavior. The UI features (nightrider/screen-off/auto day-night, speed
|
||||||
|
overlay, cruise warning chime) get wired back up purely through param
|
||||||
|
writes.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Commit 3: `887b9c9` — Parked-controlsd mode
|
||||||
|
|
||||||
|
Architectural fix for the original problem this whole session was
|
||||||
|
chasing: while ignition is on but the car is in Park, the entire onroad
|
||||||
|
stack (modeld, planner, control, locationd, calibrationd, paramsd,
|
||||||
|
torqued, dmonitoring*, soundd, loggerd) is running and burning CPU/fan
|
||||||
|
even though none of it is needed.
|
||||||
|
|
||||||
|
Solution: redefine "onroad" as **ignition AND not parked** instead of
|
||||||
|
just **ignition**. Reuse the existing `started`-based process gating in
|
||||||
|
manager. Add a tiny second controlsd variant that runs while parked,
|
||||||
|
just to keep CAN flowing so thermald can see when gear leaves Park.
|
||||||
|
|
||||||
|
### Files
|
||||||
|
|
||||||
|
#### NEW: `selfdrive/controls/controlsd_parked.py`
|
||||||
|
|
||||||
|
Minimal entry point. Roughly:
|
||||||
|
|
||||||
|
```python
|
||||||
|
def main():
|
||||||
|
config_realtime_process(4, Priority.CTRL_HIGH)
|
||||||
|
card = CarD() # blocks until first CAN, fingerprints car
|
||||||
|
card.initialize()
|
||||||
|
fv = _make_default_frogpilot_variables() # safe False/0 SimpleNamespace
|
||||||
|
while True:
|
||||||
|
card.state_update(fv) # publishes carState/carOutput/carParams
|
||||||
|
```
|
||||||
|
|
||||||
|
`CarD.state_update` blocks via `drain_sock_raw(wait_for_one=True)`, so
|
||||||
|
the loop is paced by CAN traffic — no extra sleep, no CPU spin.
|
||||||
|
|
||||||
|
The default `frogpilot_variables` sets these to safe values so
|
||||||
|
`CarInterfaceBase.update` doesn't `AttributeError`:
|
||||||
|
`conditional_experimental_mode`, `experimental_mode_via_distance`,
|
||||||
|
`traffic_mode`, `sport_plus`, `long_pitch`, `no_lat_lane_change` — all
|
||||||
|
False.
|
||||||
|
|
||||||
|
#### `selfdrive/thermald/thermald.py`
|
||||||
|
|
||||||
|
- `onroad_conditions` now also has `"not_parked"`. Initialized to
|
||||||
|
`False` (assume parked at boot) so the heavy stack waits for carState
|
||||||
|
to confirm gear has left Park before spinning up.
|
||||||
|
- New module-level loop variables: `is_parked = True`,
|
||||||
|
`parked_since: float | None = None`, `PARKED_HYSTERESIS_S = 1.5`,
|
||||||
|
`ignition_param_prev: bool | None = None`.
|
||||||
|
- New block right after the panda-disconnect check (around line 258 in
|
||||||
|
current state) reads `sm['carState'].gearShifter`:
|
||||||
|
- Gear == Park: latch `parked_since`, flip `is_parked = True` after
|
||||||
|
1.5s of continuous Park (hysteresis).
|
||||||
|
- Gear != Park: clear `parked_since`, `is_parked = False` immediately
|
||||||
|
(no hysteresis going out).
|
||||||
|
- Reverse is treated as **not parked** — driver is moving.
|
||||||
|
- `onroad_conditions["not_parked"] = not is_parked` every tick.
|
||||||
|
- New `IgnitionOn` Params write, edge-driven (only on change of
|
||||||
|
`onroad_conditions["ignition"]`) so we don't hammer the persistent
|
||||||
|
filesystem 2x/sec.
|
||||||
|
|
||||||
|
#### `selfdrive/manager/process_config.py`
|
||||||
|
|
||||||
|
- New predicate:
|
||||||
|
```python
|
||||||
|
def parked_only(started, params, CP):
|
||||||
|
return params.get_bool("IgnitionOn") and not started
|
||||||
|
```
|
||||||
|
- New process entry directly after the existing `controlsd` entry:
|
||||||
|
```python
|
||||||
|
PythonProcess("controlsd_parked", "selfdrive.controls.controlsd_parked", parked_only),
|
||||||
|
```
|
||||||
|
- `controlsd` is unchanged (still `only_onroad`). Mutually exclusive
|
||||||
|
with `parked_only` because `started` is the negation of the relevant
|
||||||
|
condition.
|
||||||
|
|
||||||
|
#### `selfdrive/manager/manager.py`
|
||||||
|
|
||||||
|
- Single line in `manager_init()` seeding `IgnitionOn=False` so the
|
||||||
|
predicate evaluates correctly before thermald's first tick.
|
||||||
|
|
||||||
|
#### `common/params.cc`
|
||||||
|
|
||||||
|
- New entry in the alphabetical I-block:
|
||||||
|
```cpp
|
||||||
|
{"IgnitionOn", CLEAR_ON_MANAGER_START},
|
||||||
|
```
|
||||||
|
- Manager predicates can only see persistent Params (not pandaStates
|
||||||
|
or `/dev/shm/params`), which is why thermald has to expose ignition
|
||||||
|
this way.
|
||||||
|
|
||||||
|
### State machine summary
|
||||||
|
|
||||||
|
| Ignition | Gear | Predicates true | What runs |
|
||||||
|
|----------|-------------|-----------------------------|-------------------------------|
|
||||||
|
| off | any | none | always_run only (ui, thermald, pandad, deleter, …) |
|
||||||
|
| on | Park (>1.5s)| parked_only | always_run + controlsd_parked |
|
||||||
|
| on | not Park | only_onroad (=> all `started`) | full onroad stack |
|
||||||
|
|
||||||
|
The transition between rows 2 and 3 is purely manager process
|
||||||
|
swapping driven by predicate flips — no IPC handshake, no
|
||||||
|
self-termination. Thermald sees gear change, flips `not_parked`,
|
||||||
|
`should_start = all(onroad_conditions.values())` flips, manager kills
|
||||||
|
the wrong variant and spawns the right one on its next tick.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## What the on-device agent needs to know / debug
|
||||||
|
|
||||||
|
### Build prerequisites
|
||||||
|
|
||||||
|
A new param key was added (`IgnitionOn`), so the C++ params whitelist
|
||||||
|
needs a fresh build. Per `CLAUDE.md` "Adding New Params":
|
||||||
|
|
||||||
|
```bash
|
||||||
|
chown -R comma:comma /data/openpilot
|
||||||
|
rm -f /data/openpilot/prebuilt /data/openpilot/common/params.o /data/openpilot/common/libcommon.a
|
||||||
|
su - comma -c "bash /data/openpilot/build_only.sh"
|
||||||
|
```
|
||||||
|
|
||||||
|
`build_only.sh` already deletes `prebuilt` but **does not** delete
|
||||||
|
`params.o` / `libcommon.a` — verify those are gone before building or
|
||||||
|
the new key won't be picked up and `Params().put_bool("IgnitionOn", …)`
|
||||||
|
will throw `UnknownKeyName` in manager_init or thermald.
|
||||||
|
|
||||||
|
### How to verify the swap is working
|
||||||
|
|
||||||
|
```bash
|
||||||
|
# 1. Watch which controlsd variant is running
|
||||||
|
watch -n 1 'ps -ef | grep -E "controlsd(_parked)?" | grep -v grep'
|
||||||
|
|
||||||
|
# 2. Watch the gating signals
|
||||||
|
watch -n 1 'echo "IgnitionOn:"; cat /data/params/d/IgnitionOn 2>/dev/null; \
|
||||||
|
echo; echo "deviceState.started:" ; \
|
||||||
|
python3 -c "import cereal.messaging as m; \
|
||||||
|
s=m.sub_sock(\"deviceState\", timeout=1000); \
|
||||||
|
print(m.recv_one(s).deviceState.started)"'
|
||||||
|
|
||||||
|
# 3. Watch gear via carState
|
||||||
|
python3 -c "
|
||||||
|
import cereal.messaging as m
|
||||||
|
s = m.sub_sock('carState', timeout=2000)
|
||||||
|
while True:
|
||||||
|
msg = m.recv_one(s)
|
||||||
|
if msg: print(msg.carState.gearShifter)
|
||||||
|
"
|
||||||
|
```
|
||||||
|
|
||||||
|
Expected behavior:
|
||||||
|
- Ignition off: neither variant in `ps`. `IgnitionOn` is `0`/missing.
|
||||||
|
- Ignition on, in Park: `controlsd_parked` in `ps`, `controlsd` is not.
|
||||||
|
`started` is False. After ~1.5s of confirmed Park, modeld and friends
|
||||||
|
should have stopped.
|
||||||
|
- Shift to Drive: `controlsd_parked` disappears within ~500ms, full
|
||||||
|
`controlsd` appears, all the onroad processes spin up. There will be
|
||||||
|
a brief carState gap during the swap (~0.5–2s).
|
||||||
|
|
||||||
|
### What to suspect first if something breaks
|
||||||
|
|
||||||
|
1. **Manager crash on startup** with `UnknownKeyName: IgnitionOn`. Means
|
||||||
|
`params.o`/`libcommon.a` weren't rebuilt. Delete them and rebuild.
|
||||||
|
2. **`controlsd_parked` keeps respawning / dying.** Check
|
||||||
|
`/data/log2/current/controlsd_parked.log`. Most likely either:
|
||||||
|
- `CarD.__init__` is hanging on `get_one_can` because pandad isn't
|
||||||
|
up yet — should only matter on the very first boot.
|
||||||
|
- A `frogpilot_variables` attribute we missed defaulting; add it to
|
||||||
|
`_make_default_frogpilot_variables` in `controlsd_parked.py`.
|
||||||
|
3. **Full controlsd never spawns after shift to Drive.** Check
|
||||||
|
`IgnitionOn` (should be `1`), check carState.gearShifter (should not
|
||||||
|
be `park`/`unknown`), check thermald.log for `should_start` logic.
|
||||||
|
Also check that `carState` is actually being published by
|
||||||
|
`controlsd_parked`.
|
||||||
|
4. **Steering still pulls right.** The whole point of commit 1 is to
|
||||||
|
rule out the variable-FPS work as the cause. If the symptom persists
|
||||||
|
on baseline-restored driving logic, the suspect list shifts to:
|
||||||
|
- Anything still on the kept side (carcontroller's
|
||||||
|
`no_lat_lane_change` plumbing, thermald-related changes,
|
||||||
|
custom.capnp additions).
|
||||||
|
- Hardware: a calibration that drifted, an alignment issue, panda
|
||||||
|
CAN bus issue, or torque tuning that was modified outside of these
|
||||||
|
files.
|
||||||
|
- The custom driving model selected (`Params("Model")`). Confirm
|
||||||
|
which `.thneed`/`.onnx` is loaded — none of the model files
|
||||||
|
themselves were changed in this session.
|
||||||
|
5. **Panda safety alerts during park transition.** If panda logs a
|
||||||
|
"lost heartbeat" or drops to NOOUTPUT mode in the swap window, we
|
||||||
|
need controlsd_parked to issue a no-op carcontrol heartbeat to keep
|
||||||
|
panda happy. Not implemented in this session — flag for follow-up.
|
||||||
|
|
||||||
|
### Open follow-up items (NOT done in this session)
|
||||||
|
|
||||||
|
- **Cold-start latency on shift-from-Park.** Modeld load + calibration
|
||||||
|
warmup may produce a noticeable gap before lateral/long are ready.
|
||||||
|
Anticipatory wake on `CS.brakePressed && in_park` is the planned
|
||||||
|
mitigation if needed.
|
||||||
|
- **Methodically reintroduce optimizations.** Once baseline driving
|
||||||
|
feels right, the standstill optimizations (modeld 1fps, fan clamps,
|
||||||
|
etc.) can come back one at a time, each with a drive test.
|
||||||
|
- **The CAN-FD telemetry block in `selfdrive/car/hyundai/carstate.py`**
|
||||||
|
is preserved as a commented block at the bottom of `update_canfd()`.
|
||||||
|
Re-enabling requires uncommenting + restoring the `tlog` import at
|
||||||
|
the top of the file.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Key file index for fast navigation
|
||||||
|
|
||||||
|
```
|
||||||
|
selfdrive/controls/controlsd.py # full controlsd, restored to baseline + UI hooks re-added
|
||||||
|
selfdrive/controls/controlsd_parked.py # NEW: parked-only CAN listener
|
||||||
|
selfdrive/controls/clearpilot_state_control # in controlsd.py, ~line 1255 — 5-state ScreenDisplayMode + speed_state tick
|
||||||
|
selfdrive/thermald/thermald.py # gear-aware not_parked + IgnitionOn writer (~line 258)
|
||||||
|
selfdrive/manager/process_config.py # parked_only predicate + new entry
|
||||||
|
selfdrive/manager/manager.py # IgnitionOn seed in manager_init
|
||||||
|
common/params.cc # IgnitionOn registered (CLEAR_ON_MANAGER_START)
|
||||||
|
selfdrive/car/card.py # CarD class — used by both controlsd variants, unchanged
|
||||||
|
selfdrive/clearpilot/speed_logic.py # SpeedState class — unchanged, called from controlsd.clearpilot_state_control
|
||||||
|
selfdrive/car/hyundai/carstate.py # restored baseline + commented telemetry block at end of update_canfd
|
||||||
|
```
|
||||||
|
|
||||||
|
## Reproducing the diff per commit
|
||||||
|
|
||||||
|
```bash
|
||||||
|
git show 47321e3 # baseline restore
|
||||||
|
git show f7e602c # UI hooks
|
||||||
|
git show 887b9c9 # parked mode
|
||||||
|
git diff 62a403d..887b9c9 # full session delta
|
||||||
|
```
|
||||||
Reference in New Issue
Block a user