Files
clearpilot/sessions/2026-04-26-0914-baseline-revert-and-parked-mode/README.md
T
brianhansonxyz ab9158bfb7 adopt pre-modelrevert clearpilot tree (d639e28) as the new head
Discard the modelrevert tree adoption (8b4b7e0) and the in-process park
short-circuits / cached-output / dashcam-idle work that came with it
(0dc8002, 37e095e). Restore the clearpilot tree as it stood at d639e28 —
the parked-controlsd manager-process split, the GPS-disable in locationd,
the controlsd UI hooks, the boardd ignition-edge safety_setter_thread
fix. After a full /data/params/d wipe and re-calibration drive, the
modelrevert-tree variant overcorrected on turns; reverting to the
parked-controlsd architecture (which Brian had previously vetted and
documented in 887b9c9 + 27cad05) and starting fresh.

Single new commit, no merge — file state matches d639e28 byte-for-byte.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-26 14:17:25 -05:00

16 KiB
Raw Blame History

Session: 2026-04-26 — Baseline Revert + Parked-Controlsd Mode

Context

This session was driven by a regression: the steering wheel "feels like it pulls right" during normal driving, with no clear smoking gun. The suspicion was that one of the variable-FPS / standstill-throttling changes (added to reduce parked-state fan noise and CPU load) bled into on-road driving behavior in a hard-to-isolate way.

Strategy: revert all driving-relevant logic to a known-good baseline captured in /projects/openpilot/archive/clearpilot (HEAD 980f0aa, July 2024), keep all of the ClearPilot UI/dashcam/telemetry/bench-mode infrastructure intact on top, then attack the parked-fan-noise problem fresh from a different angle that doesn't touch driving logic at all.

Three commits landed in this order on branch clearpilot:

SHA Title
47321e3 restore driving logic to pre-variable-fps baseline
f7e602c controlsd: re-wire UI hooks on top of restored baseline
887b9c9 parked-controlsd mode: shut down heavy stack while ignition+park

Pre-revert tip was 62a403d. The on-device agent should treat 62a403d as "the broken version" when looking at history.


Commit 1: 47321e3 — Baseline restore

Reverted the following files wholesale to their 980f0aa archive copy:

  • selfdrive/controls/controlsd.py
  • selfdrive/controls/lib/events.py
  • selfdrive/controls/lib/longitudinal_planner.py
  • selfdrive/modeld/modeld.py
  • selfdrive/modeld/dmonitoringmodeld.py
  • selfdrive/locationd/calibrationd.py
  • selfdrive/locationd/paramsd.py
  • selfdrive/locationd/torqued.py
  • selfdrive/car/interfaces.py
  • selfdrive/car/hyundai/carstate.py (CAN-FD telemetry preserved as a commented block at the bottom of update_canfd — re-enable by uncommenting; the tlog import is also commented out).
  • selfdrive/monitoring/dmonitoringd.py
  • selfdrive/frogpilot/controls/frogpilot_planner.py
  • common/realtime.py

Intentionally NOT restored (kept as the post-62a403d version)

  • selfdrive/thermald/* — fan/power tuning kept as-is.
  • selfdrive/car/hyundai/carcontroller.py and hyundaicanfd.py — reviewed; the only delta vs baseline is hoisting the no_lat_lane_change Params read out of the 100Hz hot path (~5% carcontroller CPU) by passing the bit as an argument. This is a perf-only change with no behavioral effect outside lane changes.
  • cereal/services.py, cereal/custom.capnp — additive only. custom.capnp adds latRequested @3 and noLatLaneChange @4 fields to FrogPilotCarControl; capnp tag numbers are append-only so this is safe to leave in even though baseline code doesn't write to them.
  • selfdrive/manager/*, common/params.cc — heavy ClearPilot infrastructure (bench mode, log dir, dashcamd, gpsd, ClearPilot params).
  • All selfdrive/ui/, selfdrive/clearpilot/, system/clearpilot/.

Things removed by the restore (no longer in the tree)

  • Standstill frame skipping in modeld (was: skip GPU inference 19/20 frames at standstill, report 0 dropped frames to fool controlsd).
  • Standstill frame skipping in dmonitoringmodeld.
  • Model standby logic + ModelStandby/ModelStandbyTs reads in controlsd's comm-issue suppression path.
  • Parked-cycle skip in state_control() (10Hz vs 100Hz when in Park).
  • Calibrationd validity decoupling from sm.all_checks().
  • Post-engage 2s commIssue/location/params suppression window.
  • Per-cycle carstate write-gating for CarSpeedLimit/CarIsMetric in carstate.py.
  • The diff-based carstate telemetry calls (preserved commented out).

These were the candidates for the steering-pull regression. The on-device agent should not re-introduce any of these without a deliberate plan and a test session. The user's intent is to get baseline driving feel confirmed first, then re-introduce optimizations one at a time and drive each.


Commit 2: f7e602c — UI hooks on top of baseline

Baseline controlsd.py doesn't have the UI plumbing the ClearPilot UI expects. This commit re-adds only the things needed for the existing UI to keep working — pure params-write plumbing, no actuator effect.

Changes (all in selfdrive/controls/controlsd.py)

  1. Import SpeedState from openpilot.selfdrive.clearpilot.speed_logic.
  2. Controls.__init__:
    • params_memory.put_bool("ScreenDisplayMode", 0)params_memory.put_int("ScreenDisplayMode", 0) (UI reads it as int).
    • Added self.speed_state = SpeedState(), self.speed_state_frame = 0, self.was_driving_gear = False.
  3. SubMaster — added gpsLocation to the subscriber list, with ignore_alive / ignore_avg_freq / ignore_valid so missing GPS doesn't trigger commIssue.
  4. clearpilot_state_control(...) rewritten from a simple 3-state cycle into the documented 5-state ScreenDisplayMode machine:
    • Auto-wake on park→drive edge if currently in screen-off (state 3).
    • LFA button transitions in drive: 0→4, 1→2, 2→3, 3→4, 4→2.
    • LFA button transitions outside drive: any except 3 → 3, state 3 → 0.
    • Speed/cruise-warning overlay tick at ~2Hz (speed_state.update(...)) reading gpsLocation, CarSpeedLimit param, self.is_metric, CS.cruiseState. This is what writes ClearpilotSpeedDisplay/ClearpilotSpeedLimitDisplay/ ClearpilotCruiseWarning for the UI overlay.
  5. Lane-change suppression sync — at the existing baseline clearpilot_disable_lat_on_lane_change block (around line 687), in addition to the existing params_memory.put_bool("no_lat_lane_change", ...), also set self.frogpilot_variables.no_lat_lane_change = True/False. This is required because the kept (post-62a403d) carcontroller reads off frogpilot_variables, not Params.

What does NOT change

No edits to lateral or longitudinal control paths. No new actuator-side behavior. The UI features (nightrider/screen-off/auto day-night, speed overlay, cruise warning chime) get wired back up purely through param writes.


Commit 3: 887b9c9 — Parked-controlsd mode

Architectural fix for the original problem this whole session was chasing: while ignition is on but the car is in Park, the entire onroad stack (modeld, planner, control, locationd, calibrationd, paramsd, torqued, dmonitoring*, soundd, loggerd) is running and burning CPU/fan even though none of it is needed.

Solution: redefine "onroad" as ignition AND not parked instead of just ignition. Reuse the existing started-based process gating in manager. Add a tiny second controlsd variant that runs while parked, just to keep CAN flowing so thermald can see when gear leaves Park.

Files

NEW: selfdrive/controls/controlsd_parked.py

Minimal entry point. Roughly:

def main():
    config_realtime_process(4, Priority.CTRL_HIGH)
    card = CarD()                  # blocks until first CAN, fingerprints car
    card.initialize()
    fv = _make_default_frogpilot_variables()  # safe False/0 SimpleNamespace
    while True:
        card.state_update(fv)      # publishes carState/carOutput/carParams

CarD.state_update blocks via drain_sock_raw(wait_for_one=True), so the loop is paced by CAN traffic — no extra sleep, no CPU spin.

The default frogpilot_variables sets these to safe values so CarInterfaceBase.update doesn't AttributeError: conditional_experimental_mode, experimental_mode_via_distance, traffic_mode, sport_plus, long_pitch, no_lat_lane_change — all False.

selfdrive/thermald/thermald.py

  • onroad_conditions now also has "not_parked". Initialized to False (assume parked at boot) so the heavy stack waits for carState to confirm gear has left Park before spinning up.
  • New module-level loop variables: is_parked = True, parked_since: float | None = None, PARKED_HYSTERESIS_S = 1.5, ignition_param_prev: bool | None = None.
  • New block right after the panda-disconnect check (around line 258 in current state) reads sm['carState'].gearShifter:
    • Gear == Park: latch parked_since, flip is_parked = True after 1.5s of continuous Park (hysteresis).
    • Gear != Park: clear parked_since, is_parked = False immediately (no hysteresis going out).
    • Reverse is treated as not parked — driver is moving.
  • onroad_conditions["not_parked"] = not is_parked every tick.
  • New IgnitionOn Params write, edge-driven (only on change of onroad_conditions["ignition"]) so we don't hammer the persistent filesystem 2x/sec.

selfdrive/manager/process_config.py

  • New predicate:
    def parked_only(started, params, CP):
        return params.get_bool("IgnitionOn") and not started
    
  • New process entry directly after the existing controlsd entry:
    PythonProcess("controlsd_parked", "selfdrive.controls.controlsd_parked", parked_only),
    
  • controlsd is unchanged (still only_onroad). Mutually exclusive with parked_only because started is the negation of the relevant condition.

selfdrive/manager/manager.py

  • Single line in manager_init() seeding IgnitionOn=False so the predicate evaluates correctly before thermald's first tick.

common/params.cc

  • New entry in the alphabetical I-block:
    {"IgnitionOn", CLEAR_ON_MANAGER_START},
    
  • Manager predicates can only see persistent Params (not pandaStates or /dev/shm/params), which is why thermald has to expose ignition this way.

State machine summary

Ignition Gear Predicates true What runs
off any none always_run only (ui, thermald, pandad, deleter, …)
on Park (>1.5s) parked_only always_run + controlsd_parked
on not Park only_onroad (=> all started) full onroad stack

The transition between rows 2 and 3 is purely manager process swapping driven by predicate flips — no IPC handshake, no self-termination. Thermald sees gear change, flips not_parked, should_start = all(onroad_conditions.values()) flips, manager kills the wrong variant and spawns the right one on its next tick.


What the on-device agent needs to know / debug

Build prerequisites

A new param key was added (IgnitionOn), so the C++ params whitelist needs a fresh build. Per CLAUDE.md "Adding New Params":

chown -R comma:comma /data/openpilot
rm -f /data/openpilot/prebuilt /data/openpilot/common/params.o /data/openpilot/common/libcommon.a
su - comma -c "bash /data/openpilot/build_only.sh"

build_only.sh already deletes prebuilt but does not delete params.o / libcommon.a — verify those are gone before building or the new key won't be picked up and Params().put_bool("IgnitionOn", …) will throw UnknownKeyName in manager_init or thermald.

How to verify the swap is working

# 1. Watch which controlsd variant is running
watch -n 1 'ps -ef | grep -E "controlsd(_parked)?" | grep -v grep'

# 2. Watch the gating signals
watch -n 1 'echo "IgnitionOn:"; cat /data/params/d/IgnitionOn 2>/dev/null; \
            echo; echo "deviceState.started:" ; \
            python3 -c "import cereal.messaging as m; \
              s=m.sub_sock(\"deviceState\", timeout=1000); \
              print(m.recv_one(s).deviceState.started)"'

# 3. Watch gear via carState
python3 -c "
import cereal.messaging as m
s = m.sub_sock('carState', timeout=2000)
while True:
    msg = m.recv_one(s)
    if msg: print(msg.carState.gearShifter)
"

Expected behavior:

  • Ignition off: neither variant in ps. IgnitionOn is 0/missing.
  • Ignition on, in Park: controlsd_parked in ps, controlsd is not. started is False. After ~1.5s of confirmed Park, modeld and friends should have stopped.
  • Shift to Drive: controlsd_parked disappears within ~500ms, full controlsd appears, all the onroad processes spin up. There will be a brief carState gap during the swap (~0.52s).

What to suspect first if something breaks

  1. Manager crash on startup with UnknownKeyName: IgnitionOn. Means params.o/libcommon.a weren't rebuilt. Delete them and rebuild.
  2. controlsd_parked keeps respawning / dying. Check /data/log2/current/controlsd_parked.log. Most likely either:
    • CarD.__init__ is hanging on get_one_can because pandad isn't up yet — should only matter on the very first boot.
    • A frogpilot_variables attribute we missed defaulting; add it to _make_default_frogpilot_variables in controlsd_parked.py.
  3. Full controlsd never spawns after shift to Drive. Check IgnitionOn (should be 1), check carState.gearShifter (should not be park/unknown), check thermald.log for should_start logic. Also check that carState is actually being published by controlsd_parked.
  4. Steering still pulls right. The whole point of commit 1 is to rule out the variable-FPS work as the cause. If the symptom persists on baseline-restored driving logic, the suspect list shifts to:
    • Anything still on the kept side (carcontroller's no_lat_lane_change plumbing, thermald-related changes, custom.capnp additions).
    • Hardware: a calibration that drifted, an alignment issue, panda CAN bus issue, or torque tuning that was modified outside of these files.
    • The custom driving model selected (Params("Model")). Confirm which .thneed/.onnx is loaded — none of the model files themselves were changed in this session.
  5. Panda safety alerts during park transition. If panda logs a "lost heartbeat" or drops to NOOUTPUT mode in the swap window, we need controlsd_parked to issue a no-op carcontrol heartbeat to keep panda happy. Not implemented in this session — flag for follow-up.

Open follow-up items (NOT done in this session)

  • Cold-start latency on shift-from-Park. Modeld load + calibration warmup may produce a noticeable gap before lateral/long are ready. Anticipatory wake on CS.brakePressed && in_park is the planned mitigation if needed.
  • Methodically reintroduce optimizations. Once baseline driving feels right, the standstill optimizations (modeld 1fps, fan clamps, etc.) can come back one at a time, each with a drive test.
  • The CAN-FD telemetry block in selfdrive/car/hyundai/carstate.py is preserved as a commented block at the bottom of update_canfd(). Re-enabling requires uncommenting + restoring the tlog import at the top of the file.

Key file index for fast navigation

selfdrive/controls/controlsd.py                        # full controlsd, restored to baseline + UI hooks re-added
selfdrive/controls/controlsd_parked.py                 # NEW: parked-only CAN listener
selfdrive/controls/clearpilot_state_control            # in controlsd.py, ~line 1255 — 5-state ScreenDisplayMode + speed_state tick
selfdrive/thermald/thermald.py                         # gear-aware not_parked + IgnitionOn writer (~line 258)
selfdrive/manager/process_config.py                    # parked_only predicate + new entry
selfdrive/manager/manager.py                           # IgnitionOn seed in manager_init
common/params.cc                                       # IgnitionOn registered (CLEAR_ON_MANAGER_START)
selfdrive/car/card.py                                  # CarD class — used by both controlsd variants, unchanged
selfdrive/clearpilot/speed_logic.py                    # SpeedState class — unchanged, called from controlsd.clearpilot_state_control
selfdrive/car/hyundai/carstate.py                      # restored baseline + commented telemetry block at end of update_canfd

Reproducing the diff per commit

git show 47321e3            # baseline restore
git show f7e602c            # UI hooks
git show 887b9c9            # parked mode
git diff 62a403d..887b9c9   # full session delta