Two related fixes for the post-engage lag spike:
1. controlsdLagging suppressed during lat_engage_suppress window.
Ratekeeper.lagging triggers when avg cycle duration over 100 cycles
exceeds 11.1ms (10ms budget / 0.9, i.e. the loop is running below 90%
of its target rate). The modeld 10→20fps ramp causes
a legitimate transient where downstream services (plannerd, locationd,
calibrationd, paramsd) each drain 2x the message rate, briefly pushing
avg cycle time past the threshold. The underlying system isn't broken —
it's correctly absorbing a scheduled workload transition.
2. frogpilotCarControl now sends only on change (+ 1Hz keepalive) instead
of every 10ms. The message has 3 bool fields: the code that sets
speedLimitChanged is entirely commented out, trafficModeActive flips
only on UI button press, and alwaysOnLateral changes only on
cruise/gear/brake edges. plannerd doesn't include frogpilotCarControl in its all_checks
list so stale-freq detection isn't a concern. Saves ~7ms/sec of
capnp build + zmq send work.
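
A minimal sketch of the send-on-change plus keepalive pattern from item 2, in Python. The class name, the `send` callable, and the injectable `clock` are illustrative assumptions, not the actual frogpilotCarControl code.

```python
import time


class ChangeGatedSender:
    """Publish a payload only when it changes, plus a periodic keepalive."""

    def __init__(self, send, keepalive_s=1.0, clock=time.monotonic):
        self._send = send                # hypothetical callable that publishes
        self._keepalive_s = keepalive_s  # 1Hz keepalive per the commit text
        self._clock = clock
        self._last_payload = None
        self._last_sent_at = None

    def update(self, payload):
        """Call once per control cycle; returns True if a message went out."""
        now = self._clock()
        changed = payload != self._last_payload
        stale = (self._last_sent_at is None
                 or now - self._last_sent_at >= self._keepalive_s)
        if changed or stale:
            self._send(payload)
            self._last_payload = payload
            self._last_sent_at = now
            return True
        return False
```

Called at 100Hz with an unchanged payload, this sends once per second instead of 100 times.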
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Previously ran unpinned (affinity mask 0xff) across all 8 cores. When it
landed on core 4 (controlsd) or 5 (plannerd/radard) or 7 (modeld), its
70MB/s frame copies and MP4 muxing caused cache/memory-bandwidth
contention with the RT-pinned processes. SCHED_FIFO prevented direct
preemption but not the cache thrash.
OMX offloads actual H.264 work to hardware so the main thread is
lightweight — fine on the little cluster.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
System health overlay:
- Lower-right 5-metric panel: LAG (controlsState.cumLagMs), DROP
(modelV2.frameDropPerc), TEMP (deviceState.maxTempC), CPU (max core of
deviceState.cpuUsagePercent), MEM (deviceState.memoryUsagePercent)
- Color-coded white→yellow→red by severity (LAG: 50/200ms, DROP: 5/15%,
TEMP: 75/88°C, CPU: 75/90%, MEM: 70/85%)
- Toggle in ClearPilot → Debug → "System Health Overlay"
- New param ClearpilotShowHealthMetrics, PERSISTENT (disk, survives
reboots), default false — re-polled every ~2s so toggle takes effect
without process restart
- InterFont(90, Bold) to match speed limit numeric styling, 30px margin,
40px between rows, black rounded background
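
The color thresholds above can be sketched as a small lookup, shown in Python for brevity (the overlay itself lives in the UI code). The function name is hypothetical, and whether the real comparison is inclusive (`>=`) is an assumption.

```python
# (warn, critical) thresholds copied from the list above.
THRESHOLDS = {
    "LAG": (50, 200),   # ms
    "DROP": (5, 15),    # percent
    "TEMP": (75, 88),   # degrees C
    "CPU": (75, 90),    # percent
    "MEM": (70, 85),    # percent
}


def severity_color(metric, value):
    """Map a metric reading to white, yellow, or red."""
    warn, critical = THRESHOLDS[metric]
    if value >= critical:   # assumption: thresholds are inclusive
        return "red"
    if value >= warn:
        return "yellow"
    return "white"
```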
Nightrider center lane path (the "tire track" polygon from
scene.track_vertices) is now drawn at 2x the width of other lines —
highlights the planned path distinctly against the otherwise stark
outline-only rendering.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Audit of post-fork param additions found CarCruiseDisplayActual was
written every CAN cycle (gated) but only consumed by hyundaicanfd.py::
create_buttons_alt, which has `return` on line 1 and no active callers.
Write was pure waste. Removed registration, write path, cache field,
and the dead read.
Also dropped the now-unused `from openpilot.common.params import Params`
in hyundaicanfd.py.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Reduced-rate modeld path now branches on IsDaylight:
- daylight: skip 1/2 frames → 10fps (better model responsiveness when
lighting gives the net more signal)
- night: skip 4/5 frames → 4fps (unchanged, conservative for power)
IsDaylight is already in /dev/shm (memory) via gpsd.py. Gated the
IsDaylight write on change — it flips twice a day, no reason to rewrite
every 30s. GPS polling bumped from 1Hz to 2Hz.
ModelFps publishes "10" / "4" / "20" so longitudinal_planner's dt and
FCW-threshold scaling (if re-enabled) still track actual rate.
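
A rough sketch of the daylight branch and the ModelFps string it implies. The function names are hypothetical and the 20fps full rate is taken from the text above; the real modeld loop is not shown in this commit message.

```python
FULL_RATE_FPS = 20  # full model rate per the commit text


def reduced_rate_plan(is_daylight):
    """Return (run_every_n_frames, effective_fps) for the reduced-rate path."""
    run_every = 2 if is_daylight else 5   # skip 1/2 frames by day, 4/5 at night
    return run_every, FULL_RATE_FPS // run_every


def should_run_model(frame_index, is_daylight):
    """True on the frames the reduced-rate path actually evaluates."""
    run_every, _ = reduced_rate_plan(is_daylight)
    return frame_index % run_every == 0


def model_fps_param(reduced_rate, is_daylight):
    """String value to publish as ModelFps."""
    if not reduced_rate:
        return str(FULL_RATE_FPS)
    return str(reduced_rate_plan(is_daylight)[1])
```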
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
create_steering_messages was constructing a new Params("/dev/shm/params")
object and reading no_lat_lane_change on every CAN steering message build
— i.e. 100 allocations + 100 file reads per second. Now the Params
instance lives on CarController, and the value is read once per update()
cycle and passed as a parameter.
Audited all other hyundai CAN-FD integration code for similar patterns:
- carstate.py — already fixed (previous commit)
- carcontroller.py — other Params references are all in commented-out code
- hyundaicanfd.py::create_buttons_alt — dead code (early return), so the
Params read there never executes; left as-is
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Speed limit ≤25 mph now allows +8 over before warning (e.g. limit 25 →
33 is ok, warn at 34). Existing tiers unchanged: ≥50 → +9 ok warn at +10,
26-49 → +6 ok warn at +7.
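
The three tiers can be written out as follows. This is a sketch in mph only; the function names are made up for illustration, and which speed is compared against the limit is an assumption (the commit text does not say).

```python
def allowed_over(limit_mph):
    """Largest overage in mph that does not trigger the warning."""
    if limit_mph <= 25:
        return 8   # new tier: 25 -> 33 ok, warn at 34
    if limit_mph >= 50:
        return 9   # warn at +10
    return 6       # 26-49: warn at +7


def over_limit_warning(speed_mph, limit_mph):
    """True when speed exceeds the limit by more than the allowed overage."""
    return speed_mph > limit_mph + allowed_over(limit_mph)
```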
Also gate all 7 speed_logic param writes on change (same pattern as the
earlier carstate/controlsd perf fix). Called at 2Hz so not as hot, but
unit/is_metric never change mid-drive and the cruise warning rarely
flips — no reason to thrash /dev/shm on every update.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Python Params uses snake_case; the C++ camelCase call raised
AttributeError, killing thermald_thread at the exact moment of shutdown.
Result: DoShutdown never got set, the 10-minute timer "worked" once (set
DashcamShutdown=True) and then thermald died silently. Device kept
draining the battery instead of powering down.
Caught because CLAUDE.md specifically flags this pattern as a common
source of silent failures between C++ and Python.
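
The failure mode is easy to reproduce with a stand-in object. `FakeParams` below is a hypothetical stub, not the real Params class, and DoShutdown is used only for illustration since the commit does not say which call was misspelled.

```python
class FakeParams:
    """Stub exposing only the snake_case method, like the Python wrapper."""

    def __init__(self):
        self.store = {}

    def put_bool(self, key, value):
        self.store[key] = bool(value)


def request_shutdown_buggy(params):
    # C++ spelling: no such attribute on the Python object, so this raises
    # AttributeError and the param is never written.
    params.putBool("DoShutdown", True)


def request_shutdown_fixed(params):
    params.put_bool("DoShutdown", True)
```

If the caller runs in a thread with no exception handling, the AttributeError ends that thread while the rest of the process keeps running, which matches the silent failure described above.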
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The comma model/planner FCW was producing false positives, and Tucson's
radar-based collision warning is more reliable. Single-user fork in
a single car, so no need to keep both.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Captures how we found the hot Params writes in carstate.py so the same
technique can be repeated. Includes the awk aggregators for per-callsite
and per-file-line breakdowns.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
py-spy showed per-cycle atomic param writes were the dominant cost. Each
put() is mkstemp+fsync+flock+rename+fsync_dir — fine when rare, ruinous at
100Hz. At park with no state changes, these writes were running anyway and
the flock contention was poisoning the whole system.
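
To make the per-put cost concrete, here is a sketch of the same sequence in Python: temp file, fsync, atomic rename, directory fsync. It is an approximation written for illustration, not the Params implementation, and it leaves out the flock step.

```python
import os
import tempfile


def atomic_put(directory, key, value):
    """Write bytes to directory/key so readers never see a partial file."""
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(value)
            f.flush()
            os.fsync(f.fileno())       # force file contents to storage
        os.replace(tmp_path, os.path.join(directory, key))  # atomic rename
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)        # do not leave temp files behind
        raise
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)               # persist the directory entry
    finally:
        os.close(dir_fd)
```

Each call makes several syscalls and creates and renames a file, which is negligible once per state change but adds up when repeated for several keys at 100Hz.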
carstate.py (update + update_canfd): CarSpeedLimit, CarIsMetric,
CarCruiseDisplayActual were written every CAN update. Now cached and
written only on change.
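
The cache-and-compare fix can be sketched as a thin wrapper. Both class names are hypothetical; `RecordingParams` is only a stand-in so the example runs without the real Params object.

```python
class RecordingParams:
    """Stand-in for a params store; records every write it receives."""

    def __init__(self):
        self.writes = []

    def put(self, key, value):
        self.writes.append((key, value))


class ChangeGatedParams:
    """Forward put() to the underlying store only when the value changes."""

    def __init__(self, params):
        self._params = params
        self._last = {}   # key -> last value written

    def put(self, key, value):
        if key in self._last and self._last[key] == value:
            return False  # unchanged: skip the expensive write
        self._params.put(key, value)
        self._last[key] = value
        return True
```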
controlsd.py: same fix for LatRequested and no_lat_lane_change. Also
throttle the sentry crash-file stat() from 100Hz to 1Hz.
Also: suppress locationdTemporaryError/paramsdTemporaryError/posenetInvalid
on lat engage (same 2s window as commIssue), and tie suppression to the
LatRequested edge instead of CC.latActive (fires immediately, not after
the 250ms ramp-up delay).
Also: reset Ratekeeper when it falls >1s behind — the ~6s fingerprinting
stall at startup was poisoning the lag metric for the entire session.
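
A simplified model of the reset, assuming a rate keeper that tracks a running deadline and a window of cycle durations. `SimpleRatekeeper` is a hypothetical class written for this sketch, not the openpilot Ratekeeper; only the 1s reset threshold and the interval / 0.9 lag test follow the commit text.

```python
import time
from collections import deque


class SimpleRatekeeper:
    def __init__(self, rate_hz, window=100, reset_after_s=1.0,
                 clock=time.monotonic):
        self._interval = 1.0 / rate_hz
        self._reset_after_s = reset_after_s
        self._clock = clock
        self._durations = deque(maxlen=window)
        self._last_tick = clock()
        self._next_deadline = self._last_tick + self._interval

    @property
    def lagging(self):
        if not self._durations:
            return False
        avg = sum(self._durations) / len(self._durations)
        return avg > self._interval / 0.9

    def monitor_time(self):
        """Record one cycle; return seconds left until the next deadline."""
        now = self._clock()
        self._durations.append(now - self._last_tick)
        self._last_tick = now
        remaining = self._next_deadline - now
        if remaining < -self._reset_after_s:
            # Fell too far behind: drop the backlog and the skewed history.
            self._durations.clear()
            self._next_deadline = now + self._interval
            return 0.0
        self._next_deadline += self._interval
        return remaining
```

Without the reset branch, a single 6s stall leaves `remaining` near -6s on every later cycle, because the deadline only ever advances by one interval per call.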
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Decouples "tell modeld to go fast" from "steering actually active":
- New LatRequested memory param — controlsd writes when lat would be active
- modeld reads LatRequested (not carControl.latActive) for FPS decision,
so it switches to 20fps immediately on engage request
- controlsd delays CC.latActive becoming true by 250ms (5 frames @ 20fps)
after LatRequested goes true, giving downstream services
(longitudinalPlan, liveCalibration, etc.) time to stabilize at the new rate
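
The request/active split can be sketched as a small timer keyed on the rising edge of the request. The class name and injectable clock are illustrative assumptions; controlsd may count frames rather than measure time.

```python
import time


class LatEngageDelay:
    """Report lat-active only after the request has been held for delay_s."""

    def __init__(self, delay_s=0.25, clock=time.monotonic):
        self._delay_s = delay_s          # 250ms per the commit text
        self._clock = clock
        self._requested_since = None

    def update(self, lat_requested):
        if not lat_requested:
            self._requested_since = None  # request dropped: restart the delay
            return False
        now = self._clock()
        if self._requested_since is None:
            self._requested_since = now   # rising edge of the request
        return now - self._requested_since >= self._delay_s
```

The value passed in corresponds to LatRequested (what modeld watches), and the return value corresponds to the delayed CC.latActive.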
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Fan controller: allow full 100% fan when offroad temp >= 75°C (startup cooling)
- ModelFps memory param: modeld publishes actual FPS (20 or 4) so downstream
consumers can adjust frame-rate-dependent logic
- Longitudinal planner: dynamically adjusts dt and v_desired_filter based on
ModelFps; FCW crash_cnt threshold scales with FPS to maintain consistent
0.15s trigger window at both 20fps and 4fps
- controlsd: suppress commIssue alerts for 2s after lateral control engages
(FPS transition from 4->20 causes transient freq check failures)
- Shutdown timer: hardcoded to 10 minutes (was 45min via FrogPilot param),
screen taps reset the countdown via ShutdownTouchReset memory param,
removed Shutdown Timer UI selector from ClearPilot menu
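
For the longitudinal planner item above, one way to scale the frame-count threshold with ModelFps is sketched below. The function names and the rounding rule are assumptions, not the planner's actual code. Note that at 4fps a single frame already spans 0.25s, so the 0.15s window can only be approximated by the one-frame minimum.

```python
FCW_WINDOW_S = 0.15  # trigger window from the commit text


def planner_dt(model_fps):
    """Time step between model outputs at the published rate."""
    return 1.0 / model_fps


def fcw_frame_threshold(model_fps):
    """Consecutive frames needed to cover roughly FCW_WINDOW_S."""
    return max(1, round(FCW_WINDOW_S * model_fps))
```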
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
dashcamd now waits for valid system time + GPS fix + drive gear before
starting a trip. Returns to waiting state on 10-min park timeout or
ignition off. Publishes DashcamState and per-trip DashcamFrames to
memory params. Status window shows stopped/waiting/recording states.
Updated CLAUDE.md with current display mode behavior, OmxEncoder port
details, speed limit warning thresholds, and dashcam param docs.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- OMX_Init/OMX_Deinit managed per encoder instance lifecycle
- Proper error handling in constructor, encoder_open, encoder_close
- Null guards on done_out.pop() and handle in destructor
- Codec config written directly to codecpar (no codec_ctx)
- ffmpeg faststart remux on segment close
- Crash handler in dashcamd for diagnostics
- DashcamFrames param for live frame count in status window
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Reset OMX subsystem (Deinit/Init) on dashcamd startup to clear stale
encoder state from previous unclean exits
- Validate OMX output buffers before memcpy to prevent segfault
- Validate VisionBuf frame data before encoding
- Add dashcam row to status window showing recording state and disk usage
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Removed nightrider exception that kept onroad UI visible in park.
Shifting to park from nightrider mode now auto-switches to screen off.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Tapping the touchscreen while in display mode 3 (screen off) resets
ScreenDisplayMode to 0 (auto-normal) and wakes the display.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
updateWakefulness was overriding display power every frame when ignition
was on, fighting the screen-off set by home.cc. Now respects
ScreenDisplayMode 3 unconditionally. Also auto-resets to mode 0 when
shifting into drive from screen-off.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Plays a ding via soundd when the cruise warning sign becomes visible
(cruise set speed out of range vs speed limit) or when the speed limit
changes while the warning sign is already showing. Max 1 ding per 30s.
Ding is mixed independently into soundd output at max volume without
interrupting alert sounds. bench_cmd ding available for manual trigger.
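
The trigger and rate limit can be sketched like this. `DingLimiter` is a hypothetical helper, and the assumption is that both trigger conditions share the same 30s budget.

```python
import time


class DingLimiter:
    """Decide when to ding: on sign appearing or limit change, max 1 per 30s."""

    def __init__(self, min_interval_s=30.0, clock=time.monotonic):
        self._min_interval_s = min_interval_s
        self._clock = clock
        self._last_ding = None
        self._was_visible = False
        self._last_limit = None

    def update(self, warning_visible, speed_limit):
        became_visible = warning_visible and not self._was_visible
        limit_changed = (warning_visible and self._was_visible
                         and speed_limit != self._last_limit)
        self._was_visible = warning_visible
        self._last_limit = speed_limit
        if not (became_visible or limit_changed):
            return False
        now = self._clock()
        if (self._last_ding is not None
                and now - self._last_ding < self._min_interval_s):
            return False   # still inside the 30s quiet period
        self._last_ding = now
        return True
```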
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Cruise warning sign appears above speed limit sign when cruise set
speed is too far from the speed limit:
- Red (over): cruise >= limit + 10 (if limit >= 50) or limit + 5 (if limit < 50)
- Green (under): cruise <= limit - 5
- Only when cruise active (not paused/disabled) and limit >= 20
- Nightrider mode: colored text/border on black background
Speed limit sign enlarged 5%. 20px gap between signs. Bench mode
gains cruiseactive command (0=disabled, 1=active, 2=paused).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New speed_logic.py module converts raw CAN speed limit and GPS speed
into display-ready params. Called from controlsd (live) and
bench_onroad (bench) at ~2Hz. UI reads params to render:
- Current speed (top center, hidden when 0 or no GPS)
- MUTCD speed limit sign (lower-left, normal + nightrider styles)
- Unit-aware display (mph/kph based on CAN DISTANCE_UNIT)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Standstill quiet mode clamped the fan to 0-10%, which is dangerously low
under sustained load. Raised to 0-30%. Bench mode now forces 30-100%
onroad fan range regardless of standstill to prevent overheating
during bench testing.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Ensures git push works without GIT_SSH_COMMAND override. Idempotent —
skips if Host entry already exists in /root/.ssh/config.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Keys now install to /root/.ssh/ (for root git operations) instead of
/data/ssh/.ssh/. Runs every boot via on_start.sh so keys are available
even without a full provision cycle.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Create /usr/local/bin/claude wrapper that remounts / read-write before
calling the real binary. Removes PATH append to ~/.bashrc which was
duplicating on every provision run.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
manage_athenad was calling launcher() with only 2 args but the
per-process logging changes added a required log_path parameter.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Provision script now checks and corrects the git origin URL to the
SSH remote before fetching updates. Also fixed CLAUDE.md to reflect
the correct hostname (git.hanson.xyz, not git.internal.hanson.xyz).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The decrypt step in provision.sh was writing decrypted private keys
directly into the source tree (system/clearpilot/dev/), leaving them
as untracked files in the repo. Now decrypts to a mktemp dir, copies
to the SSH dir, and cleans up. Also added ed25519 key paths to
.gitignore to match the existing id_rsa entries.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Generate new ed25519 keypair (replaces old RSA keys)
- Encrypt with device serial from /proc/cmdline (always available, no manager needed)
- Update decrypt/encrypt tools and provision.sh to use serial
- Remove dependency on DongleId param for SSH key provisioning
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add build_preflight.sh to create obj/ dirs that git can't track (body/board/obj, panda/board/obj)
- Wire preflight into build_only.sh and launch_chffrplus.sh
- Restore missing third_party binaries (libyuv, snpe, acados, maplibre) that were text pointers
- Remove dead Qt5WebEngineWidgets dependency from UI SConscript
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Install ccrypt, nodejs 18, npm, claude code in provision
- Decrypt id_rsa/id_rsa.pub via dongle ID and install to /data/ssh/.ssh/
- Run provision directly instead of through qt_shell wrapper
- Fix panda and body SConscripts to mkdir obj/ before writing gitversion.h
- Add sudo to su - comma build call
- Remount / rw at top of provision
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
SSH keys and sshd start immediately on every boot, not gated behind
quick_boot or dongle check. Provision script only handles packages,
git pull, and build.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- on_start.sh runs provision through qt_shell for on-screen display
- provision_wrapper.sh redirects stderr to stdout so errors are visible
- provision.sh: SSH setup before WiFi wait, verbose echo output,
sleep on failure so messages are readable
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- on_start.sh: always enables WiFi, waits 30s for connectivity if
no /data/quick_boot, then runs provision.sh
- New provision.sh: sets up SSH keys, installs openvpn, pulls latest
code from remote (hard reset, remote wins), runs build_only.sh,
touches /data/quick_boot on success
- Delete old dev/on_start.sh, dev/provision.sh, dev/on_start_brian.sh.cpt
(encrypted key decryption no longer needed)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- modeld: full 20fps (lat active), 4fps (driving no lateral), 0fps
(standstill). Standby timestamp written in both standby and 4fps modes
to suppress transient errors on engagement transition
- tlog: all calls commented out (was causing 100Hz CPU load in controlsd
and carstate). tlog client now checks TelemetryEnabled param before
sending — zero cost when disabled
- dashcamd: wait for valid frame dimensions on startup (fix SIGSEGV),
always_run (manages own trip lifecycle)
- Fan: driving range 15-100%
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Comment out all tlog() calls in controlsd (100Hz) and carstate (100Hz)
— was causing controlsd to lag from JSON serialization + ZMQ overhead
- tlog() now checks TelemetryEnabled memory param (1/sec file read),
returns immediately when disabled, so the cost with telemetry off is
one cached-flag check per call plus one file read per second
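
A sketch of the once-per-second enable check. `CachedFlag` and `make_tlog` are made-up names, and `read_flag` stands in for whatever reads TelemetryEnabled.

```python
import time


class CachedFlag:
    """Re-read a boolean at most once per refresh_s; serve the cache otherwise."""

    def __init__(self, read_flag, refresh_s=1.0, clock=time.monotonic):
        self._read_flag = read_flag   # hypothetical callable doing the file read
        self._refresh_s = refresh_s
        self._clock = clock
        self._value = False
        self._read_at = None

    def get(self):
        now = self._clock()
        if self._read_at is None or now - self._read_at >= self._refresh_s:
            self._value = bool(self._read_flag())
            self._read_at = now
        return self._value


def make_tlog(enabled_flag, send):
    """Build a tlog function that returns early when telemetry is disabled."""
    def tlog(group, data):
        if not enabled_flag.get():
            return False              # disabled: no serialization, no send
        send(group, data)
        return True
    return tlog
```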
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- modeld: enter standby when latActive=false (not just standstill),
exception for lane changes (no_lat_lane_change). Fix Python capnp
property access (.latActive not getLatActive())
- controlsd: move model_suppress computation early, suppress radarFault,
posenetInvalid, locationdTemporaryError, paramsdTemporaryError during
model standby + 2s grace period. All cascade from modeld not publishing
- dashcamd: always_run (manages own trip lifecycle), wait for valid frame
dimensions before encoding (fix SIGSEGV on early start)
- Fan: driving range 15-100% (was 30-100%)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Telemetry status bar in onroad UI: temp, fan %, model FPS, standstill
- Fix paramsMemory usage: Params("/dev/shm/params") not "/dev/shm/params/d"
- Telemetry/VPN toggles use ToggleControl with manual paramsMemory writes
- TelemetryEnabled/VpnEnabled registered PERSISTENT, written to memory path
- GPS telemetry: telemetryd subscribes to gpsLocation at 1Hz via cereal
- Nightrider: force CameraWidget bg black to eliminate color bleed border
- Suppress "Always On Lateral active" status bar message
- Re-enable gpsd and dashcamd
- CLAUDE.md: document memory params pattern, speed_limit.calculated usage
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Nightrider: lines 1px wider (3px outline), engagement border hidden,
planner lines hidden when disengaged, stay on onroad view in park
- Normal mode only: return to ready splash on park
- Ready text sprite at native 1x size
- Nice monitor: keeps claude processes at nice 19, runs every 30s
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>