controlsd+calibrationd: suppress commIssue from valid=False cascade
Two fixes that reinstate the pre-revert defenses against the "TAKE CONTROL IMMEDIATELY / Communication Issue" banner that fires when self-driving on the baseline modelrevert stack: calibrationd: publish valid based on calStatus == calibrated, not sm.all_checks(). Original gate cascaded upstream freq glitches into liveCalibration.valid=False, which kept locationd.filterInitialized False, which fed garbage into paramsd, which corrupted steerRatio (erratic steering). "valid" here is a question about convergence, not input freshness. controlsd: narrow the commIssue trigger to genuine comm failures — not_alive OR can_rcv_timeout. The `not sm.all_checks()` branch also picked up valid=False, but paramsd / torqued / plannerd / frogpilot_planner / dmonitoringd all propagate their sm.all_checks() into msg.valid via a polling-pattern artifact (freq_ok inside poll='...' subscribers tracks gaps between drain bursts rather than the publish rate), so the whole stack flaps valid and trips the banner during normal driving. Content and rate are fine; just the flag. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
@@ -485,13 +485,16 @@ class Controls:
|
||||
# generic catch-all. ideally, a more specific event should be added above instead
|
||||
has_disable_events = self.events.contains(ET.NO_ENTRY) and (self.events.contains(ET.SOFT_DISABLE) or self.events.contains(ET.IMMEDIATE_DISABLE))
|
||||
no_system_errors = (not has_disable_events) or (len(self.events) == num_events)
|
||||
if (not self.sm.all_checks() or self.card.can_rcv_timeout) and no_system_errors and not model_suppress:
|
||||
# CLEARPILOT: fire commIssue ONLY when messages actually aren't flowing (not_alive)
|
||||
# or CAN RX is timing out. Don't fire on self-declared valid=False — that's the
|
||||
# polling-pattern / all_checks cascade that paramsd/torqued/plannerd/frogpilot
|
||||
# propagate even while their publish rate and content are fine.
|
||||
comms_really_broken = (not self.sm.all_alive()) or self.card.can_rcv_timeout
|
||||
if comms_really_broken and no_system_errors and not model_suppress:
|
||||
if not self.sm.all_alive():
|
||||
self.events.add(EventName.commIssue)
|
||||
elif not self.sm.all_freq_ok():
|
||||
self.events.add(EventName.commIssueAvgFreq)
|
||||
else: # invalid or can_rcv_timeout.
|
||||
self.events.add(EventName.commIssue)
|
||||
else:
|
||||
self.events.add(EventName.commIssue) # can_rcv_timeout path
|
||||
|
||||
logs = {
|
||||
'invalid': [s for s, valid in self.sm.valid.items() if not valid],
|
||||
|
||||
@@ -284,7 +284,14 @@ def main() -> NoReturn:
|
||||
|
||||
# 4Hz driven by cameraOdometry
|
||||
if sm.frame % 5 == 0:
|
||||
calibrator.send_data(pm, sm.all_checks())
|
||||
# CLEARPILOT: publish valid based on calibration status, not upstream sm.all_checks().
|
||||
# The original gate cascaded upstream freq glitches into liveCalibration.valid=False,
|
||||
# which kept locationd.filterInitialized False, which fed garbage into paramsd, which
|
||||
# corrupted steerRatio and caused erratic steering (and controlsd commIssue banners).
|
||||
# "valid" here semantically means "the calibration data is trustworthy" — a question
|
||||
# about convergence, not input freshness.
|
||||
cal_valid = calibrator.cal_status == log.LiveCalibrationData.Status.calibrated
|
||||
calibrator.send_data(pm, cal_valid)
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
|
||||
Reference in New Issue
Block a user