fix: clarify code queue split-brain liveness

Codex
2026-05-20 01:41:16 +00:00
parent b01907739c
commit ada6da3da6
11 changed files with 264 additions and 10 deletions
+1 -1
@@ -50,7 +50,7 @@ Use:
 - `bun scripts/cli.ts codex task <taskId> --trace --limit N` or `codex output` only when the summary is insufficient.
 - The liveness rules in `docs/reference/observability.md` when master control-plane state and D601 scheduler state appear split.
-`split-brain` in queue diagnostics is a control-plane/execution-plane divergence signal, not automatic evidence that the work is dead. If the task heartbeats are fresh and the trace is still advancing, treat the task as live and keep supervising it rather than interrupting or replacing it.
+`split-brain` in queue diagnostics is a control-plane/execution-plane divergence signal, not automatic evidence that the work is dead. If the task heartbeats are fresh and the trace is still advancing, treat the task as live and keep supervising it rather than interrupting or replacing it. The queue summary should expose this as `effectiveLiveness=live`, `splitBrainLive=true`, and `recommendedAction=continue-supervision`; expired, missing, or stale-recovery heartbeat evidence should instead surface `effectiveLiveness=at-risk`.
 Long-running tasks with fresh trace or heartbeat evidence should normally be left alone. Polling every few minutes is preferred over repeated interrupt/retry cycles.
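
The liveness rule described in the changed paragraph could be sketched roughly as follows. This is an illustration only, not the repository's implementation: the `QueueEvidence` shape, the `HeartbeatState` values, the `summarizeLiveness` helper, and the `"investigate"` action are all hypothetical names chosen for the example. Only `effectiveLiveness`, `splitBrainLive`, and `recommendedAction=continue-supervision` come from the text above.

```typescript
// Hypothetical input shape: the real queue summary may carry different fields.
type HeartbeatState = "fresh" | "expired" | "missing" | "stale-recovery";

interface QueueEvidence {
  splitBrain: boolean;      // control-plane and execution-plane state diverge
  heartbeat: HeartbeatState;
  traceAdvancing: boolean;  // the task trace is still producing new entries
}

interface LivenessSummary {
  effectiveLiveness: "live" | "at-risk";
  splitBrainLive: boolean;
  recommendedAction: string;
}

function summarizeLiveness(evidence: QueueEvidence): LivenessSummary {
  // Fresh heartbeats plus an advancing trace mean the task is treated as live,
  // even when the diagnostics report split-brain.
  if (evidence.heartbeat === "fresh" && evidence.traceAdvancing) {
    return {
      effectiveLiveness: "live",
      splitBrainLive: evidence.splitBrain,
      recommendedAction: "continue-supervision",
    };
  }
  // Expired, missing, or stale-recovery heartbeat evidence surfaces as at-risk.
  // Assumption: a fresh heartbeat with a stalled trace is also treated as
  // at-risk here, and "investigate" is a placeholder action name.
  return {
    effectiveLiveness: "at-risk",
    splitBrainLive: false,
    recommendedAction: "investigate",
  };
}
```

Under these assumptions, a split-brain task with fresh heartbeats and an advancing trace yields `effectiveLiveness: "live"` with `splitBrainLive: true`, so a supervisor would keep polling rather than interrupt or replace the task.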