diff --git a/docs/reference/code-queue-supervision.md b/docs/reference/code-queue-supervision.md
index 10e3cf5a..ff64a0cd 100644
--- a/docs/reference/code-queue-supervision.md
+++ b/docs/reference/code-queue-supervision.md
@@ -163,7 +163,7 @@ bun scripts/cli.ts codex pr-preflight --remote --issue 20
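An `auth-missing` result from `codex pr-preflight --remote` only means that the scheduler/runtime preflight surface (`scheduler-runner-env`) did not see `GH_TOKEN/GITHUB_TOKEN` or an auth-broker; it must not be reduced to "the current active runner/dev container cannot create PRs". Code Queue output must include both `scopeBoundary` and `activeRunnerDevContainer`: the former states that the scheduler env and the current CLI/dev container are independent scopes, and the latter reports only whether the current CLI process can see a token, without printing the token value. When the commander sees `auth-missing` from a remote preflight, it should go on to verify actual PR capability inside the current runner with `bun scripts/cli.ts gh auth status --repo pikasTech/unidesk`, `gh pr create --dry-run`, and `gh pr comment create --dry-run`; only if these active-runner checks also fail may it conclude that the current turn cannot open a PR.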
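-This command reaches the D601 scheduler's `/api/runtime-preflight` through the stable backend-core `code-queue` proxy and reports, for the scheduler/runner environment, `GH_TOKEN`/`GITHUB_TOKEN` coverage, tools, the Git worktree, GitHub egress, read-only repo/issue/PR probes, and an optional push dry-run. To re-check the PR body/create-command guard, append `--pr-create-dry-run --pr-create-dry-run-head `; the guard only performs a dry-run and does not create a PR. When the env token is missing, the command must return `ok=false`, `runnerDisposition=infra-blocked`, `tokenCoverage.missing=["GH_TOKEN","GITHUB_TOKEN"]`, and `authBroker.source="broker/auth-broker-needed"`, because the provider dev container can only forward tokens the scheduler already holds, unless a broker-held GitHub credential is wired in later. A missing system `gh` binary may only be observed as `tools.systemGhBinary.ok=false`; it must not be misread as the UniDesk REST `bun scripts/cli.ts gh` being unavailable. In runner-like environments, `--remote` no longer requires the local `unidesk-backend-core`, `unidesk-database`, and `baidu-netdisk-backend` containers to exist; the absence of this local target stack is evidence only, not the final primary blocker. If the remote control plane is reachable, the output keeps the ready preflight; if it is unreachable, the structured failure is classified as `failureKind=control-plane-missing` / `degradedReason=remote-control-plane-unreachable`. The `prCapabilityContract` in the output lets the commander review the runner handoff quickly: the target branch is always shown, push/PR-create dry-runs are marked as not writing to the remote, availability of the system `gh` binary and of the UniDesk REST `bun scripts/cli.ts gh` is reported separately, and merge explicitly remains `unsupported-command`.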
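+This command reaches the D601 scheduler's `/api/runtime-preflight` through the stable backend-core `code-queue` proxy and reports, for the scheduler/runner environment, `GH_TOKEN`/`GITHUB_TOKEN` coverage, tools, the Git worktree, GitHub egress, read-only repo/issue/PR probes, and an optional push dry-run. To re-check the PR body/create-command guard, append `--pr-create-dry-run --pr-create-dry-run-head `; the guard only performs a dry-run and does not create a PR. When the env token is missing, the command must return `ok=false`, `runnerDisposition=infra-blocked`, `tokenCoverage.missing=["GH_TOKEN","GITHUB_TOKEN"]`, and `authBroker.source="broker/auth-broker-needed"`, because the provider dev container can only forward tokens the scheduler already holds, unless a broker-held GitHub credential is wired in later. A missing system `gh` binary may only be observed as `tools.systemGhBinary.ok=false`; it must not be misread as the UniDesk REST `bun scripts/cli.ts gh` being unavailable. In runner-like environments, `--remote` no longer requires the local `unidesk-backend-core`, `unidesk-database`, and `baidu-netdisk-backend` containers to exist; the absence of this local target stack is evidence only, not the final primary blocker, and should additionally be marked as `blockingDisposition=runner-local-observation-gap` or `localObservationGap.kind=runner-local-observation-gap`. If the remote control plane is reachable, the output keeps the ready preflight; if it is unreachable, the structured failure is classified as `failureKind=control-plane-missing` / `degradedReason=remote-control-plane-unreachable` and additionally marked as `blockingDisposition=control-plane-observation-gap`. `runnerDisposition` may stay `infra-blocked` for compatibility with legacy callers, but the observation-gap fields are the stable basis for concluding "this is a gap in the observation path, not a scheduler stoppage". The `prCapabilityContract` in the output lets the commander review the runner handoff quickly: the target branch is always shown, push/PR-create dry-runs are marked as not writing to the remote, availability of the system `gh` binary and of the UniDesk REST `bun scripts/cli.ts gh` is reported separately, and merge explicitly remains `unsupported-command`.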
Local runner preflight example:
@@ -215,7 +215,7 @@ bun scripts/cli.ts codex pr-preflight --remote --issue
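Reviewing completed, unread tasks must also follow progressive disclosure. By default the commander fetches only the original prompt and the final response, and uses them to judge whether the task claims completion, visibly overstepped its scope, or lacks acceptance evidence; it should not fetch the full trace, the complete tool summary, or raw output by default. Only when the final response does not match the goal, evidence is insufficient, the remote commit cannot be verified, the task is suspected of fabrication, or the cause of a failure needs to be traced should it expand further with `--detail`, paginated `--trace`, or `codex output` read by seq. The rule aims to reduce context pressure while keeping the ability to obtain full evidence through multi-step queries.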
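-In queue diagnostics, `split-brain` means the control-plane and execution-plane observations have diverged; it does not by itself prove that a task is dead. As long as the task heartbeat is still refreshing and the trace is still advancing, the task must not be judged a service outage or stopped immediately; treat it as a live task with `splitBrainLive=true`, and keep supervising and advancing the tasks already scheduled in #20 instead of interrupting it, replacing it, or treating the backend as down. The queue summary should show `effectiveLiveness=live`, `splitBrainLive=true`, and `recommendedAction=continue-supervision`; only when the heartbeat is expired/missing or the stale-recovery conditions are met should it show `effectiveLiveness=at-risk` and enter recovery evaluation.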
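+In queue diagnostics, `split-brain` means the control-plane and execution-plane observations have diverged; it does not by itself prove that a task is dead. As long as the task heartbeat is still refreshing and the trace is still advancing, the task must not be judged a service outage or stopped immediately; treat it as a live task with `splitBrainLive=true`, and keep supervising and advancing the tasks already scheduled in #20 instead of interrupting it, replacing it, or treating the backend as down. The queue summary should show `effectiveLiveness=live`, `splitBrainLive=true`, and `recommendedAction=continue-supervision`; compact output should also repeat these low-noise fields under `executionDiagnostics.liveness` and foreground `activeHeartbeatCount`, the bounded `heartbeatFreshTaskIds`, `databaseActiveTaskCount`, and `schedulerActiveRunSlotCount`. When the master/control-plane reports `schedulerActiveRunSlotCount=0` but `heartbeatFreshTaskIds` is non-empty, the active count should be read as live from the scheduler heartbeat summary first, rather than as an execution stall based on the master's local slot count of 0. Only when the heartbeat is expired/missing or the stale-recovery conditions are met should the summary show `effectiveLiveness=at-risk` and enter recovery evaluation.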
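A single `provider is not online`, SSH timeout, proxy timeout, or failed registry request only proves that "the current observation path failed"; on its own it must not be escalated to D601 being globally offline, CI/CD being globally blocked, or business tasks being unable to proceed. The commander and the runner must adjudicate runtime-plane state from multiple signals and distinguish at least the following observation surfaces: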
@@ -225,7 +225,7 @@ bun scripts/cli.ts codex pr-preflight --remote --issue
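- whether the Code Queue scheduler heartbeat, task heartbeats, and trace/output keep being persisted to the database;
- whether the CLI/proxy path inside the current runner container is merely locally unreachable.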
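-Only when multiple independent observation surfaces fail at the same time, or the same critical path keeps failing within an explicit time window, may the problem be judged a global blocker. Otherwise it should be recorded as transient or as a runner-local observation gap, with priority given to retrying, steering the task back on course, or splitting out an infrastructure follow-up; business workers must not treat a single local failure as the final blocker. The CLI and the runtime must structure error output as `scope=runner-local|provider-gateway|ssh|registry|k3s|scheduler|service-proxy`, `observedAt`, `retryable`, `decision`, `healthyScopes`, `failedScopes`, and suggested cross-check commands.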
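+Only when multiple independent observation surfaces fail at the same time, or the same critical path keeps failing within an explicit time window, may the problem be judged a global blocker. Otherwise it should be recorded as transient, `runner-local-observation-gap`, or `control-plane-observation-gap`, with priority given to retrying, steering the task back on course, or splitting out an infrastructure follow-up; business workers must not treat a single local failure as the final blocker. The CLI and the runtime must structure error output as `scope=runner-local|control-plane|provider-gateway|ssh|registry|k3s|scheduler|service-proxy`, `observedAt`, `retryable`, `decision` or `blockingDisposition`, `healthyScopes`, `failedScopes`, and suggested cross-check commands. A missing backend-core container on the current runner/local side is a runner-local observation gap; a remote control plane that is also unreachable is a control-plane observation gap; neither may on its own be reported as the active runner count dropping to zero or as a scheduler stoppage.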
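ClaudeQQ is a proactive, user-facing notification channel, not an automatic forwarder for #24 briefing updates. The commander should send a ClaudeQQ message on its own initiative in only three cases: a core service or critical execution plane is down and the user needs to know, a high-risk decision needs the user's approval, or milestone progress is worth sharing. Each message must be concise, no more than 200 Chinese characters, written as a single paragraph, and free of Markdown syntax. Routine polling, routine issue updates, routine additions to the #24 briefing, normal rate limiting by an external token provider, and intermediate states that require no user action do not trigger a ClaudeQQ message. A send failure is only recorded in #24 or the corresponding blocker issue; it does not roll back GitHub issue updates that have already completed.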
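The multi-signal rule in the documentation hunk above can be sketched as a small classifier. This is a minimal illustration, not the repository's implementation: `Observation`, `Verdict`, and `classifyObservations` are hypothetical names, and the choice to require a failed `scheduler` scope plus at least one other failed scope before reporting `global-blocked` is an assumption made for the example. The "same critical path keeps failing within a time window" branch is deliberately left out.

```typescript
// Scope names come from the documented `scope=` field; everything else is hypothetical.
type Scope =
  | "runner-local" | "control-plane" | "provider-gateway" | "ssh"
  | "registry" | "k3s" | "scheduler" | "service-proxy";

interface Observation {
  scope: Scope;
  ok: boolean;
  observedAt: number; // epoch milliseconds
}

type BlockingDisposition =
  | "healthy" | "transient"
  | "runner-local-observation-gap" | "control-plane-observation-gap"
  | "global-blocked";

interface Verdict {
  blockingDisposition: BlockingDisposition;
  retryable: boolean;
  healthyScopes: Scope[];
  failedScopes: Scope[];
}

function classifyObservations(observations: Observation[]): Verdict {
  // Keep only the newest observation per scope.
  const latest: Partial<Record<Scope, Observation>> = {};
  for (const o of observations) {
    const prev = latest[o.scope];
    if (!prev || o.observedAt >= prev.observedAt) latest[o.scope] = o;
  }
  const healthyScopes: Scope[] = [];
  const failedScopes: Scope[] = [];
  for (const scope of Object.keys(latest) as Scope[]) {
    const o = latest[scope];
    if (o) (o.ok ? healthyScopes : failedScopes).push(scope);
  }
  const failed = (s: Scope): boolean => failedScopes.indexOf(s) >= 0;

  let blockingDisposition: BlockingDisposition;
  if (failedScopes.length === 0) blockingDisposition = "healthy";
  // Assumption: only a failed scheduler scope plus another failed scope counts as global.
  else if (failed("scheduler") && failedScopes.length >= 2) blockingDisposition = "global-blocked";
  else if (failed("control-plane")) blockingDisposition = "control-plane-observation-gap";
  else if (failed("runner-local")) blockingDisposition = "runner-local-observation-gap";
  else blockingDisposition = "transient";

  return {
    blockingDisposition,
    // In this sketch, anything short of a global blocker is worth retrying.
    retryable: blockingDisposition !== "global-blocked" && blockingDisposition !== "healthy",
    healthyScopes,
    failedScopes,
  };
}
```

Under these assumptions, a failed `runner-local` observation next to a healthy `scheduler` observation yields `runner-local-observation-gap` with `retryable: true`, which matches the documented intent that a single local failure is never the final blocker.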
diff --git a/scripts/code-queue-pr-preflight-contract-test.ts b/scripts/code-queue-pr-preflight-contract-test.ts
index 7e8f4068..515dc08a 100644
--- a/scripts/code-queue-pr-preflight-contract-test.ts
+++ b/scripts/code-queue-pr-preflight-contract-test.ts
@@ -339,6 +339,9 @@ async function main(): Promise {
assertCondition(fallback.runnerDisposition === "ready", "remote fallback should stay ready", fallback);
assertCondition(fallback.controlPlane && asRecord(fallback.controlPlane).remoteFallbackUsed === true, "remote fallback should be marked", fallback.controlPlane);
assertCondition(fallback.failureKind === null, "remote fallback should not invent a failure kind when remote control plane is healthy", fallback);
+ const fallbackLocalGap = asRecord(fallback.localObservationGap);
+ assertCondition(fallbackLocalGap.kind === "runner-local-observation-gap", "healthy remote fallback should classify local backend-core absence as runner-local observation gap", fallbackLocalGap);
+ assertCondition(fallbackLocalGap.schedulerStoppage === false, "local observation gap must not imply scheduler stoppage", fallbackLocalGap);
const fallbackPreflight = asRecord(fallback.preflight);
assertCondition(fallbackPreflight.ok === true, "remote fallback preflight should stay ready", fallbackPreflight);
assertCondition(asRecord(fallbackPreflight.tokenCoverage).source === "GH_TOKEN", "token source should be GH_TOKEN", fallbackPreflight.tokenCoverage);
@@ -352,6 +355,11 @@ async function main(): Promise {
assertCondition(remoteControlPlaneMissingRecord.ok === false, "missing control plane should fail", remoteControlPlaneMissingRecord);
assertCondition(remoteControlPlaneMissingRecord.failureKind === "control-plane-missing", "missing control plane should classify as control-plane-missing", remoteControlPlaneMissingRecord);
assertCondition(remoteControlPlaneMissingRecord.degradedReason === "remote-control-plane-unreachable", "missing control plane should classify as remote-control-plane-unreachable", remoteControlPlaneMissingRecord);
+ assertCondition(remoteControlPlaneMissingRecord.runnerDisposition === "infra-blocked", "missing remote control plane keeps legacy runnerDisposition compatibility", remoteControlPlaneMissingRecord);
+ assertCondition(remoteControlPlaneMissingRecord.blockingDisposition === "control-plane-observation-gap", "missing remote control plane should expose observation gap blocking disposition", remoteControlPlaneMissingRecord);
+ const remoteControlPlaneGap = asRecord(remoteControlPlaneMissingRecord.observationGap);
+ assertCondition(remoteControlPlaneGap.kind === "control-plane-observation-gap", "missing remote control plane should expose control-plane observation gap", remoteControlPlaneGap);
+ assertCondition(remoteControlPlaneGap.schedulerStoppage === false, "control-plane observation gap must not imply scheduler stoppage", remoteControlPlaneGap);
assertCondition(asRecord(remoteControlPlaneMissingRecord.controlPlane).localBackendCoreMissing === true, "local backend-core absence should remain evidence only", remoteControlPlaneMissingRecord.controlPlane);
const directAuthMissing = await codexPrPreflightQueryForTest(["--remote"], {
@@ -389,6 +397,8 @@ async function main(): Promise {
assertCondition(directAuthMissingRecord.ok === false, "auth-missing remote result should fail", directAuthMissingRecord);
assertCondition(directAuthMissingRecord.failureKind === "auth-missing", "missing token should classify as auth-missing", directAuthMissingRecord);
assertCondition(directAuthMissingRecord.degradedReason === "GH_TOKEN/GITHUB_TOKEN missing", "auth missing should state token gap", directAuthMissingRecord);
+ const directAuthObservationGap = asRecord(directAuthMissingRecord.observationGap);
+ assertCondition(directAuthObservationGap.kind === "runner-local-observation-gap", "auth missing after remote fallback should keep local backend-core absence scoped as runner-local observation gap", directAuthObservationGap);
const directAuthScopeBoundary = asRecord(directAuthMissingRecord.scopeBoundary);
const directAuthActiveRunner = asRecord(directAuthMissingRecord.activeRunnerDevContainer);
assertCondition(directAuthScopeBoundary.scopesAreIndependent === true, "remote auth-missing must distinguish scheduler env from active runner dev container", directAuthScopeBoundary);
@@ -405,6 +415,19 @@ async function main(): Promise {
const gitRemoteGapRecord = asRecord(gitRemoteGap);
assertCondition(gitRemoteGapRecord.failureKind === "git-remote-gap", "git probe failures should stay structured", gitRemoteGapRecord);
+ const localOnlyObservationGap = await codexPrPreflightQueryForTest([], {
+ config: null,
+ coreFetch: () => localBackendCoreMissingFixture(),
+ });
+ const localOnlyObservationGapRecord = asRecord(localOnlyObservationGap);
+ assertCondition(localOnlyObservationGapRecord.ok === false, "local-only backend-core absence should fail the preflight", localOnlyObservationGapRecord);
+ assertCondition(localOnlyObservationGapRecord.failureKind === "target-stack-not-running", "local-only backend-core absence should preserve target-stack evidence", localOnlyObservationGapRecord);
+ assertCondition(localOnlyObservationGapRecord.runnerDisposition === "infra-blocked", "local-only backend-core absence keeps legacy runnerDisposition compatibility", localOnlyObservationGapRecord);
+ assertCondition(localOnlyObservationGapRecord.blockingDisposition === "runner-local-observation-gap", "local-only backend-core absence should expose runner-local blocking disposition", localOnlyObservationGapRecord);
+ const localOnlyGap = asRecord(localOnlyObservationGapRecord.observationGap);
+ assertCondition(localOnlyGap.kind === "runner-local-observation-gap", "local-only backend-core absence should include observationGap detail", localOnlyGap);
+ assertCondition(localOnlyGap.schedulerStoppage === false, "local-only backend-core absence must not imply scheduler stoppage", localOnlyGap);
+
const proxyGap = await codexPrPreflightQueryForTest(["--remote"], {
config: null,
coreFetch: () => ({
@@ -741,7 +764,8 @@ async function main(): Promise {
checks: [
"runner-like local target-stack absence does not block remote fallback",
"remote control plane fallback preserves ready preflight",
- "missing remote control plane returns control-plane-missing",
+ "missing remote control plane returns control-plane-observation-gap",
+ "local backend-core absence returns runner-local-observation-gap",
"auth missing returns auth-missing with broker/auth-broker-needed",
"proxy failures return proxy-gap",
"git remote failures return git-remote-gap",
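For readers consuming these results, the assertions above suggest that a caller should look at the observation-gap fields before falling back to the legacy `runnerDisposition`. The following is a rough sketch of such a reader, assuming only the field names exercised by the contract test (`blockingDisposition`, `observationGap.kind`, `localObservationGap.kind`); `toRecord` and `observationGapKind` are hypothetical helpers written for this example, and `toRecord` is a local stand-in rather than the repository's `asRecord`.

```typescript
type JsonRecord = Record<string, unknown>;

// Local stand-in for the `asRecord` helper used in the tests; the real helper may differ.
function toRecord(value: unknown): JsonRecord {
  return typeof value === "object" && value !== null && !Array.isArray(value)
    ? (value as JsonRecord)
    : {};
}

type ObservationGapKind = "runner-local-observation-gap" | "control-plane-observation-gap";

function isGapKind(value: unknown): value is ObservationGapKind {
  return value === "runner-local-observation-gap" || value === "control-plane-observation-gap";
}

// Returns the first observation-gap kind found on the result, or null when none is present.
// `runnerDisposition` is intentionally ignored: it may stay "infra-blocked" for legacy callers.
function observationGapKind(result: JsonRecord): ObservationGapKind | null {
  const candidates: unknown[] = [
    result.blockingDisposition,
    toRecord(result.observationGap).kind,
    toRecord(result.localObservationGap).kind,
  ];
  for (const candidate of candidates) {
    if (isGapKind(candidate)) return candidate;
  }
  return null;
}
```

Checking `blockingDisposition` first and the nested `kind` fields second means a healthy remote fallback (which only carries `localObservationGap`) and an `auth-missing` failure (whose `blockingDisposition` may still be `infra-blocked`) both resolve to the gap recorded on the result.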
diff --git a/scripts/code-queue-queues-shape-contract-test.ts b/scripts/code-queue-queues-shape-contract-test.ts
index 6cf7bf5c..b8f062e5 100644
--- a/scripts/code-queue-queues-shape-contract-test.ts
+++ b/scripts/code-queue-queues-shape-contract-test.ts
@@ -36,8 +36,8 @@ function fixtureResponse(): JsonRecord {
heartbeatFreshTaskIds: ["task-running"],
databaseActiveTaskCount: 1,
databaseActiveTaskIds: ["task-running"],
- schedulerActiveRunSlotCount: 1,
- schedulerActiveTaskIds: ["task-running"],
+ schedulerActiveRunSlotCount: 0,
+ schedulerActiveTaskIds: [],
},
},
queues: [
@@ -93,6 +93,13 @@ function assertQueuesShape(label: string, result: unknown, expectedView: string)
assertCondition(diagnostics.splitBrainLive === true, `${label} split-brain live should remain explicitly true`, diagnostics);
assertCondition(diagnostics.effectiveLiveness === "live", `${label} diagnostics should retain derived liveness`, diagnostics);
assertCondition(diagnostics.recommendedAction === "continue-supervision", `${label} split-brain live should continue supervision`, diagnostics);
+ const liveness = asRecord(diagnostics.liveness);
+ assertCondition(liveness.effectiveLiveness === "live", `${label} liveness summary should foreground effective live state`, liveness);
+ assertCondition(liveness.recommendedAction === "continue-supervision", `${label} liveness summary should foreground recommended action`, liveness);
+ assertCondition(liveness.activeHeartbeatCount === 1, `${label} liveness summary should derive active heartbeat count from fresh heartbeat ids`, liveness);
+ assertCondition(liveness.schedulerActiveRunSlotCount === 0, `${label} liveness summary should keep master active slot zero visible`, liveness);
+ assertCondition(asArray(liveness.heartbeatFreshTaskIds).length === 1, `${label} liveness summary should include bounded fresh heartbeat task ids`, liveness);
+ assertCondition(String(liveness.interpretation ?? "").includes("heartbeat is fresh"), `${label} liveness interpretation should explain slot-zero split-brain`, liveness);
assertCondition(Array.isArray(queues.activeTaskIds), `${label} activeTaskIds should be present`, queues);
assertCondition(Array.isArray(queues.queuedTaskIds), `${label} queuedTaskIds should be present`, queues);
}
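The slot-zero case covered by this fixture (`schedulerActiveRunSlotCount: 0` with a non-empty `heartbeatFreshTaskIds`) can be illustrated from the consumer side. This is a sketch under stated assumptions: the `LivenessSummary` interface is a hypothetical subset of `executionDiagnostics.liveness`, and `readActiveCount` with its `source` label is invented for the example rather than taken from the codebase.

```typescript
// Field names mirror `executionDiagnostics.liveness`; the interface itself is a hypothetical subset.
interface LivenessSummary {
  activeHeartbeatCount: number;
  heartbeatFreshTaskIds: string[]; // bounded preview, may be shorter than the full list
  schedulerActiveRunSlotCount: number | null;
}

interface ActiveCountReading {
  activeCount: number;
  source: "scheduler-heartbeat" | "master-slots";
}

function readActiveCount(liveness: LivenessSummary): ActiveCountReading {
  if (liveness.heartbeatFreshTaskIds.length > 0) {
    // Fresh heartbeats take priority over a master-local slot count of 0.
    return {
      activeCount: Math.max(liveness.activeHeartbeatCount, liveness.heartbeatFreshTaskIds.length),
      source: "scheduler-heartbeat",
    };
  }
  // No fresh heartbeat evidence: fall back to the master's slot count.
  return { activeCount: liveness.schedulerActiveRunSlotCount ?? 0, source: "master-slots" };
}
```

Taking the maximum of `activeHeartbeatCount` and the preview length accounts for the preview being capped, so a truncated id list does not understate the number of live tasks.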
diff --git a/scripts/code-queue-supervisor-disclosure-contract-test.ts b/scripts/code-queue-supervisor-disclosure-contract-test.ts
index 7a283ced..eaec8f32 100644
--- a/scripts/code-queue-supervisor-disclosure-contract-test.ts
+++ b/scripts/code-queue-supervisor-disclosure-contract-test.ts
@@ -157,6 +157,7 @@ export function runCodeQueueSupervisorDisclosureContract(): JsonRecord {
const completedUnread = asRecord(supervisorView.completedUnread);
const fullTasks = asRecord(asRecord(full).tasks);
const diagnostics = asRecord(supervisorView.executionDiagnostics);
+ const liveness = asRecord(diagnostics.liveness);
const listBudget = asRecord(diagnostics.listBudget);
const omittedCounts = asRecord(listBudget.omittedCounts);
@@ -167,6 +168,12 @@ export function runCodeQueueSupervisorDisclosureContract(): JsonRecord {
assertCondition(recentItems.every((item) => asRecord(item).unreadTerminal === false), "recentCompleted should not duplicate unread terminal tasks", { recentItems });
assertCondition(asArray(diagnostics.databaseActiveTaskIds).length === 12, "diagnostic task id lists should be capped", diagnostics);
assertCondition(omittedCounts.databaseActiveTaskIds === 68, "diagnostic omitted counts should preserve full visibility metadata", omittedCounts);
+ assertCondition(liveness.effectiveLiveness === "live", "supervisor liveness summary should keep split-brain live explicit", liveness);
+ assertCondition(liveness.recommendedAction === "continue-supervision", "supervisor liveness summary should recommend continued supervision", liveness);
+ assertCondition(liveness.splitBrainLive === true, "supervisor liveness summary should mark splitBrainLive", liveness);
+ assertCondition(liveness.activeHeartbeatCount === 80, "supervisor liveness summary should foreground active heartbeat count", liveness);
+ assertCondition(asArray(liveness.heartbeatFreshTaskIds).length === 12, "supervisor liveness summary should keep heartbeatFreshTaskIds bounded", liveness);
+ assertCondition(String(liveness.interpretation ?? "").includes("continue supervision"), "supervisor liveness interpretation should not imply scheduler stoppage", liveness);
assertCondition(asArray(diagnostics.reasons).length === 6, "diagnostic reasons should be capped", diagnostics);
assertCondition(diagnostics.livenessSummaryTruncated === true, "long diagnostic liveness summary should be previewed", diagnostics);
assertCondition(listBudget.truncated === true && typeof listBudget.rawCommand === "string", "diagnostic list budget should disclose raw command", listBudget);
diff --git a/scripts/src/code-queue.ts b/scripts/src/code-queue.ts
index 5df4e156..6e04e402 100644
--- a/scripts/src/code-queue.ts
+++ b/scripts/src/code-queue.ts
@@ -260,6 +260,7 @@ interface CodexPrPreflightOptions {
}
type CodeQueuePrPreflightFailureKind = "auth-missing" | "proxy-gap" | "git-remote-gap" | "control-plane-missing" | "target-stack-not-running";
+type CodeQueueObservationGapKind = "runner-local-observation-gap" | "control-plane-observation-gap" | null;
interface CodeQueuePrPreflightTransport {
config?: UniDeskConfig | null;
@@ -1015,6 +1016,42 @@ function recommendedActionFromDiagnostics(record: Record): stri
return "none";
}
+function activeHeartbeatCountFromDiagnostics(record: Record<string, unknown>, activeHeartbeatTaskIds: { count: number }, heartbeatFreshTaskIds: { count: number }): number {
+ const explicit = asNumber(record.activeHeartbeatCount, Number.NaN);
+ return Number.isFinite(explicit) ? explicit : Math.max(activeHeartbeatTaskIds.count, heartbeatFreshTaskIds.count);
+}
+
+function compactLivenessDecision(record: Record<string, unknown>, lists: {
+  activeHeartbeatTaskIds: { items: string[]; count: number; truncated: boolean; omitted: number };
+  heartbeatFreshTaskIds: { items: string[]; count: number; truncated: boolean; omitted: number };
+  heartbeatRiskTaskIds: { items: string[]; count: number };
+  databaseActiveTaskIds: { count: number };
+}): Record<string, unknown> {
+ const splitBrainLive = splitBrainLiveFromDiagnostics(record);
+ const effectiveLiveness = effectiveLivenessFromDiagnostics(record);
+ const recommendedAction = recommendedActionFromDiagnostics(record);
+ const activeHeartbeatCount = activeHeartbeatCountFromDiagnostics(record, lists.activeHeartbeatTaskIds, lists.heartbeatFreshTaskIds);
+ return {
+ effectiveLiveness,
+ recommendedAction,
+ splitBrainLive,
+ activeHeartbeatCount,
+ heartbeatFreshTaskCount: lists.heartbeatFreshTaskIds.count,
+ heartbeatFreshTaskIds: lists.heartbeatFreshTaskIds.items,
+ heartbeatFreshTaskIdsTruncated: lists.heartbeatFreshTaskIds.truncated,
+ databaseActiveTaskCount: asNumber(record.databaseActiveTaskCount, lists.databaseActiveTaskIds.count),
+ schedulerActiveRunSlotCount: record.schedulerActiveRunSlotCount ?? null,
+ heartbeatRiskTaskCount: lists.heartbeatRiskTaskIds.count,
+ interpretation: splitBrainLive
+ ? "scheduler heartbeat is fresh; treat active task count from heartbeat as live and continue supervision"
+ : effectiveLiveness === "at-risk"
+ ? "heartbeat risk is present; investigate heartbeat freshness before recovery"
+ : effectiveLiveness === "degraded"
+ ? "diagnostics are degraded; cross-check heartbeat, trace and control-plane sources"
+ : "diagnostics indicate healthy liveness",
+ };
+}
+
function boundedUniqueStringList(value: unknown, limit = diagnosticsIdPreviewLimit): { items: string[]; count: number; omitted: number; truncated: boolean } {
const all = Array.from(new Set(stringList(value))).sort();
const items = all.slice(0, limit);
@@ -1059,6 +1096,12 @@ function compactExecutionDiagnostics(value: unknown): Record |
const allReasons = stringList(record.reasons);
const reasons = allReasons.slice(0, diagnosticsReasonPreviewLimit).map((reason) => boundedInlineString(reason, 240).text).filter((reason): reason is string => reason !== null);
const livenessSummary = boundedInlineString(record.livenessSummary, 420);
+ const liveness = compactLivenessDecision(record, {
+ activeHeartbeatTaskIds,
+ heartbeatFreshTaskIds,
+ heartbeatRiskTaskIds,
+ databaseActiveTaskIds,
+ });
const omittedCounts = {
databaseActiveTaskIds: databaseActiveTaskIds.omitted,
schedulerActiveTaskIds: schedulerActiveTaskIds.omitted,
@@ -1078,8 +1121,9 @@ function compactExecutionDiagnostics(value: unknown): Record |
degraded: record.degraded ?? null,
splitBrain: record.splitBrain ?? null,
splitBrainLive: splitBrainLiveFromDiagnostics(record),
- effectiveLiveness: effectiveLivenessFromDiagnostics({ ...record, heartbeatRiskTaskIds: fullHeartbeatRiskTaskIds }),
- recommendedAction: recommendedActionFromDiagnostics({ ...record, heartbeatRiskTaskIds: fullHeartbeatRiskTaskIds }),
+ effectiveLiveness: liveness.effectiveLiveness,
+ recommendedAction: liveness.recommendedAction,
+ liveness,
livenessSummary: livenessSummary.text,
livenessSummaryChars: livenessSummary.chars,
livenessSummaryTruncated: livenessSummary.truncated,
@@ -1089,7 +1133,7 @@ function compactExecutionDiagnostics(value: unknown): Record |
databaseActiveTaskIds: databaseActiveTaskIds.items,
schedulerActiveRunSlotCount: record.schedulerActiveRunSlotCount ?? null,
schedulerActiveTaskIds: schedulerActiveTaskIds.items,
- activeHeartbeatCount: record.activeHeartbeatCount ?? activeHeartbeatTaskIds.count,
+ activeHeartbeatCount: liveness.activeHeartbeatCount,
activeHeartbeatTaskIds: activeHeartbeatTaskIds.items,
heartbeatFreshTaskIds: heartbeatFreshTaskIds.items,
heartbeatExpiredTaskIds: heartbeatExpiredTaskIds.items,
@@ -2875,6 +2919,30 @@ function decoratePrPreflightScopeBoundary(record: Record): Reco
};
}
+function prPreflightObservationGap(kind: Exclude<CodeQueueObservationGapKind, null>, detail: {
+  reason: string;
+  localBackendCoreMissing?: boolean;
+  remoteFallbackUsed?: boolean;
+  remoteControlPlaneReachable?: boolean | null;
+}): Record<string, unknown> {
+ const controlPlaneGap = kind === "control-plane-observation-gap";
+ return {
+ kind,
+ blockingDisposition: kind,
+ scope: controlPlaneGap ? "control-plane" : "runner-local",
+ reason: detail.reason,
+ localBackendCoreMissing: detail.localBackendCoreMissing === true,
+ remoteFallbackUsed: detail.remoteFallbackUsed === true,
+ remoteControlPlaneReachable: detail.remoteControlPlaneReachable ?? null,
+ schedulerStoppage: false,
+ schedulerStateMachineChanged: false,
+ recommendedAction: controlPlaneGap ? "cross-check-control-plane" : "retry-from-control-plane-or-remote-fallback",
+ note: controlPlaneGap
+ ? "The control-plane observation path is unavailable; this is not evidence that the scheduler stopped executing active tasks."
+ : "The current runner/local backend-core observation path is unavailable; this is not evidence that active Code Queue execution stopped.",
+ };
+}
+
function compactPrRuntimePreflight(preflight: Record<string, unknown>, options: CodexPrPreflightOptions): Record<string, unknown> {
const pull = asRecord(preflight.pullRequestDelivery) ?? {};
const tools = asRecord(pull.tools) ?? {};
@@ -3056,9 +3124,16 @@ function queryRemoteMainServerPrPreflight(optionArgs: string[], config: UniDeskC
return {
ok: false,
runnerDisposition: "infra-blocked",
+ blockingDisposition: "control-plane-observation-gap",
failureKind: "control-plane-missing",
degradedReason: "remote-control-plane-unreachable",
message,
+ observationGap: prPreflightObservationGap("control-plane-observation-gap", {
+ reason: "remote control plane CLI returned a structured error",
+ localBackendCoreMissing: false,
+ remoteFallbackUsed: true,
+ remoteControlPlaneReachable: false,
+ }),
controlPlane: {
mode: "remote-frontend",
host: config.network.publicHost,
@@ -3081,9 +3156,16 @@ function queryRemoteMainServerPrPreflight(optionArgs: string[], config: UniDeskC
return {
ok: false,
runnerDisposition: "infra-blocked",
+ blockingDisposition: "control-plane-observation-gap",
failureKind: "control-plane-missing",
degradedReason: "remote-control-plane-unreachable",
message,
+ observationGap: prPreflightObservationGap("control-plane-observation-gap", {
+ reason: "remote control plane CLI could not be reached or returned non-JSON output",
+ localBackendCoreMissing: false,
+ remoteFallbackUsed: true,
+ remoteControlPlaneReachable: false,
+ }),
controlPlane: {
mode: "remote-frontend",
host: config.network.publicHost,
@@ -3128,6 +3210,18 @@ function codeQueuePrPreflight(optionArgs: string[] = [], transport: CodeQueuePrP
if (remoteRecord.ok === false) {
return decoratePrPreflightScopeBoundary({
...remoteRecord,
+ observationGap: prPreflightObservationGap(
+ remoteRecord.failureKind === "control-plane-missing" ? "control-plane-observation-gap" : "runner-local-observation-gap",
+ {
+ reason: remoteRecord.failureKind === "control-plane-missing"
+ ? "remote control plane could not be observed after local backend-core target-stack absence"
+ : "local backend-core target-stack absence was bypassed by remote fallback, but the remote result still failed",
+ localBackendCoreMissing: true,
+ remoteFallbackUsed: true,
+ remoteControlPlaneReachable: remoteRecord.failureKind === "control-plane-missing" ? false : true,
+ },
+ ),
+ blockingDisposition: remoteRecord.failureKind === "control-plane-missing" ? "control-plane-observation-gap" : remoteRecord.blockingDisposition ?? remoteRecord.runnerDisposition ?? "infra-blocked",
controlPlane: {
...(asRecord(remoteRecord.controlPlane) ?? {}),
mode: "remote-frontend",
@@ -3142,6 +3236,12 @@ function codeQueuePrPreflight(optionArgs: string[] = [], transport: CodeQueuePrP
}
return decoratePrPreflightScopeBoundary({
...remoteRecord,
+ localObservationGap: prPreflightObservationGap("runner-local-observation-gap", {
+ reason: "local backend-core target-stack absence was bypassed by healthy remote control-plane fallback",
+ localBackendCoreMissing: true,
+ remoteFallbackUsed: true,
+ remoteControlPlaneReachable: true,
+ }),
controlPlane: {
...(asRecord(remoteRecord.controlPlane) ?? {}),
mode: "remote-frontend",
@@ -3163,9 +3263,16 @@ function codeQueuePrPreflight(optionArgs: string[] = [], transport: CodeQueuePrP
...(localRecord ?? {}),
ok: false,
runnerDisposition: "infra-blocked",
+ blockingDisposition: "control-plane-observation-gap",
failureKind,
degradedReason,
- message: "remote control plane unreachable; local backend-core target-stack absence is evidence only",
+ message: "remote control plane unreachable; local backend-core target-stack absence is observation-gap evidence only",
+ observationGap: prPreflightObservationGap("control-plane-observation-gap", {
+ reason: "local backend-core target stack is missing and no remote control plane could be observed",
+ localBackendCoreMissing: true,
+ remoteFallbackUsed: false,
+ remoteControlPlaneReachable: false,
+ }),
controlPlane: {
mode: "local-backend-core",
localBackendCoreMissing: true,
@@ -3183,9 +3290,20 @@ function codeQueuePrPreflight(optionArgs: string[] = [], transport: CodeQueuePrP
...(localRecord ?? {}),
ok: false,
runnerDisposition: localRecord?.runnerDisposition ?? "infra-blocked",
+ blockingDisposition: localTargetStackMissing ? "runner-local-observation-gap" : localRecord?.blockingDisposition ?? localRecord?.runnerDisposition ?? "infra-blocked",
failureKind: (localRecord?.failureKind as CodeQueuePrPreflightFailureKind | undefined) ?? "proxy-gap",
degradedReason: localRecord?.degradedReason ?? "backend-core-proxy-unavailable",
message: localRecord?.message ?? localRecord?.stderrTail ?? localRecord?.stdoutTail ?? "Code Queue runtime preflight could not be observed",
+ ...(localTargetStackMissing
+ ? {
+ observationGap: prPreflightObservationGap("runner-local-observation-gap", {
+ reason: "local backend-core target stack is missing in this runner observation path",
+ localBackendCoreMissing: true,
+ remoteFallbackUsed: false,
+ remoteControlPlaneReachable: null,
+ }),
+ }
+ : {}),
controlPlane: {
mode: "local-backend-core",
localBackendCoreMissing: localTargetStackMissing,