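From f80e8d5d2b56c4d6be86f48a5a09d0c8c3bb4367 Mon Sep 17 00:00:00 2001
From: Codex
Date: Sat, 23 May 2026 02:49:45 +0000
Subject: [PATCH] fix: compact terminal codex steer rejection

---
 docs/reference/cli.md                    |  2 +-
 docs/reference/code-queue-supervision.md |  2 +-
 scripts/code-queue-cli-steer-test.ts     | 40 ++++++++++++++++++
 scripts/src/code-queue.ts                | 52 +++++++++++++++++++++++-
 scripts/src/help.ts                      |  4 +-
 5 files changed, 94 insertions(+), 6 deletions(-)

diff --git a/docs/reference/cli.md b/docs/reference/cli.md
index e4e0e2a5..c7a9b58c 100644
--- a/docs/reference/cli.md
+++ b/docs/reference/cli.md
@@ -54,7 +54,7 @@ The CLI can evolve quickly from `master`, but must remain compatible with the `deploy.json`-pinned CI
 - `codex dev-ready` queries Code Queue `/api/dev-ready` and returns a bounded readiness summary covering tools, Docker, Codex config, SSH, and `devReady.skills`. `devReady.skills` exposes only `UNIDESK_SKILLS_PATH`, whether it exists, whether it is read-only, skillCount, whether `cli-spec` is visible, and repair suggestions; it never outputs the contents of host auth/token files.
 - `codex judge --attempt N [--dry-run] [--include-prompt]` replays the judge for one specified attempt through the Code Queue private proxy. It is an execution-plane diagnostic entry point and still depends on the real judge builder, MiniMax call path, and execution environment on the D601 scheduler/runner side. By default it makes a real MiniMax call; `--dry-run` returns only the prompt/payload size, the attempt window, and rebuild-source diagnostics; `--include-prompt` is only for deep local troubleshooting.
 - `codex steer [prompt|--prompt-file path|--prompt-stdin] [--dry-run] [--no-retry|--retry-attempts N]` injects a corrective prompt into a running task through the Code Queue private proxy and officially replaces the low-level `microservice proxy code-queue /api/tasks//steer` call. The prompt must come from exactly one of a positional argument, a file, or stdin. `--dry-run` outputs only `method`, `path`, `stableProxyPath`, the retry policy, the prompt character count, a truncated preview, and the raw proxy equivalent command; it does not touch the running session and must not leak the full text of an overlong prompt. A real run is a write operation: on success it returns only `accepted=true`, the task id, the prompt character count, `promptOmitted=true`, a bounded task/queue confirmation, an attempt summary, and follow-up inspection commands, without echoing the prompt or the full task state. The path is fixed at `/api/microservices/code-queue/proxy/api/tasks//steer` and applies only to a running task that has an active steerable turn on the D601 scheduler. By default, retryable control-plane failures such as `stable-proxy-failed` and `backend-core-unreachable` get one bounded retry; `--retry-attempts N` is capped at 3, `--retry-delay-ms N` is capped at 5000, and `--no-retry` reproduces a single failure.
-- A non-dry-run `codex steer` failure still outputs JSON and exits non-zero. `.data.diagnostics.reason` drives runner triage and currently includes `backend-core-unreachable`, `code-queue-microservice-unregistered`, `proxy-unauthorized`, `proxy-404`, `steer-endpoint-404`, `upstream-runtime-rejected`, `stable-proxy-failed`, and `invalid-proxy-response`. `scope` distinguishes `backend-core`, `stable-proxy`, `code-queue-runtime`, or `unknown`, and comes with `status`, `exitCode`, `retryable`, a bounded `upstreamBodyPreview`, `attempts`, `retryPolicy`, and recommended cross-check commands. If the task is not in a running/active-turn state, the failure is usually classified as `upstream-runtime-rejected` and must not silently succeed. `502 provider HTTP tunnel failed`, `provider-gateway-http-fetch`, `The operation was aborted`, or a tunnel wait abort of about 30 seconds is classified as `stable-proxy-failed`, and the CLI first retries according to the retry policy. If it still fails, `.data.diagnostics.operatorGuidance.rawProxyEquivalentIsFallback=false` indicates that the raw proxy equivalent command goes through the same tunnel, so it is only for comparative diagnosis and should not be treated as a lower-noise fallback. In that case `.data.steer.deliveryUnconfirmed=true`, and the commander should first check `codex tasks --view supervisor`, `codex task ` and `microservice health code-queue`, then retry the same `codex steer` from the main server CLI or an explicit SSH transport.
+- A non-dry-run `codex steer` failure still outputs JSON and exits non-zero. `.data.diagnostics.reason` drives runner triage and currently includes `backend-core-unreachable`, `code-queue-microservice-unregistered`, `proxy-unauthorized`, `proxy-404`, `steer-endpoint-404`, `upstream-runtime-rejected`, `stable-proxy-failed`, and `invalid-proxy-response`. `scope` distinguishes `backend-core`, `stable-proxy`, `code-queue-runtime`, or `unknown`, and comes with `status`, `exitCode`, `retryable`, a bounded `upstreamBodyPreview`, `attempts`, `retryPolicy`, and recommended cross-check commands. If the task is not in a running/active-turn state, the failure is usually classified as `upstream-runtime-rejected` and must not silently succeed. `502 provider HTTP tunnel failed`, `provider-gateway-http-fetch`, `The operation was aborted`, or a tunnel wait abort of about 30 seconds is classified as `stable-proxy-failed`, and the CLI first retries according to the retry policy. If it still fails, `.data.diagnostics.operatorGuidance.rawProxyEquivalentIsFallback=false` indicates that the raw proxy equivalent command goes through the same tunnel, so it is only for comparative diagnosis and should not be treated as a lower-noise fallback. In that case `.data.steer.deliveryUnconfirmed=true`, and the commander should first check `codex tasks --view supervisor`, `codex task ` and `microservice health code-queue`, then retry the same `codex steer` from the main server CLI or an explicit SSH transport. If the 409 returned by D601 already contains a terminal task state, the CLI defaults to a compact terminal response instead: `reason=task-already-terminal`, task status, terminal status, `updatedAt`/`finishedAt`, and `retryable=false`, offering only the follow-up commands `codex task `, `codex read `, and `codex submit --prompt-file --reference-task-id `, without echoing the steer prompt, the full request body, or the large task object.
 - `codex interrupt|cancel ` requests an interrupt through the Code Queue private proxy. For running/judging tasks it asks D601 to stop the current agent run; cancelling queued/retry_wait tasks must also use the same proxy path as the WebUI. It returns a bounded task summary and follow-up query commands. Any action that needs to touch an active run still belongs to the D601 execution plane.
 - Code Queue multi-queue lanes are managed under the `codex` command namespace: `queues [--full|--all] [--limit N] [--page N|--offset N]` lists, `queue create ` creates, `queue merge --into ` merges, and `move --queue ` migrates. By default these queue-management entry points are handled by the main server's `code-queue-mgr`, which manages PostgreSQL directly, and are still reached through the stable `code-queue` user-service proxy path. `codex queues` by default returns only active/nonempty/unread/runnable queue summaries, activity, commanderConcurrency, global counts, and execution diagnostics; `--full` or `--all` only switches to one page of the full queue-row view, is still subject to `--limit`/`--page`/`--offset` pagination, and no longer carries the deprecated full array by default. The stable machine-readable path for both summary and full is `.data.queues.items[]`, and global metadata is fixed at `.data.queues.commanderConcurrency`, `.data.queues.activity`, `.data.queues.counts`, `.data.queues.executionDiagnostics`, `.data.queues.activeTaskIds`, and `.data.queues.queuedTaskIds`; when the full upstream is needed, use the raw command in the output. `commanderConcurrency.activeRunnerCount` / `activity.effectiveActiveTaskCount` is the effective active count for the commander's concurrency decisions; `schedulerLocalActiveQueueCount`/`activeQueueIds` describe only the local scheduler's active-run slots and must not override the database running count or the heartbeat-fresh runner count. The old full top-level array semantics are recorded as deprecated compatibility information and are no longer the primary shape of `.data.queues`. Tasks within one queue run serially; different queues run in parallel. Migration is allowed only for `queued`/`retry_wait` tasks not yet claimed by the scheduler, which must satisfy `startedAt=null`, `currentAttempt=0`, and have no active thread/turn; tasks that have entered `running`/`judging` or already carry a claim marker return 409 and must not be written back to queued by move/merge. A merge moves ownership of the migratable tasks and automatically deletes the source queue record, keeping only the merged target queue; if the source or target queue has active/claimed tasks, the whole merge returns 409. The merged target queue runs serially in the order of each task's original `queueEnteredAt`/`createdAt`; after queued/retry_wait tasks are migrated successfully, the D601 scheduler advances them by polling.
 - All `codex` query and management commands must use the same backend-core private proxy path as the WebUI, `/api/microservices/code-queue/proxy/...`. The CLI must not call D601 internal Services, the database, pod curl, or the k3sctl scheduler sub-service directly for submit, move, interrupt, cancel, or queue management. If that path fails, fix the CLI/backend/provider tunnel chain first rather than bypassing the control plane.
diff --git a/docs/reference/code-queue-supervision.md b/docs/reference/code-queue-supervision.md
index 8d75a9e4..0ec103ea 100644
--- a/docs/reference/code-queue-supervision.md
+++ b/docs/reference/code-queue-supervision.md
@@ -294,7 +294,7 @@ An inactive systemd unit for the D601 artifact registry does not mean D601 is globally offline.
 Intervene only when there is a clear reason.

 - If the task is still running and the trace or scheduler heartbeat is fresh, steer rather than interrupt.
-- Steering a running task should prefer the official CLI: `bun scripts/cli.ts codex steer --prompt-file `. This command shares the same backend-core stable proxy helper with `codex task/tasks/read`; `--dry-run` shows `method/path/stableProxyPath`, the retry policy, a prompt summary, and the raw proxy equivalent command without sending anything. A non-dry-run gives `stable-proxy-failed` and `backend-core-unreachable` one bounded retry by default. On failure, first check `.data.diagnostics.reason`, `.data.diagnostics.attempts`, and `.data.diagnostics.operatorGuidance`: `backend-core-unreachable` concerns the observation path from the local machine to core; `code-queue-microservice-unregistered`/`proxy-unauthorized`/`proxy-404` concern stable proxy configuration or permissions; `steer-endpoint-404`/`upstream-runtime-rejected` concern the D601 runtime or task state; and `stable-proxy-failed` is mostly a provider/k3s/tunnel chain problem. When `502 provider HTTP tunnel failed`, `The operation was aborted`, or a provider tunnel wait abort of about 30 seconds still fails after retry, `.data.steer.deliveryUnconfirmed=true`; the commander should first cross-check task liveness with `codex tasks --view supervisor --limit 20`, `codex task ` and `microservice health code-queue`, then retry the same steer from the main server CLI or an explicit SSH transport. The raw proxy equivalent command goes through the same tunnel (`rawProxyEquivalentIsFallback=false`), so it is only for diagnostic comparison and should not be used as an official fallback. If the official CLI itself is unavailable, a temporary call through the controlled microservice proxy may serve only as an on-site recovery measure; such a workaround must be recorded in the standing observations in the body of command briefing issue #24, and a formal issue must be created to fill the CLI gap, to avoid long-term reliance on an implicit API.
+- Steering a running task should prefer the official CLI: `bun scripts/cli.ts codex steer --prompt-file `. This command shares the same backend-core stable proxy helper with `codex task/tasks/read`; `--dry-run` shows `method/path/stableProxyPath`, the retry policy, a prompt summary, and the raw proxy equivalent command without sending anything. A non-dry-run gives `stable-proxy-failed` and `backend-core-unreachable` one bounded retry by default. On failure, first check `.data.diagnostics.reason`, `.data.diagnostics.attempts`, and `.data.diagnostics.operatorGuidance`: `backend-core-unreachable` concerns the observation path from the local machine to core; `code-queue-microservice-unregistered`/`proxy-unauthorized`/`proxy-404` concern stable proxy configuration or permissions; `steer-endpoint-404`/`upstream-runtime-rejected` concern the D601 runtime or task state; and `stable-proxy-failed` is mostly a provider/k3s/tunnel chain problem. When `502 provider HTTP tunnel failed`, `The operation was aborted`, or a provider tunnel wait abort of about 30 seconds still fails after retry, `.data.steer.deliveryUnconfirmed=true`; the commander should first cross-check task liveness with `codex tasks --view supervisor --limit 20`, `codex task ` and `microservice health code-queue`, then retry the same steer from the main server CLI or an explicit SSH transport. The raw proxy equivalent command goes through the same tunnel (`rawProxyEquivalentIsFallback=false`), so it is only for diagnostic comparison and should not be used as an official fallback. If the task is already terminal, the CLI returns a compact `task-already-terminal` response and suggests `bun scripts/cli.ts codex task `, `bun scripts/cli.ts codex read `, and `bun scripts/cli.ts codex submit --prompt-file --reference-task-id `; the commander should submit a follow-up task rather than keep steering a terminal task. If the official CLI itself is unavailable, a temporary call through the controlled microservice proxy may serve only as an on-site recovery measure; such a workaround must be recorded in the standing observations in the body of command briefing issue #24, and a formal issue must be created to fill the CLI gap, to avoid long-term reliance on an implicit API.
 - If the task has reached a terminal state but lacks the required acceptance evidence, retry the same task with a focused continuation prompt.
 - If the task is blocked by a reusable infrastructure defect, assign that defect to a suitable idle or low-risk queue and let the original business task wait, or retry it after the fix.
 - If an infrastructure defect affects Code Queue control-plane availability, the commander may perform the minimal controlled deployment needed to restore the queue, then verify that the original task can continue.
diff --git a/scripts/code-queue-cli-steer-test.ts b/scripts/code-queue-cli-steer-test.ts
index 2b41fa72..77b6e745 100644
--- a/scripts/code-queue-cli-steer-test.ts
+++ b/scripts/code-queue-cli-steer-test.ts
@@ -200,6 +200,45 @@ export function runCodeQueueCliSteerContract(): JsonRecord {
   assertCondition(nestedRecord(exhaustedDiagnostics, ["operatorGuidance"]).rawProxyEquivalentIsFallback === false, "raw proxy equivalent should be diagnostic, not fallback", exhaustedDiagnostics);
   assertCondition(String(nestedRecord(exhausted, ["commands"]).rawProxy || "").includes("microservice proxy code-queue /api/tasks/direct_task/steer"), "failure should still expose raw proxy diagnostic command", 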
exhausted);

+  const terminalPrompt = `${"do not leak ".repeat(40)}tail-secret-marker`;
+  const terminalRejection = codexSteerTaskForTest("completed_task", [terminalPrompt], () => ({
+    ok: false,
+    status: 409,
+    body: {
+      ok: false,
+      error: "task does not have an active steerable turn",
+      task: {
+        id: "completed_task",
+        queueId: "default",
+        status: "succeeded",
+        terminalStatus: "completed",
+        currentAttempt: 1,
+        updatedAt: "2026-05-22T00:00:00.000Z",
+        finishedAt: "2026-05-22T00:00:00.000Z",
+        prompt: `${"hidden task prompt ".repeat(60)}tail`,
+        output: [{ seq: 1, text: "noisy raw task output" }],
+      },
+    },
+  })) as JsonRecord;
+  const terminalSteer = nestedRecord(terminalRejection, ["steer"]);
+  assertCondition(terminalRejection.ok === false, "terminal steer rejection should fail", terminalRejection);
+  assertCondition(terminalSteer.reason === "task-already-terminal", "terminal steer rejection should use compact terminal reason", terminalSteer);
+  assertCondition(terminalSteer.status === "succeeded", "terminal steer rejection should expose task status", terminalSteer);
+  assertCondition(terminalSteer.terminalStatus === "completed", "terminal steer rejection should expose terminal status", terminalSteer);
+  assertCondition(terminalSteer.lastUpdate === "2026-05-22T00:00:00.000Z", "terminal steer rejection should expose last update", terminalSteer);
+  assertCondition(terminalSteer.updatedAt === "2026-05-22T00:00:00.000Z", "terminal steer rejection should expose last update time", terminalSteer);
+  assertCondition(terminalSteer.retryable === false, "terminal steer rejection should not be retryable", terminalSteer);
+  const terminalCommands = nestedRecord(terminalRejection, ["commands"]);
+  assertCondition(String(terminalCommands.show || "").includes("codex task completed_task"), "terminal rejection should suggest show command", terminalCommands);
+  assertCondition(String(terminalCommands.read || "").includes("codex read completed_task"), "terminal rejection should suggest read command", terminalCommands);
+  assertCondition(String(terminalCommands.followUpSubmit || "").includes("codex submit --prompt-file --reference-task-id completed_task"), "terminal rejection should suggest follow-up submit pattern", terminalCommands);
+  const terminalJson = JSON.stringify(terminalRejection);
+  assertCondition(!terminalJson.includes("tail-secret-marker"), "terminal rejection must not echo steer prompt", terminalRejection);
+  assertCondition(!terminalJson.includes("hidden task prompt"), "terminal rejection must not echo task prompt", terminalRejection);
+  assertCondition(!terminalJson.includes("noisy raw task output"), "terminal rejection must not echo task output", terminalRejection);
+  assertCondition(!("request" in terminalRejection), "terminal rejection should omit request preview", terminalRejection);
+  assertCondition(!("diagnostics" in terminalRejection), "terminal rejection should omit bulky diagnostics", terminalRejection);
+
   return {
     ok: true,
     checks: [
@@ -216,6 +255,7 @@ export function runCodeQueueCliSteerContract(): JsonRecord {
       "successful steer confirms write without echoing prompt",
       "steer failure classification is JSON-consumable",
       "retryable tunnel aborts are retried with bounded diagnostics",
+      "terminal steer rejection is compact and actionable",
     ],
   };
 }
diff --git a/scripts/src/code-queue.ts b/scripts/src/code-queue.ts
index 0abff87b..54bd609c 100644
--- a/scripts/src/code-queue.ts
+++ b/scripts/src/code-queue.ts
@@ -621,11 +621,57 @@ function classifySteerFailure(response: unknown, targetPath: string, stableProxy
   };
 }

-function unwrapSteerResponse(response: unknown, targetPath: string, stableProxyPath: string, rawProxyEquivalent: string): { ok: true; upstream: { ok: unknown; status: unknown }; body: Record } | { ok: false; diagnostics: ClassifiedCodexSteerError } {
+function unwrapSteerResponse(response: unknown, targetPath: string, stableProxyPath: string, rawProxyEquivalent: string): { ok: true; upstream: { ok: unknown; status: unknown }; body: Record } | { ok: false; diagnostics: ClassifiedCodexSteerError; rawResponse: unknown } {
   const record = asRecord(response);
   const body = responseBody(record);
   if (record?.ok === true && body?.ok === true) return { ok: true, upstream: { ok: record.ok, status: record.status }, body };
-  return { ok: false, diagnostics: classifySteerFailure(response, targetPath, stableProxyPath, rawProxyEquivalent) };
+  return { ok: false, diagnostics: classifySteerFailure(response, targetPath, stableProxyPath, rawProxyEquivalent), rawResponse: response };
+}
+
+function terminalStatusFromTask(task: Record | null): string {
+  const direct = asString(task?.terminalStatus);
+  if (direct.length > 0) return direct;
+  const attempts = asArray(task?.attempts).map((item) => asRecord(item)).filter((item): item is Record => item !== null);
+  for (let index = attempts.length - 1; index >= 0; index -= 1) {
+    const status = asString(attempts[index]?.terminalStatus);
+    if (status.length > 0) return status;
+  }
+  return "";
+}
+
+function compactTerminalSteerRejection(taskId: string, response: unknown): Record | null {
+  const record = asRecord(response);
+  const body = responseBody(record);
+  const task = asRecord(body?.task);
+  const status = asString(task?.status);
+  if (!isTerminalTaskStatus(status)) return null;
+  const terminalStatus = terminalStatusFromTask(task);
+  const lastUpdate = task?.updatedAt ?? task?.finishedAt ?? null;
+  return {
+    ok: false,
+    steer: {
+      accepted: false,
+      reason: "task-already-terminal",
+      taskId,
+      status,
+      terminalStatus: terminalStatus || null,
+      lastUpdate,
+      updatedAt: task?.updatedAt ?? null,
+      finishedAt: task?.finishedAt ?? null,
+      retryable: false,
+    },
+    message: `task ${taskId} is already terminal (${status}); codex steer only applies to an active running turn`,
+    commands: {
+      show: `bun scripts/cli.ts codex task ${taskId}`,
+      read: `bun scripts/cli.ts codex read ${taskId}`,
+      followUpSubmit: `bun scripts/cli.ts codex submit --prompt-file --reference-task-id ${taskId}`,
+      supervisor: `bun scripts/cli.ts codex tasks --view supervisor --limit ${defaultTasksLimit}`,
+    },
+    upstream: {
+      status: responseStatus(record),
+      error: asString(body?.error) || null,
+    },
+  };
 }

 function steerSuccessAttempt(attempt: number, durationMs: number, upstream: { ok: unknown; status: unknown }): CodexSteerAttemptSummary {
@@ -4756,6 +4802,8 @@ function codexSteerTask(taskId: string, args: string[], fetcher: CodexResponseFe
     }
     attempts.push(steerFailureAttempt(attempt, durationMs, response.diagnostics));
     failedResponse = response;
+    const terminalRejection = compactTerminalSteerRejection(taskId, response.rawResponse);
+    if (terminalRejection !== null) return terminalRejection;
     if (!shouldRetrySteerFailure(response.diagnostics, attempt, options.retryAttempts)) break;
     sleepSync(options.retryDelayMs);
   }
diff --git a/scripts/src/help.ts b/scripts/src/help.ts
index f9954625..e73c0616 100644
--- a/scripts/src/help.ts
+++ b/scripts/src/help.ts
@@ -61,7 +61,7 @@ export function rootHelp(): unknown {
       { command: "codex read ", description: "Mark one reviewed terminal task read and return terminal metadata plus final response; prompt/tool logs stay behind drill-down commands." },
       { command: "codex dev-ready", description: "Fetch execution-container readiness, including sanitized skill injection status from /api/dev-ready." },
       { command: "codex judge --attempt N [--dry-run] [--include-prompt]", description: "Replay one stored Code Queue attempt through the same judge context builder and MiniMax judge call path used by the live queue worker." },
-      { command: "codex steer [prompt|--prompt-file path|--prompt-stdin] [--dry-run] [--no-retry|--retry-attempts N]", description: "Push a corrective prompt into a running Code Queue task; retryable tunnel aborts get bounded retry diagnostics, and real success does not echo prompt text." },
+      { command: "codex steer [prompt|--prompt-file path|--prompt-stdin] [--dry-run] [--no-retry|--retry-attempts N]", description: "Push a corrective prompt into a running Code Queue task; retryable tunnel aborts get bounded diagnostics, terminal-task rejection suggests codex task/read plus codex submit --reference-task-id , and real success does not echo prompt text." },
       { command: "codex interrupt|cancel ", description: "Request interrupt for a running Code Queue task, or cancel a queued/retry_wait task, through the same private proxy." },
       { command: "codex (queues [--full|--all] | queue create | queue merge --into | move --queue )", description: "List low-noise queue summaries by default, including effective activity counts that distinguish scheduler-local queues, DB running tasks, and heartbeat-fresh runners; full queue rows require --full/--all." },
       { command: "job list [--limit N] [--include-command]", description: "List async jobs from .state/jobs with a bounded default page." },
@@ -300,7 +300,7 @@ function codexHelp(): unknown {
       redline: "data.supervisor.activeRunning.redline names the count field, routine target, burst redline, hard redline, and decisionReady flag.",
       limitSemantics: "filters.requestedLimit preserves the user input; filters.limit/effectiveLimit shows the capped query budget; section outputBudget/rowPage show returned-row caps.",
     },
-    description: "Operate Code Queue through the stable backend-core private proxy path with bounded activity summaries for queue and supervisor views. Real submit/steer success is a low-noise write confirmation and does not echo prompt text.",
+    description: "Operate Code Queue through the stable backend-core private proxy path with bounded activity summaries for queue and supervisor views. Real submit/steer success is a low-noise write confirmation and does not echo prompt text; terminal steer rejection returns compact status plus codex task/read/submit follow-up commands.",
   };
 }
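
Reviewer note: the compact rejection added by this patch lets a caller tell "task is already terminal" apart from a retryable tunnel failure without parsing a large task object. The sketch below shows one way a consumer might branch on it. It is illustrative only: `SteerPayload`, `NextAction`, and `nextSteerAction` are hypothetical names that are not part of the patch, and it assumes the caller has already unwrapped the payload that carries `steer`, `diagnostics`, and `commands`.

```typescript
// Hypothetical consumer-side shape; only the field names are taken from the patch.
type SteerPayload = {
  ok?: boolean;
  steer?: { reason?: string; retryable?: boolean; deliveryUnconfirmed?: boolean };
  diagnostics?: { reason?: string; retryable?: boolean };
  commands?: { show?: string; read?: string; followUpSubmit?: string };
};

// Hypothetical set of next steps a supervising script could take.
type NextAction =
  | { kind: "done" }
  | { kind: "follow-up"; command: string | null }
  | { kind: "verify-then-retry" }
  | { kind: "investigate"; reason: string };

function nextSteerAction(payload: SteerPayload): NextAction {
  // A successful steer needs no further action.
  if (payload.ok === true) return { kind: "done" };
  // Compact terminal rejection: submit a follow-up task instead of steering again.
  if (payload.steer?.reason === "task-already-terminal") {
    return { kind: "follow-up", command: payload.commands?.followUpSubmit ?? null };
  }
  // Unconfirmed delivery or a retryable failure: cross-check task liveness, then retry.
  if (payload.steer?.deliveryUnconfirmed === true || payload.diagnostics?.retryable === true) {
    return { kind: "verify-then-retry" };
  }
  // Anything else is left to the operator, keyed by the diagnostics reason.
  return { kind: "investigate", reason: payload.diagnostics?.reason ?? "unknown" };
}
```

The terminal check comes before the retry check on purpose: the patch returns the compact rejection before consulting the retry policy, so a consumer that mirrors that order never retries a steer against a finished task.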