From 49dfbc0b3dd8f5235ce255d4033a43fae7cb67ea Mon Sep 17 00:00:00 2001
From: Codex
Date: Sat, 23 May 2026 09:08:05 +0000
Subject: [PATCH] fix: clarify codex submit queue summary disclosure

---
 docs/reference/cli.md                         |   2 +-
 docs/reference/code-queue-supervision.md      |   4 +-
 ...code-queue-submit-summary-contract-test.ts |  81 ++++++++
 scripts/src/code-queue.ts                     | 181 ++++++++++++++----
 scripts/src/help.ts                           |   6 +
 5 files changed, 232 insertions(+), 42 deletions(-)

diff --git a/docs/reference/cli.md b/docs/reference/cli.md
index 97f7fb95..c552d924 100644
--- a/docs/reference/cli.md
+++ b/docs/reference/cli.md
@@ -45,7 +45,7 @@ CLI 可以从 `master` 快速演进,但必须兼容 `deploy.json` 固定的 CI
 - `ci install|status|run|publish-backend-core|publish-user-service|run-dev-e2e|logs` 管理 D601 原生 k3s 上的 Tekton CI。`run` 手动创建每 commit 检查和 Code Queue 只读性能门禁;`publish-backend-core` 与 `publish-user-service` 从 pushed Git commit 构建并发布 `127.0.0.1:5000/unidesk/:` commit-pinned artifacts,输出 `artifactSummary`(含 `serviceId`、`sourceCommit`、`sourceRepo`、`dockerfile`、`imageRef`、`tag`、`digest`、`digestRef`),但不部署生产;`run-dev-e2e` 的 Git 控制 runner、短 launcher、host fetch 边界、临时 smoke namespace 和 no-CD 规则只在 `docs/reference/dev-ci-runner.md` 定义;Tekton CI 通用规则见 `docs/reference/ci.md`。
 - `schedule list|get|runs|run|retry-run|delete|upsert-pgdata-backup` 管理 backend-core 定时任务和运行历史。`schedule list`、`schedule get`、`schedule runs --limit N` 和 `schedule runs --limit N` 是只读观察入口;`schedule run`、`schedule retry-run`、`schedule delete` 和 `schedule upsert-pgdata-backup` 会触发运行或写入配置,生产恢复时必须有明确授权。`schedule runs --limit N` 是全局历史视图,返回 `scope=global` 和 `scheduleId=null`;`schedule runs --limit N` 是指定 schedule 历史视图,返回 `scope=schedule` 和对应 `scheduleId`。CLI 必须拒绝 `schedule runs 50` 这类纯数字位置参数,并提示使用 `schedule runs --limit 50`,避免把空数组误判成“没有历史 run”。`schedule run --wait-ms N` 触发同一 schedule,并且即使 wait 超时也必须返回 `newRunId` 和 `observeCommand`;`schedule retry-run ` 只接受 failed run,从原 run 反查 `scheduleId` 后重触发同一 schedule,并输出 `originalRunId`、`scheduleId`、`newRunId` 和
`observeCommand`。当 backend-core 目标容器缺失或只观察到 verify-only 容器时,schedule/microservice 命令必须以非零退出并返回 `failureKind=target-stack-not-running`、`runnerDisposition=infra-blocked`、`readOnlyCommands` 和 `authorizationRequiredForRecovery`,不得把 Docker 的 `No such container` 当成成功的空历史。 - `codex deploy ` 是旧 Code Queue 兼容部署入口,已禁用以防止维护通道直连 D601 部署 Code Queue;当前 dev 自动化只做 `ci run-dev-e2e` smoke,不提供 Code Queue CD,详细规则见 `docs/reference/codex-deploy.md`。 -- `codex submit [prompt] [--prompt-file path|--prompt-stdin] [--queue queueId] [--provider-id id] [--cwd path] [--model model] [--reasoning-effort effort] [--execution-mode mode] [--max-attempts N] [--reference-task-id id] [--dry-run]` 通过 backend-core 私有代理向稳定 `code-queue` 用户服务路径提交任务;prompt 必须且只能来自位置参数、文件或 stdin 之一,`--dry-run` 只返回结构化请求且不实际入队。长 prompt、多行 prompt、含引号/反引号/Markdown 表格/JSON/反斜杠的 prompt 必须优先用 `--prompt-stdin` 或 `--prompt-file`,不要拼进 shell 单个参数;位置参数只适合短单行 smoke prompt。stdin 推荐用 quoted heredoc:`cat <<'PROMPT' | bun scripts/cli.ts codex submit --prompt-stdin --queue --dry-run`,文件路径推荐 `bun scripts/cli.ts codex submit --prompt-file /tmp/code-queue-prompt.md --queue --dry-run`,确认 dry-run 后移除 `--dry-run` 提交同一 payload。dry-run 会额外输出 `routingRecommendation`,包含推荐 route、runner、model、风险信号、prompt 自包含/issue 非唯一来源/prod-secret-DB 禁止/运行态或 release 禁止/证据要求/中等复杂度候选等 guard 状态;同时输出 `policyContract`,固定暴露 GPT-5.5、DeepSeek、MiniMax 的风险分层、并发上限和外部 provider 429 退避处置。该建议只用于指挥官 preflight,不会改写 payload,不改变 runtime admission,也不假设生产 MiniMax 或 DeepSeek 可用。`--dry-run` 必须返回完整 prompt、字符数和 `truncated=false` 用于人工验收;真实提交是写入操作,默认只返回 `accepted=true`、task id、队列、写入保护摘要和后续查看命令,必须标记 `promptOmitted=true` 且不得回显 prompt 或 promptPreview。真实提交会经过本机本地串行化保护和短节流,避免同一指挥端并发 submit 把低内存主机或 `code-queue-mgr` 控制面打抖;返回值会附带 `executionMode`、`runnerPermissions` 和低噪声 `submitConcurrencyGuard`,显式说明 requested/effective mode、服务级 runner sandbox/approvalPolicy、锁与等待信息。`--execution-mode` 是 Code Queue runtime placement,不是 Codex sandbox 权限;有效模式是 `default` 和 `windows-native`,`--execution-mode full-access` 等 
sandbox-like 值会保留 requested 值并显示 effective `default`,同时提示当前不支持每任务 sandbox override。真实提交的 `queue` 摘要保持低噪声:`submittedTaskIds`、`queuedTaskIds`、`activeTaskIds` 和 `databaseActiveTaskIds` 是带 `items/count/returned/omitted/truncated/source` 的有界预览对象,`queuedTaskIds.items` 必须包含本次新入队的 queued/retry_wait 任务,`countContext` 与 `counts` 是权威计数;当预览被省略或截断时,`listPreviewPolicy` 必须写明 omitted counts 和 raw 查看命令。backend-core 默认把提交、队列 CRUD、已读状态、历史摘要和轻量 Trace 读取分流到主 server `code-queue-mgr`,由它写入主 PostgreSQL;D601 scheduler 只轮询并执行已入库任务。 +- `codex submit [prompt] [--prompt-file path|--prompt-stdin] [--queue queueId] [--provider-id id] [--cwd path] [--model model] [--reasoning-effort effort] [--execution-mode mode] [--max-attempts N] [--reference-task-id id] [--dry-run]` 通过 backend-core 私有代理向稳定 `code-queue` 用户服务路径提交任务;prompt 必须且只能来自位置参数、文件或 stdin 之一,`--dry-run` 只返回结构化请求且不实际入队。长 prompt、多行 prompt、含引号/反引号/Markdown 表格/JSON/反斜杠的 prompt 必须优先用 `--prompt-stdin` 或 `--prompt-file`,不要拼进 shell 单个参数;位置参数只适合短单行 smoke prompt。stdin 推荐用 quoted heredoc:`cat <<'PROMPT' | bun scripts/cli.ts codex submit --prompt-stdin --queue --dry-run`,文件路径推荐 `bun scripts/cli.ts codex submit --prompt-file /tmp/code-queue-prompt.md --queue --dry-run`,确认 dry-run 后移除 `--dry-run` 提交同一 payload。dry-run 会额外输出 `routingRecommendation`,包含推荐 route、runner、model、风险信号、prompt 自包含/issue 非唯一来源/prod-secret-DB 禁止/运行态或 release 禁止/证据要求/中等复杂度候选等 guard 状态;同时输出 `policyContract`,固定暴露 GPT-5.5、DeepSeek、MiniMax 的风险分层、并发上限和外部 provider 429 退避处置。该建议只用于指挥官 preflight,不会改写 payload,不改变 runtime admission,也不假设生产 MiniMax 或 DeepSeek 可用。`--dry-run` 必须返回完整 prompt、字符数和 `truncated=false` 用于人工验收;真实提交是写入操作,默认只返回 `accepted=true`、task id、队列、写入保护摘要和后续查看命令,必须标记 `promptOmitted=true` 且不得回显 prompt 或 promptPreview。真实提交会经过本机本地串行化保护和短节流,避免同一指挥端并发 submit 把低内存主机或 `code-queue-mgr` 控制面打抖;返回值会附带 `executionMode`、`runnerPermissions` 和低噪声 `submitConcurrencyGuard`,显式说明 requested/effective mode、服务级 runner sandbox/approvalPolicy、锁与等待信息。`--execution-mode` 是 Code Queue runtime placement,不是 Codex 
sandbox 权限;有效模式是 `default` 和 `windows-native`,`--execution-mode full-access` 等 sandbox-like 值会保留 requested 值并显示 effective `default`,同时提示当前不支持每任务 sandbox override。真实提交的 `queue` 摘要保持低噪声:`submittedTaskIds`、`queuedTaskIds`、`activeTaskIds` 和 `databaseActiveTaskIds` 是有界预览对象,`countContext` 与 `counts` 是权威计数;`submitted.taskStates[]` 直接给出本次 task id、queue id、status 和 `state=queued|running|terminal|unknown`,其来源固定为 `response.tasks[].status`。当本次新任务仍是 queued/retry_wait,`queuedTaskIds.items` 必须包含该 id;当 counts 非零但 active/queued id 列表因为 split-brain-live、上游省略或默认有界披露而不可枚举时,预览必须设置 `idsUnavailable=true`、`itemsOmitted=true` 和 `itemsMeaning=not-enumerated-in-default-submit-output`,不得打印容易误读的 `items=[]`。`queue.activity.effectiveActiveTaskCount` 和 `queue.commanderConcurrency.activeRunnerCount` 是并发判断字段;`splitBrainLive=true` 时继续把 fresh heartbeat/database active 计入 active。需要原始 drill-down 时使用 `queue.listPreviewPolicy.rawCommand`,默认是 `bun scripts/cli.ts microservice proxy code-queue /api/tasks/overview?limit=30 --raw --full`。backend-core 默认把提交、队列 CRUD、已读状态、历史摘要和轻量 Trace 读取分流到主 server `code-queue-mgr`,由它写入主 PostgreSQL;D601 scheduler 只轮询并执行已入库任务。 - `codex steer [prompt|--prompt-file path|--prompt-stdin] [--dry-run] [--no-retry|--retry-attempts N] [--full|--raw]` 向运行中的 Code Queue 任务发送纠偏 prompt。真实成功只返回低噪声写入确认,不回显 prompt 或完整任务状态;失败默认只返回 `accepted=false`、原因、scope、retryable、attempt 摘要、operator guidance 和 task/read/submit/health drill-down 命令。`upstreamBodyPreview`、request 元数据和 raw upstream failure 必须显式加 `--full` 或 `--raw` 才输出。任务已终态时返回紧凑 `task-already-terminal`、状态、终态状态、更新时间、`retryable=false` 和 `codex task` / `codex read` / `codex submit --reference-task-id ` 后续命令。 - `codex pr-preflight [--remote] [--push-dry-run --push-dry-run-ref refs/heads/probe/] [--pr-create-dry-run --pr-create-dry-run-head ] [--issue N] [--full|--raw]` 通过稳定 `code-queue` proxy 请求 D601 scheduler `/api/runtime-preflight`,用于 PR 型派单 admission。默认输出是紧凑 commander 视图,显式分出 `schedulerPreflight` 与 `activeRunnerPrCapability`,并附带 `commands` 和 
`disclosure`,方便先看 scheduler auth 缺口、再看当前 runner/dev container 的 `gh auth status` 与 `gh pr create --dry-run` 能力;`--full` 或 `--raw` 才展开完整 `preflight`、工具、agent port、Git worktree、GitHub egress、repo/issue/PR 只读探测和观测原文。只报告 `GH_TOKEN`/`GITHUB_TOKEN` 是否存在和来源 key,不打印值。当 auth-broker 配置存在时,`tokenCoverage.source="auth-broker"`、`credentialSource="broker-issued-token"` 且 runner env token 不是成功前提;当仅 env token 存在时,`credentialSource="env-token"` 且 `authBroker.nextAction="use-env-token-until-auth-broker-live"`;两者都缺失时顶层 `ok=false`、`runnerDisposition=infra-blocked`、`degradedReason=auth-broker-needed`,`tokenCoverage.missing` 同时列出 `GH_TOKEN` 与 `GITHUB_TOKEN`,并输出 `authBroker.source="broker/auth-broker-needed"`、`capability.source="missing-token"`。该 `auth-missing` 的 scope 是 `scheduler-runner-env`,不能简化成“当前 active runner/dev container 不能创建 PR”;默认视图必须带 `scopeBoundary` 和 `activeRunnerPrCapability`。GitHub DNS/API 连接失败应归类为 `failureKind=github-transient`、`degradedReason=github-dns-api-transient`,并带 `retryable=true`、`commanderAction=retry-backoff-or-keep-running-if-heartbeat-fresh` 和有界 `githubTransient.failedProbes`;调用方应重试/退避,且在任务 heartbeat/trace 新鲜时继续监督,不把它当成 auth 缺失或 PR 语义失败。`prCapability` 是 runner-facing 合同摘要,必须包含目标分支、token/auth 来源、`systemGhBinaryRequiredForWrites=false`、UniDesk REST `bun scripts/cli.ts gh` 可用性、push dry-run/PR create dry-run 的 `writesRemote=false`、expected PR handoff、真实 PR 创建需要 commander 授权和 `gh pr merge` 的 `unsupported-command` 边界;系统 `gh` binary 缺失只进入 `tools.systemGhBinary`,不得误判为 UniDesk REST `gh` CLI 不可用。`--remote` 在 runner-like 环境里不再依赖本地 `unidesk-backend-core`、`unidesk-database`、`baidu-netdisk-backend` 容器存在;这些缺失只作为本地观测证据。若远程控制面可达,则继续走远程控制面结果;若远程控制面不可达,则结构化返回 `failureKind=control-plane-missing` / `degradedReason=remote-control-plane-unreachable`,而不是把本地 `backend-core-container-missing` 当作最终阻塞。`--pr-create-dry-run` 不 POST GitHub,只证明 runner 内 PR body 生成、`scripts/cli.ts gh pr create --dry-run` 和 branch 参数形态可用;服务端创建权限仍以 token/auth broker、repo/issue/PR read、push dry-run 和最终授权后的真实 PR 
创建结果为准。 - `codex task ` 通过 Code Queue 私有代理按任务 ID 查询结构化审阅摘要;默认只返回任务身份、执行 Provider、工作目录、attempt 计数、原始 prompt、最终 response、最后错误和渐进披露命令,适合指挥官审阅完成未读任务且避免上下文爆炸。`--detail` 仍是有界详细摘要:默认只返回少量 attempt/tool 行、短 prompt/response/stderr/feedback 预览和 omitted/truncated 元数据;需要完整 prompt/response 文本或更多 tool/attempt 细节时再显式加 `--full`、`--tool-limit N`、`--trace` 或 `codex output`。该摘要读取默认由主 server `code-queue-mgr` 从 PostgreSQL 返回,不依赖 D601 `code-queue-read` Service 可用。 diff --git a/docs/reference/code-queue-supervision.md b/docs/reference/code-queue-supervision.md index 97a6d7dc..ec053062 100644 --- a/docs/reference/code-queue-supervision.md +++ b/docs/reference/code-queue-supervision.md @@ -36,7 +36,7 @@ HWLAB 业务目标、验收和实现优先级归 `pikasTech/HWLAB#7`;UniDesk 审阅 HWLAB runner 输出时,不能把 `SOURCE`、`LOCAL`、`DRY-RUN`、fixture 或只读报告误当成 `DEV-LIVE`。除非输出真的证明了 `res_boxsimu_1:DO1 -> hwlab-patch-panel -> res_boxsimu_2:DI1` 的真链路,并且带有 operation / audit / evidence 关联,否则只能归类为 support、diagnostics 或 contract。 -`split-brain live` 且 heartbeat/trace 新鲜时,指挥官必须继续监督,不把它当作服务中断。此类状态的优先动作是继续轮询、继续审阅、继续派单,而不是默认 interrupt 或 cancel。 +`split-brain live` 且 heartbeat/trace 新鲜时,指挥官必须继续监督,不把它当作服务中断。此类状态的优先动作是继续轮询、继续审阅、继续派单,而不是默认 interrupt 或 cancel。`codex submit` 的默认写入确认也遵守同一口径:如果 queue counts 显示 running/queued 非零,但 default summary 不能枚举 active/queued id 列表,CLI 必须返回 `idsUnavailable=true` / `itemsOmitted=true` 和 `stateDisclosure.idsUnavailableMeaning`,而不是输出看起来像“没有 active/queued 任务”的 `items=[]`。需要 raw drill-down 时使用返回的 `queue.listPreviewPolicy.rawCommand`,即 `bun scripts/cli.ts microservice proxy code-queue /api/tasks/overview?limit=30 --raw --full`。 live-read browser audit 只用于观察已部署 UI,不授权写入。未获得显式 live mutation 授权时,审计浏览器只能放行 `GET`、`HEAD` 和 `OPTIONS`;`POST`、`PUT`、`PATCH`、`DELETE` 以及其他可能改变状态的方法必须被拦截并 abort,报告时统一标记为 `audit guard blocked page mutation attempt`,同时记录 method、path、触发的页面动作和已拦截事实。这个证据只能证明页面渲染、只读请求和某个交互会尝试发起写请求;它不能证明 backend outage、写入失败、写入成功、持久化状态变化或 mutating workflow 已验收。需要真实点击、提交、启动、停止、保存、删除、训练或其他 live-mutating acceptance 
时,必须先取得针对目标服务、动作和环境的明确授权,并按授权后的验证规则单独记录结果。 @@ -257,7 +257,7 @@ bun scripts/cli.ts codex pr-preflight --remote --issue 常用入口: -- `bun scripts/cli.ts codex tasks --view supervisor --limit N`:查看默认低噪声监督视图,包括 `activeRunning`、running、完成未读、少量最近完成、queued/runnable、activity、commanderConcurrency、execution diagnostics、任务分类和下一步 drill-down 命令。默认行只保留 task id、队列、短 prompt/body 预览和原始字符数;`--limit` 是扫描/分页预算,不是返回几十条肥行的开关,CLI effective limit 安全上限为 100,输出必须用 `filters.requestedLimit`、`filters.effectiveLimit`、`filters.limitCapped`、`source.requestedLimit` 和 `source.effectiveLimit` 区分用户请求、CLI cap 和 overview 源拉取预算;例如 `--limit 260` 应明确显示 requested=260、effective=100、source=200,`running.returned` 只是低噪声返回行数。`show/detail/trace/output/full/read` 放在 section template 中,避免每条任务重复刷屏,需要更多内容再按 taskId 展开。 +- `bun scripts/cli.ts codex tasks --view supervisor --limit N`:查看默认低噪声监督视图,包括 `activeRunning`、running、完成未读、少量最近完成、queued/runnable、activity、commanderConcurrency、execution diagnostics、任务分类和下一步 drill-down 命令。默认行只保留 task id、队列、短 prompt/body 预览和原始字符数;`--limit` 是扫描/分页预算,不是返回几十条肥行的开关,CLI effective limit 安全上限为 100,输出必须用 `filters.requestedLimit`、`filters.effectiveLimit`、`filters.limitCapped`、`source.requestedLimit` 和 `source.effectiveLimit` 区分用户请求、CLI cap 和 overview 源拉取预算;例如 `--limit 260` 应明确显示 requested=260、effective=100、source=200,`running.returned` 只是低噪声返回行数。`show/detail/trace/output/full/read` 放在 section template 中,避免每条任务重复刷屏,需要更多内容再按 taskId 展开。刚执行 `codex submit` 后也可以先读 submit 返回的 `submitted.taskStates[]`、`queue.countContext`、`queue.activity.effectiveActiveTaskCount` 和 `queue.stateDisclosure`;若某个 id preview 有 `idsUnavailable=true`,不要把它当成空队列,按 `queue.listPreviewPolicy.rawCommand` 或本 supervisor 命令继续查。 - `bun scripts/cli.ts codex queues`:查看低噪声队列计数、activity、commanderConcurrency、active task id、完成未读队列、runnable 队列和控制面诊断;需要完整队列行视图时加 `--full`,但 `--full` 仍默认分页,继续用 `--limit N`、`--page N` 或 `--offset N` 渐进展开。summary 和 full 都使用稳定 JSON path `.data.queues.items[]` 读取队列行,并从 
`.data.queues.commanderConcurrency`、`.data.queues.activity`、`.data.queues.counts` 与 `.data.queues.executionDiagnostics` 读取全局活跃计数和执行诊断;完整 upstream 只通过输出中的 raw command 显式获取。
 - `bun scripts/cli.ts codex unread --limit N`:查看完成未读审阅积压的默认 triage,按 repo、issue、status 和 queue 汇总,并给出有界最新任务和 drill-down/read 命令;默认不输出 raw prompt、final response、trace 或 output。
 - `bun scripts/cli.ts codex unread mark-read --repo owner/name --issue N --limit N --confirm`:批量已读入口,必须显式 `mark-read` 和 `--confirm`,否则结构化失败且不 POST `/read`。
diff --git a/scripts/code-queue-submit-summary-contract-test.ts b/scripts/code-queue-submit-summary-contract-test.ts
index 14ce1aae..6385109a 100644
--- a/scripts/code-queue-submit-summary-contract-test.ts
+++ b/scripts/code-queue-submit-summary-contract-test.ts
@@ -16,6 +16,14 @@ function asArray(value: unknown): unknown[] {
   return value as unknown[];
 }
 
+function assertNoItemsArrayWhenUnavailable(value: JsonRecord, label: string): void {
+  assertCondition(value.idsUnavailable === true, `${label} should mark non-enumerated ids unavailable`, value);
+  assertCondition(!Object.prototype.hasOwnProperty.call(value, "items"), `${label} must not emit items=[] when count is nonzero but ids are unavailable`, value);
+  assertCondition(value.itemsOmitted === true, `${label} should explicitly mark items omitted`, value);
+  assertCondition(String(value.itemsMeaning || "") === "not-enumerated-in-default-submit-output", `${label} should explain empty-list semantics`, value);
+  assertCondition(String(value.rawCommand || "").includes("microservice proxy code-queue /api/tasks/overview"), `${label} should provide raw drill-down`, value);
+}
+
 function task(id: string, status: string, queueId = "commander-efficiency"): JsonRecord {
   return {
     id,
@@ -62,6 +70,8 @@ export function runCodeQueueSubmitSummaryContract(): JsonRecord {
   const submitted = asRecord(data.submitted);
   const submittedTasks = asArray(submitted.tasks);
   const submittedTask = asRecord(submittedTasks[0]);
+  const taskStates = asArray(submitted.taskStates);
+  const submittedState = asRecord(taskStates[0]);
   const queue = asRecord(data.queue);
   const queuedTaskIds = asRecord(queue.queuedTaskIds);
   const activeTaskIds = asRecord(queue.activeTaskIds);
@@ -70,9 +80,13 @@ export function runCodeQueueSubmitSummaryContract(): JsonRecord {
   const countContext = asRecord(queue.countContext);
   const listPreviewPolicy = asRecord(queue.listPreviewPolicy);
   const omittedCounts = asRecord(listPreviewPolicy.omittedCounts);
+  const stateDisclosure = asRecord(queue.stateDisclosure);
+  const activity = asRecord(queue.activity);
   const responseJson = JSON.stringify(response);
 
   assertCondition(submittedTask.id === submittedId && submittedTask.status === "queued", "submit response should keep the newly queued task", submittedTask);
+  assertCondition(submittedState.id === submittedId && submittedState.status === "queued" && submittedState.state === "queued", "submitted task state should be explicit and authoritative", submittedState);
+  assertCondition(String(submitted.stateSource || "").includes("response.tasks"), "submitted state source should point at response.tasks", submitted);
   assertCondition(asArray(submittedTaskIds.items).includes(submittedId), "submittedTaskIds should expose the just-submitted id", submittedTaskIds);
   assertCondition(asArray(queuedTaskIds.items).includes(submittedId), "queuedTaskIds preview should force-include the just-submitted queued task", queuedTaskIds);
   assertCondition(queuedTaskIds.count === 5 && queuedTaskIds.returned === 1 && queuedTaskIds.omitted === 4, "queuedTaskIds should preserve aggregate queued count without dumping all ids", queuedTaskIds);
@@ -84,18 +98,85 @@ export function runCodeQueueSubmitSummaryContract(): JsonRecord {
   assertCondition(String(activeTaskIds.source || "").includes("databaseActiveTaskIds"), "activeTaskIds should fall back to database active ids when upstream activeTaskIds is empty", activeTaskIds);
   assertCondition(databaseActiveTaskIds.count === 18 && databaseActiveTaskIds.returned === 15, "databaseActiveTaskIds preview should preserve count context", databaseActiveTaskIds);
   assertCondition(countContext.running === 18 && countContext.active === 18 && countContext.databaseActive === 18, "countContext should expose accurate active counts", countContext);
+  assertCondition(activity.effectiveActiveTaskCount === 18, "submit queue activity should expose commander effective active count", activity);
   assertCondition(listPreviewPolicy.bounded === true && listPreviewPolicy.countsAreAuthoritative === true, "list preview policy should document bounded low-noise output", listPreviewPolicy);
   assertCondition(listPreviewPolicy.truncated === true && omittedCounts.activeTaskIds === 3 && omittedCounts.queuedTaskIds === 4, "list preview policy should disclose omitted counts", listPreviewPolicy);
+  assertCondition(String(listPreviewPolicy.emptyItemsSemantics || "").includes("idsUnavailable=true"), "list preview policy should document nonzero-count unavailable ids", listPreviewPolicy);
+  assertCondition(String(stateDisclosure.submittedStatusSource || "").includes("response.tasks"), "stateDisclosure should name submitted status source", stateDisclosure);
   assertCondition(String(listPreviewPolicy.note || "").includes("Low-noise mutation output omits"), "list preview policy should include a clear truncation note", listPreviewPolicy);
   assertCondition(submitted.promptOmitted === true && !responseJson.includes("Focused submit summary contract"), "submit confirmation should not leak prompt text", response);
   assertCondition(responseJson.length < 12_000, "submit confirmation should remain low-noise", { chars: responseJson.length });
 
+  const activeIdsOmitted = compactSubmitSuccessResponseForTest({
+    tasks: [task("codex_submitted_queued_while_running", "queued", "live-fast-lane")],
+    queue: {
+      counts: { running: 9, queued: 1 },
+      activeTaskIds: { items: [], count: 9, returned: 0, truncated: true },
+      queuedTaskIds: { items: [], count: 1, returned: 0, truncated: true },
+      databaseActiveTaskCount: 9,
+      executionDiagnostics: {
+        state: "split-brain",
+        splitBrain: true,
+        splitBrainLive: true,
+        effectiveLiveness: "live",
+        recommendedAction: "continue-supervision",
+        databaseActiveTaskCount: 9,
+        schedulerActiveRunSlotCount: 0,
+        schedulerActiveTaskIds: [],
+        activeHeartbeatCount: 9,
+        heartbeatFreshTaskIds: [],
+      },
+    },
+  }, { ok: true, status: 200 }, { mode: "local-atomic-directory-submit-serialization", acquiredAfterMs: 1, heldMs: 2, throttleMs: 2000 });
+  const omittedQueue = asRecord(asRecord(activeIdsOmitted).queue);
+  const omittedActive = asRecord(omittedQueue.activeTaskIds);
+  const forcedQueued = asRecord(omittedQueue.queuedTaskIds);
+  const omittedPolicy = asRecord(omittedQueue.listPreviewPolicy);
+  const unavailable = asRecord(omittedPolicy.unavailableIdLists);
+  assertCondition(omittedActive.count === 9 && omittedActive.returned === 0, "running-count nonzero should be preserved even with omitted active ids", omittedActive);
+  assertNoItemsArrayWhenUnavailable(omittedActive, "activeTaskIds");
+  assertCondition(asArray(forcedQueued.items).includes("codex_submitted_queued_while_running"), "new queued submitted task should be force-included even if upstream queued ids were omitted", forcedQueued);
+  assertCondition(unavailable.activeTaskIds === true && unavailable.queuedTaskIds === false, "listPreviewPolicy should summarize unavailable active id list", unavailable);
+
+  const liveSplitBrain = asRecord(omittedQueue.executionDiagnostics);
+  const liveStateDisclosure = asRecord(omittedQueue.stateDisclosure);
+  const liveActivity = asRecord(omittedQueue.activity);
+  assertCondition(liveSplitBrain.splitBrainLive === true && liveSplitBrain.effectiveLiveness === "live", "split-brain-live heartbeat context should stay explicit", liveSplitBrain);
+  assertCondition(liveActivity.splitBrainDisposition === "live-count-as-active", "split-brain-live should be counted as active in activity", liveActivity);
+  assertCondition(String(liveStateDisclosure.splitBrainDisposition || "").includes("continue supervision"), "stateDisclosure should explain split-brain-live disposition", liveStateDisclosure);
+  assertCondition(String(liveStateDisclosure.idsUnavailableMeaning || "").includes("not that there are no tasks"), "stateDisclosure should prevent empty-list misread", liveStateDisclosure);
+
+  const queuedIdsOmitted = compactSubmitSuccessResponseForTest({
+    tasks: [task("codex_submitted_already_running", "running", "live-fast-lane")],
+    queue: {
+      counts: { running: 1, queued: 3 },
+      activeTaskIds: { items: [], count: 1, returned: 0, truncated: true },
+      queuedTaskIds: { items: [], count: 3, returned: 0, truncated: true },
+      executionDiagnostics: {
+        state: "healthy",
+        databaseActiveTaskCount: 1,
+        databaseActiveTaskIds: ["codex_submitted_already_running"],
+        activeHeartbeatCount: 1,
+        heartbeatFreshTaskIds: ["codex_submitted_already_running"],
+      },
+    },
+  }, { ok: true, status: 200 }, { mode: "local-atomic-directory-submit-serialization", acquiredAfterMs: 1, heldMs: 2, throttleMs: 2000 });
+  const queuedOmittedQueue = asRecord(asRecord(queuedIdsOmitted).queue);
+  const queuedOmitted = asRecord(queuedOmittedQueue.queuedTaskIds);
+  const queuedOmittedPolicy = asRecord(queuedOmittedQueue.listPreviewPolicy);
+  const queuedUnavailable = asRecord(queuedOmittedPolicy.unavailableIdLists);
+  assertCondition(queuedOmitted.count === 3 && queuedOmitted.returned === 0, "queued-count nonzero should be preserved even with omitted queued ids", queuedOmitted);
+  assertNoItemsArrayWhenUnavailable(queuedOmitted, "queuedTaskIds");
+  assertCondition(queuedUnavailable.queuedTaskIds === true, "listPreviewPolicy should summarize unavailable queued id list", queuedUnavailable);
+
   return {
     ok: true,
     checks: [
       "newly queued submitted task is included in queuedTaskIds preview",
       "running count context falls back to database active ids",
+      "nonzero count with omitted id lists uses idsUnavailable instead of items=[]",
+      "split-brain-live submit summary says continue supervision and count as active",
       "bounded id previews disclose omitted counts",
       "submit confirmation omits prompt text and remains low-noise",
     ],
diff --git a/scripts/src/code-queue.ts b/scripts/src/code-queue.ts
index 74de9435..c0281910 100644
--- a/scripts/src/code-queue.ts
+++ b/scripts/src/code-queue.ts
@@ -34,6 +34,7 @@ const diagnosticsIdPreviewLimit = 3;
 const diagnosticsReasonPreviewLimit = 2;
 const mutationQueueIdPreviewLimit = 15;
 const steerPromptPreviewChars = 320;
+const rawCodeQueueOverviewCommand = "bun scripts/cli.ts microservice proxy code-queue /api/tasks/overview?limit=30 --raw --full";
 const sandboxLikeExecutionModes = new Set(["full-access", "danger-full-access", "workspace-write", "read-only"]);
 const detailAttemptReturnedLimit = 3;
 const detailInitialPromptPreviewChars = 1200;
@@ -4208,15 +4209,32 @@ function compactIdPreview(knownIds: string[], totalCount: number, limit: number,
   const count = Math.max(0, totalCount, all.length);
   const items = all.slice(0, limit);
   const omitted = Math.max(0, count - items.length);
-  return {
-    items,
+  const idsUnavailable = count > 0 && items.length === 0;
+  const result: Record = {
     count,
     returned: items.length,
     omitted,
     truncated: omitted > 0 || all.length > items.length,
     source,
+    countsAreAuthoritative: true,
+    idsUnavailable,
+    itemsMeaning: idsUnavailable
+      ? "not-enumerated-in-default-submit-output"
+      : count === 0
+      ? "authoritative-empty"
+      : omitted > 0 || all.length > items.length
+      ? "bounded-preview"
+      : "complete-known-list",
+    rawCommand: rawCodeQueueOverviewCommand,
     ...(note === null ? {} : { note }),
   };
+  if (idsUnavailable) {
+    result.itemsOmitted = true;
+    result.note = note ?? "Count is nonzero, but the default submit response did not enumerate ids for this list; use rawCommand for drill-down.";
+  } else {
+    result.items = items;
+  }
+  return result;
 }
 
 function idPreviewInputItems(value: unknown): string[] {
@@ -4249,6 +4267,35 @@ function taskIdsForStatuses(tasks: Record[], statuses: Set): Record {
+  const id = asString(task.id);
+  const status = asString(task.status) || "unknown";
+  const queueId = asString(task.queueId) || null;
+  const state = status === "queued" || status === "retry_wait"
+    ? "queued"
+    : status === "running" || status === "judging"
+    ? "running"
+    : isTerminalTaskStatus(status)
+    ? "terminal"
+    : "unknown";
+  return {
+    id: id || null,
+    queueId,
+    status,
+    state,
+    source: "response.tasks[].status",
+  };
+}
+
+function submittedTaskStatusCounts(tasks: Record[]): Record {
+  const counts: Record = {};
+  for (const task of tasks) {
+    const status = asString(task.status) || "unknown";
+    counts[status] = (counts[status] ?? 0) + 1;
+  }
+  return counts;
+}
+
 function previewSource(parts: string[], fallback: string): string {
   const unique = orderedUniqueStringList(parts.filter((part) => part.length > 0));
   return unique.length > 0 ? unique.join("+") : fallback;
 }
@@ -4269,17 +4316,25 @@ function compactSubmitQueueConfirmation(value: unknown, options: CompactSubmitQu
   const queuedKnownIds = orderedUniqueStringList([...submittedQueuedTaskIds, ...upstreamQueuedTaskIds]);
   const queuedStatusCount = countForStatus(counts, "queued") + countForStatus(counts, "retry_wait");
   const queuedCount = Math.max(queuedKnownIds.length, idPreviewInputCount(record.queuedTaskIds), queuedStatusCount);
-  const queuedPreview = compactIdPreview(
-    queuedKnownIds,
-    queuedCount,
-    idPreviewLimit,
-    previewSource([
-      submittedQueuedTaskIds.length > 0 ? "submittedTaskIds" : "",
-      upstreamQueuedTaskIds.length > 0 ? "upstreamQueuedTaskIds" : "",
-      queuedKnownIds.length === 0 && queuedCount > 0 ? "aggregateCountsOnly" : "",
-    ], "none"),
-    queuedCount > queuedKnownIds.length ? "Upstream did not enumerate every queued id in this low-noise mutation response; count remains authoritative." : null,
-  );
+  const queuedPreview: Record = {
+    ...compactIdPreview(
+      queuedKnownIds,
+      queuedCount,
+      idPreviewLimit,
+      previewSource([
+        submittedQueuedTaskIds.length > 0 ? "submittedTaskIds" : "",
+        upstreamQueuedTaskIds.length > 0 ? "upstreamQueuedTaskIds" : "",
+        queuedKnownIds.length === 0 && queuedCount > 0 ? "aggregateCountsOnly" : "",
+      ], "none"),
+      queuedCount > queuedKnownIds.length ? "Upstream did not enumerate every queued id in this low-noise mutation response; count remains authoritative." : null,
+    ),
+    sourceCounts: {
+      queued: countForStatus(counts, "queued"),
+      retryWait: countForStatus(counts, "retry_wait"),
+      upstreamQueuedTaskIds: idPreviewInputCount(record.queuedTaskIds),
+      submittedQueuedTaskIds: submittedQueuedTaskIds.length,
+    },
+  };
 
   const upstreamActiveTaskIds = idPreviewInputItems(record.activeTaskIds);
   const databaseActiveTaskIds = idPreviewInputItems(record.databaseActiveTaskIds);
@@ -4305,30 +4360,49 @@ function compactSubmitQueueConfirmation(value: unknown, options: CompactSubmitQu
     databaseActiveTaskCount,
     statusActiveCount,
   ]);
-  const activePreview = compactIdPreview(
-    activeKnownIds,
-    activeCount,
-    idPreviewLimit,
-    previewSource([
-      submittedActiveTaskIds.length > 0 ? "submittedTaskIds" : "",
-      upstreamActiveTaskIds.length > 0 ? "upstreamActiveTaskIds" : "",
-      databaseActiveTaskIds.length > 0 ? "databaseActiveTaskIds" : "",
-      diagnosticsDatabaseActiveTaskIds.length > 0 ? "executionDiagnostics.databaseActiveTaskIds" : "",
-      diagnosticsHeartbeatTaskIds.length > 0 ? "executionDiagnostics.heartbeatFreshTaskIds" : "",
-      activeKnownIds.length === 0 && activeCount > 0 ? "aggregateCountsOnly" : "",
-    ], "none"),
-    activeCount > activeKnownIds.length ? "Upstream only exposed aggregate active counts for part of the running set; count remains authoritative." : null,
-  );
-  const databaseActivePreview = compactIdPreview(
-    orderedUniqueStringList([...databaseActiveTaskIds, ...diagnosticsDatabaseActiveTaskIds]),
-    maxFiniteNumber([databaseActiveTaskCount, idPreviewInputCount(record.databaseActiveTaskIds), idPreviewInputCount(diagnosticsRecord.databaseActiveTaskIds)]),
-    idPreviewLimit,
-    previewSource([
-      databaseActiveTaskIds.length > 0 ? "databaseActiveTaskIds" : "",
-      diagnosticsDatabaseActiveTaskIds.length > 0 ? "executionDiagnostics.databaseActiveTaskIds" : "",
-      databaseActiveTaskCount > 0 && databaseActiveTaskIds.length === 0 && diagnosticsDatabaseActiveTaskIds.length === 0 ? "aggregateCountsOnly" : "",
-    ], "none"),
-  );
+  const activePreview: Record = {
+    ...compactIdPreview(
+      activeKnownIds,
+      activeCount,
+      idPreviewLimit,
+      previewSource([
+        submittedActiveTaskIds.length > 0 ? "submittedTaskIds" : "",
+        upstreamActiveTaskIds.length > 0 ? "upstreamActiveTaskIds" : "",
+        databaseActiveTaskIds.length > 0 ? "databaseActiveTaskIds" : "",
+        diagnosticsDatabaseActiveTaskIds.length > 0 ? "executionDiagnostics.databaseActiveTaskIds" : "",
+        diagnosticsHeartbeatTaskIds.length > 0 ? "executionDiagnostics.heartbeatFreshTaskIds" : "",
+        activeKnownIds.length === 0 && activeCount > 0 ? "aggregateCountsOnly" : "",
+      ], "none"),
+      activeCount > activeKnownIds.length ? "Upstream only exposed aggregate active counts for part of the running set; count remains authoritative." : null,
+    ),
+    sourceCounts: {
+      running: countForStatus(counts, "running"),
+      judging: countForStatus(counts, "judging"),
+      upstreamActiveTaskIds: idPreviewInputCount(record.activeTaskIds),
+      databaseActiveTaskCount,
+      diagnosticsDatabaseActiveTaskIds: idPreviewInputCount(diagnosticsRecord.databaseActiveTaskIds),
+      diagnosticsHeartbeatFreshTaskIds: idPreviewInputCount(diagnosticsRecord.heartbeatFreshTaskIds),
+      submittedActiveTaskIds: submittedActiveTaskIds.length,
+    },
+  };
+  const databaseActivePreview: Record = {
+    ...compactIdPreview(
+      orderedUniqueStringList([...databaseActiveTaskIds, ...diagnosticsDatabaseActiveTaskIds]),
+      maxFiniteNumber([databaseActiveTaskCount, idPreviewInputCount(record.databaseActiveTaskIds), idPreviewInputCount(diagnosticsRecord.databaseActiveTaskIds)]),
+      idPreviewLimit,
+      previewSource([
+        databaseActiveTaskIds.length > 0 ? "databaseActiveTaskIds" : "",
+        diagnosticsDatabaseActiveTaskIds.length > 0 ? "executionDiagnostics.databaseActiveTaskIds" : "",
+        databaseActiveTaskCount > 0 && databaseActiveTaskIds.length === 0 && diagnosticsDatabaseActiveTaskIds.length === 0 ? "aggregateCountsOnly" : "",
+      ], "none"),
+    ),
+    sourceCounts: {
+      queueDatabaseActiveTaskCount: asNumber(record.databaseActiveTaskCount, Number.NaN),
+      diagnosticsDatabaseActiveTaskCount: asNumber(diagnosticsRecord.databaseActiveTaskCount, Number.NaN),
+      queueDatabaseActiveTaskIds: idPreviewInputCount(record.databaseActiveTaskIds),
+      diagnosticsDatabaseActiveTaskIds: idPreviewInputCount(diagnosticsRecord.databaseActiveTaskIds),
+    },
+  };
   const submittedPreview = compactIdPreview(submittedTaskIds, submittedTaskIds.length, idPreviewLimit, "response.tasks");
   const omittedCounts = {
     activeTaskIds: asNumber(activePreview.omitted, 0),
@@ -4341,6 +4415,15 @@ function compactSubmitQueueConfirmation(value: unknown, options: CompactSubmitQu
     || databaseActivePreview.truncated === true
     || queuedPreview.truncated === true
     || submittedPreview.truncated === true;
+  const diagnostics = compactQueueExecutionDiagnostics(record.executionDiagnostics);
+  const activity = compactCodeQueueActivity(record, diagnostics);
+  const commanderConcurrency = asRecord(activity.commanderConcurrency) ?? {};
+  const unavailableIdLists = {
+    activeTaskIds: activePreview.idsUnavailable === true,
+    databaseActiveTaskIds: databaseActivePreview.idsUnavailable === true,
+    queuedTaskIds: queuedPreview.idsUnavailable === true,
+    submittedTaskIds: submittedPreview.idsUnavailable === true,
+  };
   return {
     total: record.total ?? null,
     queueCount: record.queueCount ?? null,
@@ -4350,7 +4433,9 @@ function compactSubmitQueueConfirmation(value: unknown, options: CompactSubmitQu
     databaseActiveTaskCount,
     databaseActiveTaskIds: databaseActivePreview,
     queuedTaskIds: queuedPreview,
-    executionDiagnostics: compactQueueExecutionDiagnostics(record.executionDiagnostics),
+    activity,
+    commanderConcurrency,
+    executionDiagnostics: diagnostics,
     runnerPermissions: compactRunnerPermissions(record.runnerPermissions),
     ...(submittedTasks.length === 0 ? {} : { submittedTaskIds: submittedPreview }),
     countContext: {
@@ -4368,10 +4453,22 @@ function compactSubmitQueueConfirmation(value: unknown, options: CompactSubmitQu
       countsAreAuthoritative: true,
       truncated: listsTruncated,
       omittedCounts,
+      unavailableIdLists,
+      emptyItemsSemantics: "items=[] is only emitted for an authoritative empty list or a nonempty bounded preview result; if count is nonzero and ids were not enumerated, the preview omits items and sets idsUnavailable=true.",
       note: listsTruncated
         ? "Low-noise mutation output omits additional task ids from one or more previews; use the raw command for full upstream queue detail."
         : "Low-noise mutation output includes all known task ids returned for these previews.",
-      rawCommand: "bun scripts/cli.ts microservice proxy code-queue /api/tasks/overview?limit=30 --raw --full",
+      rawCommand: rawCodeQueueOverviewCommand,
+    },
+    stateDisclosure: {
+      submittedStatusSource: "response.tasks[].status",
+      queueCountsSource: "response.queue.counts",
+      activeCountField: "queue.countContext.active",
+      commanderActiveRunnerCountField: "queue.activity.effectiveActiveTaskCount",
+      splitBrainLive: diagnostics?.splitBrainLive ?? false,
+      splitBrainDisposition: diagnostics?.splitBrainLive === true ? "live-counts-remain-active; continue supervision unless commanderConcurrency.interventionRequired=true" : "not-split-brain-live",
+      idsUnavailableMeaning: "A nonzero count with idsUnavailable=true means ids were omitted or unavailable in the bounded submit summary, not that there are no tasks.",
+      rawCommand: rawCodeQueueOverviewCommand,
     },
     byQueue: Array.isArray(record.byQueue) ?
record.byQueue : undefined, }; @@ -4392,6 +4489,8 @@ function compactSubmitSuccessResponse(body: Record, upstream: R const submittedTasks = rawTasks.map((task) => asRecord(task)).filter((task): task is Record => task !== null); const allTasks = rawTasks.map(compactSubmitTaskConfirmation); const tasks = allTasks.slice(0, defaultTasksLimit); + const allTaskStates = submittedTasks.map(submittedTaskState); + const taskStates = allTaskStates.slice(0, defaultTasksLimit); const allTaskIds = allTasks.map((task) => asString(task.id)).filter(Boolean); const taskIds = allTaskIds.slice(0, defaultTasksLimit); const queueIds = Array.from(new Set(tasks.map((task) => asString(task.queueId)).filter(Boolean))).sort(); @@ -4413,6 +4512,10 @@ function compactSubmitSuccessResponse(body: Record, upstream: R taskIdsCount: allTaskIds.length, taskIdsTruncated: allTaskIds.length > taskIds.length, queueIds, + statusCounts: submittedTaskStatusCounts(submittedTasks), + taskStates, + taskStatesTruncated: allTaskStates.length > taskStates.length, + stateSource: "response.tasks[].status is authoritative for the submitted task state at submit-confirmation time.", tasks, tasksTruncated: allTasks.length > tasks.length, promptOmitted: true, diff --git a/scripts/src/help.ts b/scripts/src/help.ts index 81d7ae2c..0fe14c48 100644 --- a/scripts/src/help.ts +++ b/scripts/src/help.ts @@ -281,6 +281,12 @@ function codexHelp(): unknown { boundary: "--execution-mode selects Code Queue runtime placement, not Codex sandbox permissions.", permissionVisibility: "Submit dry-runs show requested/effective mode; real submit responses include service runnerPermissions.sandbox and approvalPolicy.", }, + submitSummary: { + default: "Real codex submit is a compact write confirmation: accepted status, submitted task ids, submitted status/source, queue ids, promptOmitted=true, queue counts, activity and drill-down commands.", + listSemantics: "Task id previews are bounded objects; if a count is nonzero but ids are omitted 
or unavailable, the field sets idsUnavailable=true and does not print items=[] as if the list were empty.", + splitBrainLive: "When split-brain-live heartbeat evidence is present, submit summaries keep the live disposition visible and continue to count heartbeat/database active tasks as active.", + rawDrillDown: "Use queue.listPreviewPolicy.rawCommand or bun scripts/cli.ts microservice proxy code-queue /api/tasks/overview?limit=30 --raw --full for raw queue detail.", + }, readOutput: { default: "codex read marks a terminal task read and returns terminal metadata, final response, last error/judge, counts, and drill-down commands.", disclosure: "Full prompt, tool logs, and feedback prompts are not printed by codex read; use codex task/detail/trace/output for progressive disclosure.",