From ce031238f1d6503e67d01e27f5a6075fe231cda6 Mon Sep 17 00:00:00 2001 From: Codex Date: Tue, 2 Jun 2026 21:17:56 +0800 Subject: [PATCH] =?UTF-8?q?fix:=20=E4=BF=9D=E7=95=99=E9=95=BF=E4=BB=BB?= =?UTF-8?q?=E5=8A=A1=E8=BF=87=E7=A8=8B=20trace=20=E4=BA=8B=E4=BB=B6?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit --- docs/reference/spec-v01-backend-adapter.md | 8 +-- src/backend/codex-stdio.ts | 79 ++++++++++++++++++++-- src/selftest/cases/30-codex-stdio.ts | 46 +++++++++---- src/selftest/fake-codex-app-server.ts | 16 +++++ 4 files changed, 125 insertions(+), 24 deletions(-) diff --git a/docs/reference/spec-v01-backend-adapter.md b/docs/reference/spec-v01-backend-adapter.md index 44ac4ec..32e2273 100644 --- a/docs/reference/spec-v01-backend-adapter.md +++ b/docs/reference/spec-v01-backend-adapter.md @@ -38,7 +38,7 @@ Backend adapter 的第一阶段实现应吸收 HWLAB v0.2 已验证的 Codex std | --- | --- | --- | | Codex app-server JSON-RPC stdio | `internal/cloud/codex-stdio-session.ts`、`internal/cloud/codex-stdio-session-turn-state.ts` | 支持 `initialize`、`thread/start`、`thread/resume`、`turn/start`,并处理 app-server client request;未知请求要记录 unsupported error,不能静默等待。 | | completed 判定 | `docs/reference/code-agent-chat-readiness.md` | 只有 Codex turn terminal completed 且 assistant reply 可聚合时才输出 completed;assistant delta、item completed、stdout 或 transport close 不能单独完成。 | -| assistant stream 和 trace | `internal/cloud/code-agent-trace-store.ts`、`internal/cloud/codex-stdio-session-turn-state.ts` | assistant delta 只能作为 stream/progress 证据;每个非空 completed `agentMessage` item 必须输出一个 `assistant_message` event,保留 `itemId` 和顺序;`item/agentMessage:started`、`item/agentMessage:completed` 这类 lifecycle 不得额外持久化为 `backend_status`,避免同一消息在 Web/CLI trace 中重复渲染;最终 result reply 必须优先来自最后一个 completed `agentMessage` item,不能把 commentary/progress delta 与 final response 直接串接。event 必须保留 `threadId`、`turnId`、session 摘要和 redacted backend metadata。 | +| assistant stream 和 trace | 
`internal/cloud/code-agent-trace-store.ts`、`internal/cloud/codex-stdio-session-turn-state.ts` | assistant delta 只能作为 stream/progress 证据;长输出过程中可以输出有界 `assistant_message.source=agent-message-delta-progress` 快照,但 `replyAuthority=false` 且不得参与最终 reply 聚合;每个非空 completed `agentMessage` item 必须输出一个 `assistant_message` event,保留 `itemId` 和顺序;`item/agentMessage:started`、`item/agentMessage:completed` 这类 lifecycle 不得额外持久化为 `backend_status`,避免同一消息在 Web/CLI trace 中重复渲染;最终 result reply 必须优先来自最后一个 completed `agentMessage` item,不能把 commentary/progress delta 与 final response 直接串接。event 必须保留 `threadId`、`turnId`、session 摘要和 redacted backend metadata。 | | command/tool output bounded | `docs/reference/code-agent-chat-readiness.md`、`web/hwlab-cloud-web/app-trace.ts` | `tool_call` 和 `command_output` 必须记录状态、摘要、字节数、截断标记;完整大输出只能通过后续 log/artifact 引用。 | | provider/profile 隔离 | `internal/cloud/code-agent-contract.ts` | `codex`、`deepseek` 与 `minimax-m3` 共享同一 backend kind,但必须使用 profile-scoped SecretRef、model/base-url/config 和 writable runtime home。 | | Secret redaction | `internal/cloud/code-agent-trace-store.ts` | `OPENAI_API_KEY`、auth/config、token、password、kubeconfig、URL credential 不得进入 event、result、log 或 health。 | @@ -60,7 +60,7 @@ Registry 只表达能力和选择边界,不读取 Secret 值。Manager 负责 Adapter 输出给 runner 的 event 类型至少包括: - `backend_status`:backend 启动、模型/profile、能力和阶段状态,不包含 Secret 值。 -- `assistant_message`:模型输出的用户可见 assistant 文本。Codex app-server 的 `item/agentMessage/delta` 只能作为流式过程证据或缺少 completed item 时的兜底;一旦收到 completed `agentMessage` item,adapter 必须为每个非空 completed item 输出一条 `assistant_message`,并用 `itemId`、`messageIndex`、`messageCount`、`replyAuthority` 和 `final` 标明顺序与最终 reply authority。最终 result reply 必须以最后一个 `replyAuthority=true` / `final=true` 的 `assistant_message` 为准,避免把 commentary/status/progress 堆入 final response。 +- `assistant_message`:模型输出的用户可见 assistant 文本。Codex app-server 的 `item/agentMessage/delta` 只能作为流式过程证据或缺少 completed item 时的兜底;adapter 可以为长 delta 输出有界 progress 快照,必须标记 
`source=agent-message-delta-progress`、`progress=true`、`replyAuthority=false` 和 `final=false`。一旦收到 completed `agentMessage` item,adapter 必须为每个非空 completed item 输出一条 `assistant_message`,并用 `itemId`、`messageIndex`、`messageCount`、`replyAuthority` 和 `final` 标明顺序与最终 reply authority。最终 result reply 必须以最后一个 `replyAuthority=true` / `final=true` 的 `assistant_message` 为准,避免把 commentary/status/progress 堆入 final response。 - `tool_call`:工具调用摘要和 redacted 参数。 - `command_output`:stdout/stderr 或命令输出摘要。 - `diff`:代码变更摘要或 patch 片段;必须受长度限制。 @@ -69,9 +69,9 @@ Adapter 输出给 runner 的 event 类型至少包括: 事件必须有上限和分页友好形态。大型日志、完整 stdout 或完整 trace 应进入 logPath 或后续 artifact,不得一次性塞入单个 event 造成输出爆炸。 -Codex app-server 的低价值内部 notification 必须在 AgentRun adapter 层收敛,不得要求 HWLAB Web/CLI 或其他消费侧自行过滤。以下事件默认不作为 durable trace event 持久化:`item/reasoning/textDelta`、纯 `reasoning` item 的 `item/started|item/completed`、非 `commandExecution` item 的通用 `item/started|item/completed`、`thread/tokenUsage/updated`、`account/rateLimits/updated`、普通 `warning` 和 `configWarning`。adapter 可以输出一条有界 `backend_status.phase=codex-app-server-notifications-suppressed` 摘要,只包含总数、`methods: [{ method, count }]` 和 `itemTypes: [{ itemType, count }]`,不包含 reasoning 文本、Secret、token 或 env value。method 和 item type 不得作为 JSON object key 输出,避免 `thread/tokenUsage/updated` 这类协议名被 redaction 误判为敏感 key。真实 `agentMessage`、`commandExecution`、`command_output`、error、terminal 和关键生命周期事件必须继续保留。 +Codex app-server 的低价值内部 notification 必须在 AgentRun adapter 层收敛,不得要求 HWLAB Web/CLI 或其他消费侧自行过滤。以下事件默认不作为 durable trace event 持久化:`item/reasoning/textDelta`、纯 `reasoning` item 的 `item/started|item/completed`、非用户可见工具 item 的通用 `item/started|item/completed`、`thread/tokenUsage/updated`、`account/rateLimits/updated`、普通 `warning` 和 `configWarning`。adapter 可以输出一条有界 `backend_status.phase=codex-app-server-notifications-suppressed` 摘要,只包含总数、`methods: [{ method, count }]` 和 `itemTypes: [{ itemType, count }]`,不包含 reasoning 文本、Secret、token 或 env value。method 和 item type 不得作为 JSON object key 输出,避免 
`thread/tokenUsage/updated` 这类协议名被 redaction 误判为敏感 key。真实 `agentMessage`、`commandExecution`、`webSearch`、`command_output`、error、terminal 和关键生命周期事件必须继续保留。 -`commandExecution` 的 `tool_call` event 只能输出面向人和消费侧的扁平字段,例如 `method`、`itemId`、`toolName`、`type`、`command`、`cwd`、`status`、`processId` 和 `valuesPrinted=false`。不得把 Codex app-server 的原始 `item` JSON、`itemPreview` 或嵌套协议摘要写入 `message`、`outputSummary`、`stdoutSummary` 或 payload;命令实际 stdout/stderr 只通过 `command_output` 或 completed `commandExecution` 摘要输出。 +用户可见工具生命周期的 `tool_call` event 只能输出面向人和消费侧的扁平字段,例如 `method`、`itemId`、`toolName`、`type`、`command`、`cwd`、`status`、`processId` 和 `valuesPrinted=false`。当前可见工具类型包括 `commandExecution` 和 `webSearch`;不得把 Codex app-server 的原始 `item` JSON、`itemPreview` 或嵌套协议摘要写入 `message`、`outputSummary`、`stdoutSummary` 或 payload;命令实际 stdout/stderr 只通过 `command_output` 或 completed `commandExecution` 摘要输出。 ## Failure Mapping diff --git a/src/backend/codex-stdio.ts b/src/backend/codex-stdio.ts index c89db65..0e84123 100644 --- a/src/backend/codex-stdio.ts +++ b/src/backend/codex-stdio.ts @@ -14,6 +14,8 @@ const defaultCodexArgs = ["app-server", "--listen", "stdio://"]; const stderrBufferBytes = 64_000; const stderrEventChars = 4_000; const requestTimeoutCapMs = 30_000; +const assistantDeltaProgressMinChars = 500; +const assistantDeltaProgressLimitChars = 1_200; const childEnvSummaryKeys = [ "CODEX_HOME", @@ -73,12 +75,21 @@ interface CompletedAssistantMessage { text: string; } +interface AssistantDeltaProgressItem { + itemId: string | null; + text: string; + emittedChars: number; + flushed: boolean; +} + interface SuppressedNotificationSummary { total: number; byMethod: Record<string, number>; byItemType: Record<string, number>; } +type AssistantDeltaProgressState = Map<string, AssistantDeltaProgressItem>; + interface CodexStdioCloseInfo extends JsonRecord { code: number | null; signal: string | null; @@ -398,6 +409,7 @@ async function runCodexStdioTurnWithSession(options: CodexStdioTurnOptions, sess return { terminalStatus: cancelled.status, failureKind:
cancelled.failureKind, failureMessage: cancelled.message, events: events.map((event) => ({ ...event, payload: redactJson(event.payload) })) }; } let assistantText = ""; + const assistantDeltaProgress = createAssistantDeltaProgressState(); const completedAssistantMessages: CompletedAssistantMessage[] = []; const suppressedNotifications = createSuppressedNotificationSummary(); let threadId: string | undefined = options.threadId; @@ -428,7 +440,11 @@ async function runCodexStdioTurnWithSession(options: CodexStdioTurnOptions, sess if (normalized.threadId) threadId = normalized.threadId; if (normalized.turnId) turnId = normalized.turnId; emitEvents(normalized.events); - if (normalized.assistantDelta) assistantText += normalized.assistantDelta; + if (normalized.assistantDelta) { + assistantText += normalized.assistantDelta.text; + const progress = recordAssistantDeltaProgress(assistantDeltaProgress, normalized.assistantDelta); + if (progress) emitEvent(progress); + } if (normalized.completedAssistantMessage) { completedAssistantMessages.push(normalized.completedAssistantMessage); emitEvent(assistantMessageEventForCompleted(normalized.completedAssistantMessage, completedAssistantMessages.length)); @@ -504,6 +520,7 @@ async function runCodexStdioTurnWithSession(options: CodexStdioTurnOptions, sess } if (!terminal) terminal = { status: "failed", failureKind: "backend-response-invalid", message: "codex app-server finished without terminal status" }; if (terminal.status !== "completed") emitEvents(await session.close()); + emitEvents(flushAssistantDeltaProgress(assistantDeltaProgress)); if (completedAssistantMessages.length === 0) emitEvents(assistantMessageEventsForTurn(assistantText, terminal.status === "completed")); emitEvents(suppressedNotificationEvents(suppressedNotifications)); emitEvent({ type: "terminal_status", payload: { terminalStatus: terminal.status, failureKind: terminal.failureKind, message: terminal.message } }); @@ -567,7 +584,7 @@ function 
codexHomeReadiness(codexHome: string): BackendTurnResult | null { }; } -function normalizeCodexNotification(message: JsonRecord, suppressed: SuppressedNotificationSummary): { events: BackendEvent[]; assistantDelta?: string; completedAssistantMessage?: CompletedAssistantMessage; threadId?: string; turnId?: string; terminal?: { status: TerminalStatus; failureKind: FailureKind | null; message: string | null } } { +function normalizeCodexNotification(message: JsonRecord, suppressed: SuppressedNotificationSummary): { events: BackendEvent[]; assistantDelta?: { itemId: string | null; text: string }; completedAssistantMessage?: CompletedAssistantMessage; threadId?: string; turnId?: string; terminal?: { status: TerminalStatus; failureKind: FailureKind | null; message: string | null } } { const method = typeof message.method === "string" ? message.method : "unknown"; const params = asRecordAt(message, "params"); if (method === "thread/started") { @@ -582,7 +599,7 @@ function normalizeCodexNotification(message: JsonRecord, suppressed: SuppressedN recordSuppressedNotification(suppressed, method); return { events: [] }; } - if (method === "item/agentMessage/delta") return { events: [], assistantDelta: typeof params.delta === "string" ? params.delta : "" }; + if (method === "item/agentMessage/delta") return { events: [], assistantDelta: { itemId: stringAt(params, "itemId"), text: typeof params.delta === "string" ? params.delta : "" } }; if (method === "item/commandExecution/outputDelta") return { events: [{ type: "command_output", payload: commandOutputPayload("stdout", typeof params.delta === "string" ? 
params.delta : "") }] }; if (method === "item/reasoning/textDelta") { recordSuppressedNotification(suppressed, method, "reasoning"); @@ -601,7 +618,7 @@ function normalizeCodexNotification(message: JsonRecord, suppressed: SuppressedN if (method === "item/started" || method === "item/completed") { const item = asRecordAt(params, "item"); const itemType = typeof item.type === "string" ? item.type : "unknown"; - if (itemType !== "commandExecution" || isSuppressedCodexItemType(itemType)) { + if (!isVisibleCodexToolItemType(itemType)) { recordSuppressedNotification(suppressed, method, itemType); return { events: [] }; } @@ -664,8 +681,8 @@ function isSuppressedCodexStatusNotification(method: string): boolean { return method === "thread/tokenUsage/updated" || method === "account/rateLimits/updated" || method === "warning" || method === "configWarning"; } -function isSuppressedCodexItemType(itemType: string): boolean { - return itemType === "reasoning"; +function isVisibleCodexToolItemType(itemType: string): boolean { + return itemType === "commandExecution" || itemType === "webSearch"; } function assistantMessageEventForCompleted(message: CompletedAssistantMessage, messageIndex: number): BackendEvent { @@ -699,6 +716,56 @@ function assistantMessageEventsForTurn(assistantDeltaText: string, completed: bo }]; } +function createAssistantDeltaProgressState(): AssistantDeltaProgressState { + return new Map(); +} + +function recordAssistantDeltaProgress(state: AssistantDeltaProgressState, delta: { itemId: string | null; text: string }): BackendEvent | null { + if (!delta.text) return null; + const key = delta.itemId ?? "default"; + const current = state.get(key) ?? 
{ itemId: delta.itemId, text: "", emittedChars: 0, flushed: false }; + current.text += delta.text; + current.flushed = false; + state.set(key, current); + if (current.text.length - current.emittedChars < assistantDeltaProgressMinChars) return null; + current.emittedChars = current.text.length; + return assistantDeltaProgressEvent(current, false); +} + +function flushAssistantDeltaProgress(state: AssistantDeltaProgressState): BackendEvent[] { + const events: BackendEvent[] = []; + for (const item of state.values()) { + if (item.flushed || item.text.trim().length === 0 || item.text.length === item.emittedChars) continue; + item.emittedChars = item.text.length; + item.flushed = true; + events.push(assistantDeltaProgressEvent(item, true)); + } + return events; +} + +function assistantDeltaProgressEvent(item: AssistantDeltaProgressItem, flush: boolean): BackendEvent { + const summary = boundedTextSummary(item.text.trim(), { limitChars: assistantDeltaProgressLimitChars }); + return { + type: "assistant_message", + payload: { + text: summary.text, + itemId: item.itemId, + source: "agent-message-delta-progress", + messageIndex: null, + messageCount: null, + replyAuthority: false, + final: false, + progress: true, + progressFlush: flush, + textBytes: summary.textBytes, + textTruncated: summary.textTruncated, + outputBytes: summary.outputBytes, + outputTruncated: summary.outputTruncated, + valuesPrinted: false, + }, + }; +} + function terminalStatusFromValue(value: unknown): TerminalStatus { if (value === "completed") return "completed"; if (value === "cancelled" || value === "canceled" || value === "interrupted") return "cancelled"; diff --git a/src/selftest/cases/30-codex-stdio.ts b/src/selftest/cases/30-codex-stdio.ts index 79a57cf..a989261 100644 --- a/src/selftest/cases/30-codex-stdio.ts +++ b/src/selftest/cases/30-codex-stdio.ts @@ -79,13 +79,14 @@ const selfTest: SelfTestCase = async (context) => { assert.equal(finalMessageEnvelope.reply, "Final answer only.", "result 
reply should use the final completed agentMessage instead of concatenating progress deltas"); const finalMessageEvents = await client.get(`/api/v1/runs/${finalMessage.runId}/events?afterSeq=0&limit=100`) as { items?: Array<{ type: string; payload: unknown }> }; const assistantEvents = finalMessageEvents.items?.filter((event) => event.type === "assistant_message") ?? []; - assert.equal(assistantEvents.length, 2, "backend should preserve each completed agentMessage as assistant_message event"); - assert.equal(eventPayload(assistantEvents[0] ?? { payload: {} }).text, "I am checking the workspace."); - assert.equal(eventPayload(assistantEvents[0] ?? { payload: {} }).itemId, "msg_progress"); - assert.equal(eventPayload(assistantEvents[0] ?? { payload: {} }).replyAuthority, false); - assert.equal(eventPayload(assistantEvents[1] ?? { payload: {} }).text, "Final answer only."); - assert.equal(eventPayload(assistantEvents[1] ?? { payload: {} }).itemId, "msg_final"); - assert.equal(eventPayload(assistantEvents[1] ?? { payload: {} }).replyAuthority, false); + const completedAssistantEvents = assistantEvents.filter((event) => eventPayload(event).source === "completed-agent-message"); + assert.equal(completedAssistantEvents.length, 2, "backend should preserve each completed agentMessage as assistant_message event"); + assert.equal(eventPayload(completedAssistantEvents[0] ?? { payload: {} }).text, "I am checking the workspace."); + assert.equal(eventPayload(completedAssistantEvents[0] ?? { payload: {} }).itemId, "msg_progress"); + assert.equal(eventPayload(completedAssistantEvents[0] ?? { payload: {} }).replyAuthority, false); + assert.equal(eventPayload(completedAssistantEvents[1] ?? { payload: {} }).text, "Final answer only."); + assert.equal(eventPayload(completedAssistantEvents[1] ?? { payload: {} }).itemId, "msg_final"); + assert.equal(eventPayload(completedAssistantEvents[1] ?? { payload: {} }).replyAuthority, false); const finalMessageItems = finalMessageEvents.items ?? 
[]; const progressMessageIndex = finalMessageItems.findIndex((event) => event.type === "assistant_message" && eventPayload(event).itemId === "msg_progress"); const finalMessageIndex = finalMessageItems.findIndex((event) => event.type === "assistant_message" && eventPayload(event).itemId === "msg_final"); @@ -94,6 +95,28 @@ const selfTest: SelfTestCase = async (context) => { assert.ok(finalMessageIndex >= 0 && finalMessageIndex < turnCompletedIndex, "final agentMessage should be emitted before turn/completed instead of being delayed to final response"); assert.equal(finalMessageItems.some((event) => event.type === "backend_status" && String(eventPayload(event).phase ?? "").startsWith("item/agentMessage:")), false, "agentMessage lifecycle must not be persisted as backend_status noise"); + const webSearch = await createRunWithCommand(client, context, "hello web search progress", "selftest-web-search-progress", 15_000); + const webSearchPromise = runOnce({ managerUrl: server.baseUrl, runId: webSearch.runId, codexCommand: context.fakeCodexCommand, codexArgs: context.fakeCodexArgs, codexHome: context.codexHome, env: { CODEX_HOME: context.codexHome, AGENTRUN_FAKE_CODEX_MODE: "web-search-progress" }, oneShot: true }) as Promise; + await waitForEvent(client, webSearch.runId, (event) => event.type === "tool_call" && eventPayload(event).type === "webSearch" && eventPayload(event).method === "item/started", "webSearch tool_call start event"); + await waitForEvent(client, webSearch.runId, (event) => event.type === "assistant_message" && eventPayload(event).source === "agent-message-delta-progress", "assistant delta progress event"); + const webSearchResult = await webSearchPromise; + assert.equal(webSearchResult.terminalStatus, "completed", "web search progress turn should complete"); + const webSearchEnvelope = await client.get(`/api/v1/runs/${webSearch.runId}/commands/${webSearch.commandId}/result`) as JsonRecord; + assert.equal(webSearchEnvelope.reply, "Final IAM 
recommendation.", "result reply should ignore live delta progress snapshots"); + const webSearchEvents = await client.get(`/api/v1/runs/${webSearch.runId}/events?afterSeq=0&limit=100`) as { items?: Array<{ type: string; payload: unknown }> }; + const webSearchItems = webSearchEvents.items ?? []; + assert.ok(webSearchItems.some((event) => event.type === "tool_call" && eventPayload(event).type === "webSearch" && eventPayload(event).method === "item/completed"), "webSearch completion must remain visible as a tool_call"); + assert.ok(webSearchItems.some((event) => event.type === "assistant_message" && eventPayload(event).source === "agent-message-delta-progress" && eventPayload(event).progress === true), "assistant delta progress must be visible before final reply"); + const webSearchStartIndex = webSearchItems.findIndex((event) => event.type === "tool_call" && eventPayload(event).type === "webSearch" && eventPayload(event).method === "item/started"); + const webSearchProgressIndex = webSearchItems.findIndex((event) => event.type === "assistant_message" && eventPayload(event).source === "agent-message-delta-progress"); + const webSearchCompletedIndex = webSearchItems.findIndex((event) => event.type === "tool_call" && eventPayload(event).type === "webSearch" && eventPayload(event).method === "item/completed"); + const webSearchFinalIndex = webSearchItems.findIndex((event) => event.type === "assistant_message" && eventPayload(event).source === "completed-agent-message" && eventPayload(event).itemId === "msg_search"); + assert.ok(webSearchStartIndex >= 0 && webSearchStartIndex < webSearchProgressIndex, "webSearch start should be visible before assistant progress"); + assert.ok(webSearchProgressIndex >= 0 && webSearchProgressIndex < webSearchCompletedIndex, "assistant progress should be visible while webSearch is still running"); + assert.ok(webSearchCompletedIndex >= 0 && webSearchCompletedIndex < webSearchFinalIndex, "webSearch completion should be visible before final 
assistant reply"); + assert.equal(webSearchItems.some((event) => event.type === "tool_call" && eventPayload(event).type === "reasoning"), false, "reasoning items must still not be persisted as tool_call"); + assertNoSecretLeak(webSearchEvents); + const staleThread = await createStaleThreadRun(client, context); const staleThreadResult = await runOnce({ managerUrl: server.baseUrl, @@ -148,7 +171,7 @@ const selfTest: SelfTestCase = async (context) => { assert.equal(noisyItems.some((event) => event.type === "backend_status" && eventPayload(event).phase === "configWarning"), false, "low value config warnings must not be persisted as backend_status"); assert.equal(noisyItems.some((event) => event.type === "tool_call" && eventPayload(event).type === "reasoning"), false, "reasoning items must not be persisted as tool_call"); assert.ok(noisyItems.some((event) => event.type === "tool_call" && eventPayload(event).method === "item/started" && eventPayload(event).type === "commandExecution"), "real commandExecution tool call should remain visible"); - assert.equal(noisyItems.some((event) => event.type === "tool_call" && eventPayload(event).type !== "commandExecution"), false, "non-commandExecution item lifecycle must not be persisted as tool_call"); + assert.equal(noisyItems.some((event) => event.type === "tool_call" && eventPayload(event).type !== "commandExecution" && eventPayload(event).type !== "webSearch"), false, "only user-visible tool lifecycle items should be persisted as tool_call"); assert.equal(noisyItems.some((event) => event.type === "backend_status" && String(eventPayload(event).phase ?? 
"").startsWith("item/agentMessage:")), false, "agentMessage lifecycle must not be persisted as backend_status noise"); assert.equal(noisyPhases.includes("backend-turn-running"), false, "backend progress ticks must be summarized instead of persisted as durable trace events"); const noisyFinished = noisyItems.find((event) => event.type === "backend_status" && eventPayload(event).phase === "backend-turn-finished"); @@ -186,7 +209,7 @@ const selfTest: SelfTestCase = async (context) => { await runSecretFailureCase({ client, managerUrl: server.baseUrl, context }); await runSpawnFailureCase({ client, managerUrl: server.baseUrl, context }); - return { name: "codex-stdio", tests: ["runner-lease-heartbeat", "codex-stdio-fake-turn", "codex-stdio-projected-writable-home", "codex-stdio-deepseek-profile-fake-turn", "codex-stdio-minimax-m3-profile-fake-turn", "codex-stdio-deepseek-missing-secret-no-fallback", "codex-stdio-minimax-m3-missing-secret-no-fallback", "codex-stdio-config-model-authoritative", "codex-stdio-explicit-model-forwarded", "codex-stdio-final-agent-message-only", "codex-stdio-stale-thread-resume-failed", "codex-stdio-live-tool-events", "codex-stdio-noisy-reasoning-suppression", "codex-stdio-missing-turn-result", "codex-stdio-provider-auth-failed", "codex-stdio-provider-rate-limited", "codex-stdio-provider-invalid-tool-call", "codex-stdio-provider-503-rpc-error", "codex-stdio-provider-503-terminal", "codex-stdio-provider-503-retry-event", "codex-stdio-invalid-json", "codex-stdio-timeout", "codex-stdio-secret-unavailable", "codex-stdio-spawn-failure"] }; + return { name: "codex-stdio", tests: ["runner-lease-heartbeat", "codex-stdio-fake-turn", "codex-stdio-projected-writable-home", "codex-stdio-deepseek-profile-fake-turn", "codex-stdio-minimax-m3-profile-fake-turn", "codex-stdio-deepseek-missing-secret-no-fallback", "codex-stdio-minimax-m3-missing-secret-no-fallback", "codex-stdio-config-model-authoritative", "codex-stdio-explicit-model-forwarded", 
"codex-stdio-final-agent-message-only", "codex-stdio-web-search-progress", "codex-stdio-stale-thread-resume-failed", "codex-stdio-live-tool-events", "codex-stdio-noisy-reasoning-suppression", "codex-stdio-missing-turn-result", "codex-stdio-provider-auth-failed", "codex-stdio-provider-rate-limited", "codex-stdio-provider-invalid-tool-call", "codex-stdio-provider-503-rpc-error", "codex-stdio-provider-503-terminal", "codex-stdio-provider-503-retry-event", "codex-stdio-invalid-json", "codex-stdio-timeout", "codex-stdio-secret-unavailable", "codex-stdio-spawn-failure"] }; } finally { await new Promise((resolve) => server.server.close(() => resolve())); } @@ -234,11 +257,6 @@ function countEntriesByName(value: unknown, keyName: "method" | "itemType"): Rec return output; } -function eventPayloadItem(event: { payload: unknown }): JsonRecord { - const item = eventPayload(event).item; - return typeof item === "object" && item !== null && !Array.isArray(item) ? item as JsonRecord : {}; -} - async function waitForEvent(client: ManagerClient, runId: string, predicate: (event: { type: string; payload: unknown }) => boolean, label: string): Promise { const deadline = Date.now() + 3_000; while (Date.now() < deadline) { diff --git a/src/selftest/fake-codex-app-server.ts b/src/selftest/fake-codex-app-server.ts index be7fd78..292e2bb 100644 --- a/src/selftest/fake-codex-app-server.ts +++ b/src/selftest/fake-codex-app-server.ts @@ -151,6 +151,22 @@ for await (const line of rl) { respond(message.id, { turn }); continue; } + if (mode === "web-search-progress") { + turnCounter += 1; + const turn = { id: `turn_selftest_${turnCounter}`, status: "completed" }; + notify("turn/started", { turn }); + notify("item/started", { item: { id: "search_selftest", type: "webSearch", status: "running" } }); + notify("item/agentMessage/delta", { itemId: "msg_search", delta: "I am checking Kubernetes identity components and deployment docs. 
" }); + notify("item/agentMessage/delta", { itemId: "msg_search", delta: "Keycloak, ZITADEL, authentik, Ory, Dex, OpenFGA, and SpiceDB are being compared for lifecycle and authorization coverage. " }); + notify("item/agentMessage/delta", { itemId: "msg_search", delta: "Gateway/IAP choices are being separated from IdP and fine-grained authorization so the result can recommend a layered architecture. " }); + notify("item/agentMessage/delta", { itemId: "msg_search", delta: "This long progress text intentionally crosses the AgentRun live progress threshold before the final completed agentMessage is emitted. " }); + notify("item/agentMessage/delta", { itemId: "msg_search", delta: "The visible trace should therefore show work in progress while web search is still running, not only after turn completion. " }); + notify("item/completed", { item: { id: "search_selftest", type: "webSearch", status: "completed", outputSummary: "searched Kubernetes IAM and gateway auth options" } }); + notify("item/completed", { item: { id: "msg_search", type: "agentMessage", text: "Final IAM recommendation." } }); + notify("turn/completed", { turn }); + respond(message.id, { turn }); + continue; + } if (mode === "slow-tool-events") { turnCounter += 1; const turn = { id: `turn_selftest_${turnCounter}`, status: "completed" };