Merge pull request #80 from pikasTech/fix/v01-assistant-delta-progress-712
Fix trace visibility of intermediate progress in long-running tasks
@@ -38,7 +38,7 @@ The first-phase implementation of the Backend adapter should absorb the HWLAB v0.2-validated Codex std
 | --- | --- | --- |
 | Codex app-server JSON-RPC stdio | `internal/cloud/codex-stdio-session.ts`, `internal/cloud/codex-stdio-session-turn-state.ts` | Supports `initialize`, `thread/start`, `thread/resume`, and `turn/start`, and handles app-server client requests; unknown requests must be recorded as an unsupported error and must not be waited on silently. |
 | completed determination | `docs/reference/code-agent-chat-readiness.md` | Emit completed only when the Codex turn is terminal completed and the assistant reply can be aggregated; an assistant delta, item completed, stdout, or transport close cannot complete the turn on its own. |
-| assistant stream and trace | `internal/cloud/code-agent-trace-store.ts`, `internal/cloud/codex-stdio-session-turn-state.ts` | Assistant deltas may serve only as stream/progress evidence; every non-empty completed `agentMessage` item must emit one `assistant_message` event that preserves `itemId` and order; lifecycle notifications such as `item/agentMessage:started` and `item/agentMessage:completed` must not be additionally persisted as `backend_status`, so that the same message is not rendered twice in the Web/CLI trace; the final result reply must come preferentially from the last completed `agentMessage` item, and commentary/progress deltas must not be concatenated directly with the final response. Events must preserve `threadId`, `turnId`, the session summary, and redacted backend metadata. |
+| assistant stream and trace | `internal/cloud/code-agent-trace-store.ts`, `internal/cloud/codex-stdio-session-turn-state.ts` | Assistant deltas may serve only as stream/progress evidence; during long output the adapter may emit bounded `assistant_message.source=agent-message-delta-progress` snapshots, but these carry `replyAuthority=false` and must not take part in final reply aggregation; every non-empty completed `agentMessage` item must emit one `assistant_message` event that preserves `itemId` and order; lifecycle notifications such as `item/agentMessage:started` and `item/agentMessage:completed` must not be additionally persisted as `backend_status`, so that the same message is not rendered twice in the Web/CLI trace; the final result reply must come preferentially from the last completed `agentMessage` item, and commentary/progress deltas must not be concatenated directly with the final response. Events must preserve `threadId`, `turnId`, the session summary, and redacted backend metadata. |
 | command/tool output bounded | `docs/reference/code-agent-chat-readiness.md`, `web/hwlab-cloud-web/app-trace.ts` | `tool_call` and `command_output` must record status, summary, byte count, and a truncation flag; complete large output may be exposed only through a later log/artifact reference. |
 | provider/profile isolation | `internal/cloud/code-agent-contract.ts` | `codex`, `deepseek`, and `minimax-m3` share the same backend kind but must use a profile-scoped SecretRef, model/base-url/config, and a writable runtime home. |
 | Secret redaction | `internal/cloud/code-agent-trace-store.ts` | `OPENAI_API_KEY`, auth/config, tokens, passwords, kubeconfig, and URL credentials must not enter events, results, logs, or health. |
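The "command/tool output bounded" row above asks for a summary, a byte count, and a truncation flag. A minimal sketch of such a bounded summary follows; `summarizeBounded` and `BoundedSummary` are hypothetical names, and the repository's own `boundedTextSummary` helper may behave differently.

```typescript
// Hypothetical shape for illustration only; not the repository's type.
interface BoundedSummary {
  text: string;           // at most limitChars code points
  textBytes: number;      // UTF-8 byte length of the full, untruncated input
  textTruncated: boolean; // true when the input was cut
}

function summarizeBounded(input: string, limitChars: number): BoundedSummary {
  const textBytes = new TextEncoder().encode(input).length;
  // Split by code point so a surrogate pair is never cut in half.
  const chars = Array.from(input);
  if (chars.length <= limitChars) {
    return { text: input, textBytes, textTruncated: false };
  }
  return { text: chars.slice(0, limitChars).join(""), textBytes, textTruncated: true };
}
```

A consumer can then show `text` inline and follow a log/artifact reference for the full output whenever `textTruncated` is true.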
@@ -60,7 +60,7 @@ Registry expresses only capabilities and selection boundaries and does not read Secret values. Manager is responsible for
 The event types that the adapter emits to the runner include at least:

 - `backend_status`: backend startup, model/profile, capabilities, and phase status; contains no Secret values.
-- `assistant_message`: user-visible assistant text produced by the model. The Codex app-server `item/agentMessage/delta` may serve only as streaming progress evidence or as a fallback when no completed item exists; once a completed `agentMessage` item arrives, the adapter must emit one `assistant_message` for each non-empty completed item and use `itemId`, `messageIndex`, `messageCount`, `replyAuthority`, and `final` to mark order and final reply authority. The final result reply must be the last `assistant_message` with `replyAuthority=true` / `final=true`, so that commentary/status/progress is not piled into the final response.
+- `assistant_message`: user-visible assistant text produced by the model. The Codex app-server `item/agentMessage/delta` may serve only as streaming progress evidence or as a fallback when no completed item exists; the adapter may emit bounded progress snapshots for long deltas, which must be marked `source=agent-message-delta-progress`, `progress=true`, `replyAuthority=false`, and `final=false`. Once a completed `agentMessage` item arrives, the adapter must emit one `assistant_message` for each non-empty completed item and use `itemId`, `messageIndex`, `messageCount`, `replyAuthority`, and `final` to mark order and final reply authority. The final result reply must be the last `assistant_message` with `replyAuthority=true` / `final=true`, so that commentary/status/progress is not piled into the final response.
 - `tool_call`: tool call summary and redacted arguments.
 - `command_output`: stdout/stderr or a command output summary.
 - `diff`: code change summary or patch fragment; must be length-limited.
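The `assistant_message` rule above (the final reply is the last event with `replyAuthority=true` / `final=true`, and progress snapshots never count) can be sketched as follows. The payload interface and `selectFinalReply` are hypothetical names used only for illustration; the field names come from the bullet list.

```typescript
// Hypothetical payload shape built from the fields named in the list above.
interface AssistantMessagePayload {
  text: string;
  itemId: string | null;
  source: string;
  replyAuthority: boolean;
  final: boolean;
  progress?: boolean;
}

// Walk backwards and return the last message that carries reply authority.
// Progress snapshots have replyAuthority=false and final=false, so they are skipped.
function selectFinalReply(events: AssistantMessagePayload[]): string | null {
  for (let i = events.length - 1; i >= 0; i -= 1) {
    const event = events[i];
    if (event.replyAuthority && event.final) return event.text;
  }
  return null;
}
```

Selecting one authoritative message, instead of concatenating everything seen so far, is what keeps commentary and progress text out of the final response.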
@@ -69,9 +69,9 @@ The event types that the adapter emits to the runner include at least:

 Events must be bounded and pagination-friendly. Large logs, complete stdout, or a complete trace should go to logPath or a later artifact, and must not be stuffed into a single event, which would cause an output explosion.

-Low-value internal notifications from the Codex app-server must be collapsed at the AgentRun adapter layer; HWLAB Web/CLI and other consumers must not be required to filter them. The following events are not persisted as durable trace events by default: `item/reasoning/textDelta`, `item/started|item/completed` for pure `reasoning` items, generic `item/started|item/completed` for non-`commandExecution` items, `thread/tokenUsage/updated`, `account/rateLimits/updated`, and ordinary `warning` and `configWarning`. The adapter may emit one bounded `backend_status.phase=codex-app-server-notifications-suppressed` summary that contains only the total count, `methods: [{ method, count }]`, and `itemTypes: [{ itemType, count }]`, with no reasoning text, Secret, token, or env value. Methods and item types must not be emitted as JSON object keys, so that protocol names such as `thread/tokenUsage/updated` are not misclassified by redaction as sensitive keys. Real `agentMessage`, `commandExecution`, `command_output`, error, terminal, and key lifecycle events must still be preserved.
+Low-value internal notifications from the Codex app-server must be collapsed at the AgentRun adapter layer; HWLAB Web/CLI and other consumers must not be required to filter them. The following events are not persisted as durable trace events by default: `item/reasoning/textDelta`, `item/started|item/completed` for pure `reasoning` items, generic `item/started|item/completed` for items that are not user-visible tools, `thread/tokenUsage/updated`, `account/rateLimits/updated`, and ordinary `warning` and `configWarning`. The adapter may emit one bounded `backend_status.phase=codex-app-server-notifications-suppressed` summary that contains only the total count, `methods: [{ method, count }]`, and `itemTypes: [{ itemType, count }]`, with no reasoning text, Secret, token, or env value. Methods and item types must not be emitted as JSON object keys, so that protocol names such as `thread/tokenUsage/updated` are not misclassified by redaction as sensitive keys. Real `agentMessage`, `commandExecution`, `webSearch`, `command_output`, error, terminal, and key lifecycle events must still be preserved.

-A `commandExecution` `tool_call` event may emit only flat fields intended for humans and consumers, such as `method`, `itemId`, `toolName`, `type`, `command`, `cwd`, `status`, `processId`, and `valuesPrinted=false`. The raw Codex app-server `item` JSON, `itemPreview`, or nested protocol summaries must not be written into `message`, `outputSummary`, `stdoutSummary`, or the payload; actual command stdout/stderr is emitted only through `command_output` or the completed `commandExecution` summary.
+`tool_call` events for the user-visible tool lifecycle may emit only flat fields intended for humans and consumers, such as `method`, `itemId`, `toolName`, `type`, `command`, `cwd`, `status`, `processId`, and `valuesPrinted=false`. The currently visible tool types are `commandExecution` and `webSearch`; the raw Codex app-server `item` JSON, `itemPreview`, or nested protocol summaries must not be written into `message`, `outputSummary`, `stdoutSummary`, or the payload; actual command stdout/stderr is emitted only through `command_output` or the completed `commandExecution` summary.

 ## Failure Mapping

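The suppression summary described above keeps protocol names out of JSON object keys by emitting arrays of `{ method, count }` and `{ itemType, count }` entries. A minimal sketch of that conversion, assuming counts are collected in the `total` / `byMethod` / `byItemType` shape shown later in this diff; `toSuppressedSummaryPayload` is a hypothetical name.

```typescript
// Mirrors the SuppressedNotificationSummary interface from the diff.
interface SuppressedCounts {
  total: number;
  byMethod: Record<string, number>;
  byItemType: Record<string, number>;
}

// Convert key-indexed counters into entry arrays so that names such as
// "thread/tokenUsage/updated" appear as values, never as object keys.
function toSuppressedSummaryPayload(counts: SuppressedCounts) {
  return {
    phase: "codex-app-server-notifications-suppressed",
    total: counts.total,
    methods: Object.entries(counts.byMethod).map(([method, count]) => ({ method, count })),
    itemTypes: Object.entries(counts.byItemType).map(([itemType, count]) => ({ itemType, count })),
  };
}
```

Because a key-based redactor only inspects property names, the emitted payload exposes nothing but `phase`, `total`, `methods`, `itemTypes`, `method`, `itemType`, and `count` to it.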
@@ -14,6 +14,8 @@ const defaultCodexArgs = ["app-server", "--listen", "stdio://"];
 const stderrBufferBytes = 64_000;
 const stderrEventChars = 4_000;
 const requestTimeoutCapMs = 30_000;
+const assistantDeltaProgressMinChars = 500;
+const assistantDeltaProgressLimitChars = 1_200;

 const childEnvSummaryKeys = [
   "CODEX_HOME",
@@ -73,12 +75,21 @@ interface CompletedAssistantMessage {
   text: string;
 }

+interface AssistantDeltaProgressItem {
+  itemId: string | null;
+  text: string;
+  emittedChars: number;
+  flushed: boolean;
+}
+
 interface SuppressedNotificationSummary {
   total: number;
   byMethod: Record<string, number>;
   byItemType: Record<string, number>;
 }

+type AssistantDeltaProgressState = Map<string, AssistantDeltaProgressItem>;
+
 interface CodexStdioCloseInfo extends JsonRecord {
   code: number | null;
   signal: string | null;
@@ -398,6 +409,7 @@ async function runCodexStdioTurnWithSession(options: CodexStdioTurnOptions, sess
     return { terminalStatus: cancelled.status, failureKind: cancelled.failureKind, failureMessage: cancelled.message, events: events.map((event) => ({ ...event, payload: redactJson(event.payload) })) };
   }
   let assistantText = "";
+  const assistantDeltaProgress = createAssistantDeltaProgressState();
   const completedAssistantMessages: CompletedAssistantMessage[] = [];
   const suppressedNotifications = createSuppressedNotificationSummary();
   let threadId: string | undefined = options.threadId;
@@ -428,7 +440,11 @@ async function runCodexStdioTurnWithSession(options: CodexStdioTurnOptions, sess
       if (normalized.threadId) threadId = normalized.threadId;
       if (normalized.turnId) turnId = normalized.turnId;
       emitEvents(normalized.events);
-      if (normalized.assistantDelta) assistantText += normalized.assistantDelta;
+      if (normalized.assistantDelta) {
+        assistantText += normalized.assistantDelta.text;
+        const progress = recordAssistantDeltaProgress(assistantDeltaProgress, normalized.assistantDelta);
+        if (progress) emitEvent(progress);
+      }
       if (normalized.completedAssistantMessage) {
         completedAssistantMessages.push(normalized.completedAssistantMessage);
         emitEvent(assistantMessageEventForCompleted(normalized.completedAssistantMessage, completedAssistantMessages.length));
@@ -504,6 +520,7 @@ async function runCodexStdioTurnWithSession(options: CodexStdioTurnOptions, sess
   }
   if (!terminal) terminal = { status: "failed", failureKind: "backend-response-invalid", message: "codex app-server finished without terminal status" };
   if (terminal.status !== "completed") emitEvents(await session.close());
+  emitEvents(flushAssistantDeltaProgress(assistantDeltaProgress));
   if (completedAssistantMessages.length === 0) emitEvents(assistantMessageEventsForTurn(assistantText, terminal.status === "completed"));
   emitEvents(suppressedNotificationEvents(suppressedNotifications));
   emitEvent({ type: "terminal_status", payload: { terminalStatus: terminal.status, failureKind: terminal.failureKind, message: terminal.message } });
@@ -567,7 +584,7 @@ function codexHomeReadiness(codexHome: string): BackendTurnResult | null {
   };
 }

-function normalizeCodexNotification(message: JsonRecord, suppressed: SuppressedNotificationSummary): { events: BackendEvent[]; assistantDelta?: string; completedAssistantMessage?: CompletedAssistantMessage; threadId?: string; turnId?: string; terminal?: { status: TerminalStatus; failureKind: FailureKind | null; message: string | null } } {
+function normalizeCodexNotification(message: JsonRecord, suppressed: SuppressedNotificationSummary): { events: BackendEvent[]; assistantDelta?: { itemId: string | null; text: string }; completedAssistantMessage?: CompletedAssistantMessage; threadId?: string; turnId?: string; terminal?: { status: TerminalStatus; failureKind: FailureKind | null; message: string | null } } {
   const method = typeof message.method === "string" ? message.method : "unknown";
   const params = asRecordAt(message, "params");
   if (method === "thread/started") {
@@ -582,7 +599,7 @@ function normalizeCodexNotification(message: JsonRecord, suppressed: SuppressedN
     recordSuppressedNotification(suppressed, method);
     return { events: [] };
   }
-  if (method === "item/agentMessage/delta") return { events: [], assistantDelta: typeof params.delta === "string" ? params.delta : "" };
+  if (method === "item/agentMessage/delta") return { events: [], assistantDelta: { itemId: stringAt(params, "itemId"), text: typeof params.delta === "string" ? params.delta : "" } };
   if (method === "item/commandExecution/outputDelta") return { events: [{ type: "command_output", payload: commandOutputPayload("stdout", typeof params.delta === "string" ? params.delta : "") }] };
   if (method === "item/reasoning/textDelta") {
     recordSuppressedNotification(suppressed, method, "reasoning");
@@ -601,7 +618,7 @@ function normalizeCodexNotification(message: JsonRecord, suppressed: SuppressedN
   if (method === "item/started" || method === "item/completed") {
     const item = asRecordAt(params, "item");
     const itemType = typeof item.type === "string" ? item.type : "unknown";
-    if (itemType !== "commandExecution" || isSuppressedCodexItemType(itemType)) {
+    if (!isVisibleCodexToolItemType(itemType)) {
       recordSuppressedNotification(suppressed, method, itemType);
       return { events: [] };
     }
@@ -664,8 +681,8 @@ function isSuppressedCodexStatusNotification(method: string): boolean {
   return method === "thread/tokenUsage/updated" || method === "account/rateLimits/updated" || method === "warning" || method === "configWarning";
 }

-function isSuppressedCodexItemType(itemType: string): boolean {
-  return itemType === "reasoning";
+function isVisibleCodexToolItemType(itemType: string): boolean {
+  return itemType === "commandExecution" || itemType === "webSearch";
 }

 function assistantMessageEventForCompleted(message: CompletedAssistantMessage, messageIndex: number): BackendEvent {
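The switch from a deny-list (`isSuppressedCodexItemType`) to an allow-list (`isVisibleCodexToolItemType`) pairs with the documented rule that `tool_call` payloads stay flat. The sketch below shows one way the two could combine; `flatToolCallPayload`, `RawItem`, and the choice to default `toolName` to the item type are assumptions for illustration, not the repository's code.

```typescript
type RawItem = Record<string, unknown>;

// Allow-list of user-visible tool item types named in the docs.
const visibleToolTypes = new Set(["commandExecution", "webSearch"]);

function stringOrNull(value: unknown): string | null {
  return typeof value === "string" ? value : null;
}

// Returns a flat payload for visible tool items and null for everything else.
// The raw item object is never copied into the result.
function flatToolCallPayload(method: string, item: RawItem) {
  const type = stringOrNull(item.type) ?? "unknown";
  if (!visibleToolTypes.has(type)) return null;
  return {
    method,
    itemId: stringOrNull(item.id),
    toolName: type, // assumption: the tool name defaults to the item type
    type,
    command: stringOrNull(item.command),
    cwd: stringOrNull(item.cwd),
    status: stringOrNull(item.status),
    valuesPrinted: false,
  };
}
```

With an allow-list, any new item type the app-server introduces stays suppressed until it is explicitly added, which is the safer default for a durable trace.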
@@ -699,6 +716,56 @@ function assistantMessageEventsForTurn(assistantDeltaText: string, completed: bo
   }];
 }

+function createAssistantDeltaProgressState(): AssistantDeltaProgressState {
+  return new Map();
+}
+
+function recordAssistantDeltaProgress(state: AssistantDeltaProgressState, delta: { itemId: string | null; text: string }): BackendEvent | null {
+  if (!delta.text) return null;
+  const key = delta.itemId ?? "default";
+  const current = state.get(key) ?? { itemId: delta.itemId, text: "", emittedChars: 0, flushed: false };
+  current.text += delta.text;
+  current.flushed = false;
+  state.set(key, current);
+  if (current.text.length - current.emittedChars < assistantDeltaProgressMinChars) return null;
+  current.emittedChars = current.text.length;
+  return assistantDeltaProgressEvent(current, false);
+}
+
+function flushAssistantDeltaProgress(state: AssistantDeltaProgressState): BackendEvent[] {
+  const events: BackendEvent[] = [];
+  for (const item of state.values()) {
+    if (item.flushed || item.text.trim().length === 0 || item.text.length === item.emittedChars) continue;
+    item.emittedChars = item.text.length;
+    item.flushed = true;
+    events.push(assistantDeltaProgressEvent(item, true));
+  }
+  return events;
+}
+
+function assistantDeltaProgressEvent(item: AssistantDeltaProgressItem, flush: boolean): BackendEvent {
+  const summary = boundedTextSummary(item.text.trim(), { limitChars: assistantDeltaProgressLimitChars });
+  return {
+    type: "assistant_message",
+    payload: {
+      text: summary.text,
+      itemId: item.itemId,
+      source: "agent-message-delta-progress",
+      messageIndex: null,
+      messageCount: null,
+      replyAuthority: false,
+      final: false,
+      progress: true,
+      progressFlush: flush,
+      textBytes: summary.textBytes,
+      textTruncated: summary.textTruncated,
+      outputBytes: summary.outputBytes,
+      outputTruncated: summary.outputTruncated,
+      valuesPrinted: false,
+    },
+  };
+}
+
 function terminalStatusFromValue(value: unknown): TerminalStatus {
   if (value === "completed") return "completed";
   if (value === "cancelled" || value === "canceled" || value === "interrupted") return "cancelled";
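The throttling in `recordAssistantDeltaProgress` is easiest to see with numbers. The following standalone sketch reproduces only the threshold check from the hunk above (a snapshot fires once at least 500 new characters have accumulated since the last one); `ProgressItem`, `record`, and the 100-character deltas are illustrative assumptions.

```typescript
const minChars = 500; // mirrors assistantDeltaProgressMinChars in the diff

interface ProgressItem {
  text: string;
  emittedChars: number;
}

// Returns the accumulated length when a snapshot would be emitted, otherwise null.
function record(item: ProgressItem, delta: string): number | null {
  if (!delta) return null;
  item.text += delta;
  if (item.text.length - item.emittedChars < minChars) return null;
  item.emittedChars = item.text.length;
  return item.text.length;
}

// Feed twelve deltas of 100 characters each and collect the snapshot points.
const item: ProgressItem = { text: "", emittedChars: 0 };
const snapshots: number[] = [];
for (let i = 0; i < 12; i += 1) {
  const at = record(item, "x".repeat(100));
  if (at !== null) snapshots.push(at);
}
// snapshots is [500, 1000]; the trailing 200 characters stay pending.
```

The pending tail is what `flushAssistantDeltaProgress` picks up at the end of the turn, where `item.text.length !== item.emittedChars` marks it as not yet emitted.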
@@ -79,13 +79,14 @@ const selfTest: SelfTestCase = async (context) => {
     assert.equal(finalMessageEnvelope.reply, "Final answer only.", "result reply should use the final completed agentMessage instead of concatenating progress deltas");
     const finalMessageEvents = await client.get(`/api/v1/runs/${finalMessage.runId}/events?afterSeq=0&limit=100`) as { items?: Array<{ type: string; payload: unknown }> };
     const assistantEvents = finalMessageEvents.items?.filter((event) => event.type === "assistant_message") ?? [];
-    assert.equal(assistantEvents.length, 2, "backend should preserve each completed agentMessage as assistant_message event");
-    assert.equal(eventPayload(assistantEvents[0] ?? { payload: {} }).text, "I am checking the workspace.");
-    assert.equal(eventPayload(assistantEvents[0] ?? { payload: {} }).itemId, "msg_progress");
-    assert.equal(eventPayload(assistantEvents[0] ?? { payload: {} }).replyAuthority, false);
-    assert.equal(eventPayload(assistantEvents[1] ?? { payload: {} }).text, "Final answer only.");
-    assert.equal(eventPayload(assistantEvents[1] ?? { payload: {} }).itemId, "msg_final");
-    assert.equal(eventPayload(assistantEvents[1] ?? { payload: {} }).replyAuthority, false);
+    const completedAssistantEvents = assistantEvents.filter((event) => eventPayload(event).source === "completed-agent-message");
+    assert.equal(completedAssistantEvents.length, 2, "backend should preserve each completed agentMessage as assistant_message event");
+    assert.equal(eventPayload(completedAssistantEvents[0] ?? { payload: {} }).text, "I am checking the workspace.");
+    assert.equal(eventPayload(completedAssistantEvents[0] ?? { payload: {} }).itemId, "msg_progress");
+    assert.equal(eventPayload(completedAssistantEvents[0] ?? { payload: {} }).replyAuthority, false);
+    assert.equal(eventPayload(completedAssistantEvents[1] ?? { payload: {} }).text, "Final answer only.");
+    assert.equal(eventPayload(completedAssistantEvents[1] ?? { payload: {} }).itemId, "msg_final");
+    assert.equal(eventPayload(completedAssistantEvents[1] ?? { payload: {} }).replyAuthority, false);
     const finalMessageItems = finalMessageEvents.items ?? [];
     const progressMessageIndex = finalMessageItems.findIndex((event) => event.type === "assistant_message" && eventPayload(event).itemId === "msg_progress");
     const finalMessageIndex = finalMessageItems.findIndex((event) => event.type === "assistant_message" && eventPayload(event).itemId === "msg_final");
@@ -94,6 +95,28 @@ const selfTest: SelfTestCase = async (context) => {
     assert.ok(finalMessageIndex >= 0 && finalMessageIndex < turnCompletedIndex, "final agentMessage should be emitted before turn/completed instead of being delayed to final response");
     assert.equal(finalMessageItems.some((event) => event.type === "backend_status" && String(eventPayload(event).phase ?? "").startsWith("item/agentMessage:")), false, "agentMessage lifecycle must not be persisted as backend_status noise");

+    const webSearch = await createRunWithCommand(client, context, "hello web search progress", "selftest-web-search-progress", 15_000);
+    const webSearchPromise = runOnce({ managerUrl: server.baseUrl, runId: webSearch.runId, codexCommand: context.fakeCodexCommand, codexArgs: context.fakeCodexArgs, codexHome: context.codexHome, env: { CODEX_HOME: context.codexHome, AGENTRUN_FAKE_CODEX_MODE: "web-search-progress" }, oneShot: true }) as Promise<JsonRecord>;
+    await waitForEvent(client, webSearch.runId, (event) => event.type === "tool_call" && eventPayload(event).type === "webSearch" && eventPayload(event).method === "item/started", "webSearch tool_call start event");
+    await waitForEvent(client, webSearch.runId, (event) => event.type === "assistant_message" && eventPayload(event).source === "agent-message-delta-progress", "assistant delta progress event");
+    const webSearchResult = await webSearchPromise;
+    assert.equal(webSearchResult.terminalStatus, "completed", "web search progress turn should complete");
+    const webSearchEnvelope = await client.get(`/api/v1/runs/${webSearch.runId}/commands/${webSearch.commandId}/result`) as JsonRecord;
+    assert.equal(webSearchEnvelope.reply, "Final IAM recommendation.", "result reply should ignore live delta progress snapshots");
+    const webSearchEvents = await client.get(`/api/v1/runs/${webSearch.runId}/events?afterSeq=0&limit=100`) as { items?: Array<{ type: string; payload: unknown }> };
+    const webSearchItems = webSearchEvents.items ?? [];
+    assert.ok(webSearchItems.some((event) => event.type === "tool_call" && eventPayload(event).type === "webSearch" && eventPayload(event).method === "item/completed"), "webSearch completion must remain visible as a tool_call");
+    assert.ok(webSearchItems.some((event) => event.type === "assistant_message" && eventPayload(event).source === "agent-message-delta-progress" && eventPayload(event).progress === true), "assistant delta progress must be visible before final reply");
+    const webSearchStartIndex = webSearchItems.findIndex((event) => event.type === "tool_call" && eventPayload(event).type === "webSearch" && eventPayload(event).method === "item/started");
+    const webSearchProgressIndex = webSearchItems.findIndex((event) => event.type === "assistant_message" && eventPayload(event).source === "agent-message-delta-progress");
+    const webSearchCompletedIndex = webSearchItems.findIndex((event) => event.type === "tool_call" && eventPayload(event).type === "webSearch" && eventPayload(event).method === "item/completed");
+    const webSearchFinalIndex = webSearchItems.findIndex((event) => event.type === "assistant_message" && eventPayload(event).source === "completed-agent-message" && eventPayload(event).itemId === "msg_search");
+    assert.ok(webSearchStartIndex >= 0 && webSearchStartIndex < webSearchProgressIndex, "webSearch start should be visible before assistant progress");
+    assert.ok(webSearchProgressIndex >= 0 && webSearchProgressIndex < webSearchCompletedIndex, "assistant progress should be visible while webSearch is still running");
+    assert.ok(webSearchCompletedIndex >= 0 && webSearchCompletedIndex < webSearchFinalIndex, "webSearch completion should be visible before final assistant reply");
+    assert.equal(webSearchItems.some((event) => event.type === "tool_call" && eventPayload(event).type === "reasoning"), false, "reasoning items must still not be persisted as tool_call");
+    assertNoSecretLeak(webSearchEvents);
+
     const staleThread = await createStaleThreadRun(client, context);
     const staleThreadResult = await runOnce({
       managerUrl: server.baseUrl,
@@ -148,7 +171,7 @@ const selfTest: SelfTestCase = async (context) => {
     assert.equal(noisyItems.some((event) => event.type === "backend_status" && eventPayload(event).phase === "configWarning"), false, "low value config warnings must not be persisted as backend_status");
     assert.equal(noisyItems.some((event) => event.type === "tool_call" && eventPayload(event).type === "reasoning"), false, "reasoning items must not be persisted as tool_call");
     assert.ok(noisyItems.some((event) => event.type === "tool_call" && eventPayload(event).method === "item/started" && eventPayload(event).type === "commandExecution"), "real commandExecution tool call should remain visible");
-    assert.equal(noisyItems.some((event) => event.type === "tool_call" && eventPayload(event).type !== "commandExecution"), false, "non-commandExecution item lifecycle must not be persisted as tool_call");
+    assert.equal(noisyItems.some((event) => event.type === "tool_call" && eventPayload(event).type !== "commandExecution" && eventPayload(event).type !== "webSearch"), false, "only user-visible tool lifecycle items should be persisted as tool_call");
     assert.equal(noisyItems.some((event) => event.type === "backend_status" && String(eventPayload(event).phase ?? "").startsWith("item/agentMessage:")), false, "agentMessage lifecycle must not be persisted as backend_status noise");
     assert.equal(noisyPhases.includes("backend-turn-running"), false, "backend progress ticks must be summarized instead of persisted as durable trace events");
     const noisyFinished = noisyItems.find((event) => event.type === "backend_status" && eventPayload(event).phase === "backend-turn-finished");
@@ -186,7 +209,7 @@ const selfTest: SelfTestCase = async (context) => {
     await runSecretFailureCase({ client, managerUrl: server.baseUrl, context });
     await runSpawnFailureCase({ client, managerUrl: server.baseUrl, context });

-    return { name: "codex-stdio", tests: ["runner-lease-heartbeat", "codex-stdio-fake-turn", "codex-stdio-projected-writable-home", "codex-stdio-deepseek-profile-fake-turn", "codex-stdio-minimax-m3-profile-fake-turn", "codex-stdio-deepseek-missing-secret-no-fallback", "codex-stdio-minimax-m3-missing-secret-no-fallback", "codex-stdio-config-model-authoritative", "codex-stdio-explicit-model-forwarded", "codex-stdio-final-agent-message-only", "codex-stdio-stale-thread-resume-failed", "codex-stdio-live-tool-events", "codex-stdio-noisy-reasoning-suppression", "codex-stdio-missing-turn-result", "codex-stdio-provider-auth-failed", "codex-stdio-provider-rate-limited", "codex-stdio-provider-invalid-tool-call", "codex-stdio-provider-503-rpc-error", "codex-stdio-provider-503-terminal", "codex-stdio-provider-503-retry-event", "codex-stdio-invalid-json", "codex-stdio-timeout", "codex-stdio-secret-unavailable", "codex-stdio-spawn-failure"] };
+    return { name: "codex-stdio", tests: ["runner-lease-heartbeat", "codex-stdio-fake-turn", "codex-stdio-projected-writable-home", "codex-stdio-deepseek-profile-fake-turn", "codex-stdio-minimax-m3-profile-fake-turn", "codex-stdio-deepseek-missing-secret-no-fallback", "codex-stdio-minimax-m3-missing-secret-no-fallback", "codex-stdio-config-model-authoritative", "codex-stdio-explicit-model-forwarded", "codex-stdio-final-agent-message-only", "codex-stdio-web-search-progress", "codex-stdio-stale-thread-resume-failed", "codex-stdio-live-tool-events", "codex-stdio-noisy-reasoning-suppression", "codex-stdio-missing-turn-result", "codex-stdio-provider-auth-failed", "codex-stdio-provider-rate-limited", "codex-stdio-provider-invalid-tool-call", "codex-stdio-provider-503-rpc-error", "codex-stdio-provider-503-terminal", "codex-stdio-provider-503-retry-event", "codex-stdio-invalid-json", "codex-stdio-timeout", "codex-stdio-secret-unavailable", "codex-stdio-spawn-failure"] };
   } finally {
     await new Promise<void>((resolve) => server.server.close(() => resolve()));
   }
@@ -234,11 +257,6 @@ function countEntriesByName(value: unknown, keyName: "method" | "itemType"): Rec
   return output;
 }

-function eventPayloadItem(event: { payload: unknown }): JsonRecord {
-  const item = eventPayload(event).item;
-  return typeof item === "object" && item !== null && !Array.isArray(item) ? item as JsonRecord : {};
-}
-
 async function waitForEvent(client: ManagerClient, runId: string, predicate: (event: { type: string; payload: unknown }) => boolean, label: string): Promise<void> {
   const deadline = Date.now() + 3_000;
   while (Date.now() < deadline) {
@@ -151,6 +151,22 @@ for await (const line of rl) {
     respond(message.id, { turn });
     continue;
   }
+  if (mode === "web-search-progress") {
+    turnCounter += 1;
+    const turn = { id: `turn_selftest_${turnCounter}`, status: "completed" };
+    notify("turn/started", { turn });
+    notify("item/started", { item: { id: "search_selftest", type: "webSearch", status: "running" } });
+    notify("item/agentMessage/delta", { itemId: "msg_search", delta: "I am checking Kubernetes identity components and deployment docs. " });
+    notify("item/agentMessage/delta", { itemId: "msg_search", delta: "Keycloak, ZITADEL, authentik, Ory, Dex, OpenFGA, and SpiceDB are being compared for lifecycle and authorization coverage. " });
+    notify("item/agentMessage/delta", { itemId: "msg_search", delta: "Gateway/IAP choices are being separated from IdP and fine-grained authorization so the result can recommend a layered architecture. " });
+    notify("item/agentMessage/delta", { itemId: "msg_search", delta: "This long progress text intentionally crosses the AgentRun live progress threshold before the final completed agentMessage is emitted. " });
+    notify("item/agentMessage/delta", { itemId: "msg_search", delta: "The visible trace should therefore show work in progress while web search is still running, not only after turn completion. " });
+    notify("item/completed", { item: { id: "search_selftest", type: "webSearch", status: "completed", outputSummary: "searched Kubernetes IAM and gateway auth options" } });
+    notify("item/completed", { item: { id: "msg_search", type: "agentMessage", text: "Final IAM recommendation." } });
+    notify("turn/completed", { turn });
+    respond(message.id, { turn });
+    continue;
+  }
   if (mode === "slow-tool-events") {
     turnCounter += 1;
     const turn = { id: `turn_selftest_${turnCounter}`, status: "completed" };