docs: track sub2api 2xx classification gap

This commit is contained in:
Codex
2026-06-10 07:54:19 +00:00
parent 8735b4103c
commit b78bb2c306
5 changed files with 18 additions and 2 deletions
+1 -1
View File
@@ -146,7 +146,7 @@ bun scripts/cli.ts platform-infra sub2api codex-pool configure-local --confirm
- Codex 启动 WebSocket 回退:用原入口 Codex smoke 复现,再用 bounded Sub2API 日志确认 account;对 WS handshake 4xx/5xx、`openai.websocket_account_select_failed` 或 close-before-`response.completed` 的账号关闭 YAML WSv2 能力后同步。若没有剩余 WSv2-capable account,把 `localCodex.supportsWebSockets``localCodex.responsesWebSocketsV2` 一起关掉,不把临时可用性推断写成调度配置。
- 上游要求 Codex User-Agent:只给该 profile 配 `upstreamUserAgent`,跑 `sync --confirm`
- 上游报 capacity/rate-limit/overload/Bad Gateway/Gateway Timeout 后没有切号或频繁先失败再恢复:先确认 `codex-pool validate``tempUnschedulable.ok=true` 且目标 account `runtimeEnabled=true`、规则数符合 YAML;再看 `validation.gatewayResponses.evidence.failovers` 的 account/upstream status。若 mismatch,跑 `codex-pool sync --confirm`,不要手工 patch Sub2API credentials。
- Codex 报 weekly-limit、`less than 10% of your weekly limit left``Run /status for a breakdown` 等账号状态/软配额提示并要求切号:把稳定 body 关键词放进 `pool.defaultTempUnschedulable` 的 403 和 429 规则,跑 `codex-pool sync --confirm`,再用 `codex-pool validate` 确认每个 managed account 的 runtime 403/429 rules 都包含这些关键词。Sub2API 临时下线规则按 HTTP status + body keyword 匹配;如果该文案是 HTTP 200 成功内容,需要另提响应分类能力 issue,不能只靠 YAML 冷却规则声明解决
- Codex 报 weekly-limit、`less than 10% of your weekly limit left``Run /status for a breakdown` 等账号状态/软配额提示并要求切号:把稳定 body 关键词放进 `pool.defaultTempUnschedulable` 的 403 和 429 规则,跑 `codex-pool sync --confirm`,再用 `codex-pool validate` 确认每个 managed account 的 runtime 403/429 rules 都包含这些关键词。Sub2API 临时下线规则按 HTTP status + body keyword 匹配;如果该文案是 HTTP 200 成功内容,最多只能把 200 规则作为期望分类信号写进 YAML,同时必须登记响应分类能力 issue,不能只靠 YAML 规则声明当作已能冷却账号
- 上游 503 响应体出现 `model_not_found``No available channel for model ...` 或同类稳定模型路由失败文案:把稳定 body 关键词放进 `pool.defaultTempUnschedulable` 的 503 规则,跑 `codex-pool sync --confirm`,再用 `codex-pool validate` 确认目标 account 的 runtime 503 rule 包含这些关键词;不要用 account membership、priority、capacity、loadFactor、WebSocket mode 或 User-Agent 改动掩盖该错误族。
- 上游错误反复触发:默认错误冷却按严重程度分层;临时问题可从 10 分钟起步,网关/服务不可用/过载/模型路由类应更长,认证/权限/配额/账号状态类使用最长冷却。`Recovered upstream error ...``Bad Gateway``Gateway Timeout`、Cloudflare `524`、Codex-facing `Upstream request failed``Unknown error``context deadline exceeded``context canceled``model_not_found``No available channel for model`、大上下文 `413``openai_error` 这类稳定包装文案都应留在对应 5xx/413 YAML 冷却政策里,特别是 compact 链路里上游 524 可能最终表现为客户端 502/504 + `Unknown error`。具体数值只以 YAML 为准,修改后必须 `codex-pool sync --confirm``codex-pool validate`。长期判定见 `docs/reference/platform-infra.md`
- Codex auto compact 后丢上下文:先确认 YAML `localCodex` 是否声明启用 WSv2;若启用,再确认本机 `~/.codex/config.toml` 是否有 `supports_websockets = true``responses_websockets_v2 = true`,并看 `codex-pool validate` 的 WSv2 candidate 和 Sub2API 日志里的 `transport=responses_websockets_v2`。若 YAML 当前禁用 WSv2,则按 HTTP Responses 稳定性排查,不把旧 WS 口径当成验收要求。
@@ -11,6 +11,10 @@ pool:
defaultTempUnschedulable:
enabled: true
rules:
- statusCode: 200
keywords: [less than 10% of your weekly limit left]
durationMinutes: 120
description: Success-body account-state prompts require Sub2API 2xx body reclassification before they can cool accounts.
- statusCode: 401
keywords: [unauthorized, invalid api key, invalid_api_key, authentication, recovered upstream error]
durationMinutes: 120
+1 -1
View File
@@ -49,7 +49,7 @@ Do not encode current availability assumptions in long-term reference prose. If
Do not enable Sub2API `pool_mode` for UniDesk-managed Codex accounts. `pool_mode` retries the same selected account path, while UniDesk's desired failover behavior is to mark the failing account temporarily unschedulable and let Sub2API choose another account from the group. `codex-pool validate` reports each managed account's temporary-unschedulable runtime alignment and should be used after `codex-pool sync --confirm`. Generic 502/503/504 bodies such as `Recovered upstream error 502`, `Bad Gateway`, `Gateway Timeout`, Codex-facing `Upstream request failed`, `Unknown error`, context-deadline/canceled wrappers, and stable `model_not_found` / "no available channel for model" wrappers must stay in the YAML cooldown policy so an intermittently bad account is cooled down instead of repeatedly adding latency at the next compact or Responses request. The Codex pool default error cooldown is severity-tiered: temporary signals can start at ten minutes, gateway/service/overload/model-routing failures should cool down longer, and credential, permission, quota, or account-state failures should use the longest cooldown. Exact current values belong in YAML and runtime validation output.
Sub2API temporary-unschedulable rules require both an HTTP status match and a response-body keyword match. Do not treat them as a general successful-response content filter. If an upstream returns a quota warning as normal HTTP 200 assistant content, track that as a separate response-classification capability issue instead of claiming the YAML cooldown policy has covered it.
Sub2API temporary-unschedulable rules require both an HTTP status match and a response-body keyword match. Do not treat them as a general successful-response content filter. If an upstream returns a quota warning as normal HTTP 200 assistant content, keep the same stable phrases in the 403/429 rules and track a separate response-classification capability issue; a YAML 200 rule may document the desired classification signal, but validation of that rule only proves it is stored, not that successful assistant content currently cools the selected account.
The request path is:
@@ -79,14 +79,20 @@ if (parsed.pool?.defaultTempUnschedulable?.enabled === true) {
}
const accountState403Rule = rules.find((rule) => rule.statusCode === 403);
const quota429Rule = rules.find((rule) => rule.statusCode === 429);
const successBody200Rule = rules.find((rule) => rule.statusCode === 200);
const serviceUnavailable503Rule = rules.find((rule) => rule.statusCode === 503);
const accountState403Keywords = new Set((accountState403Rule?.keywords ?? []).map((keyword) => keyword.toLowerCase()));
const quota429Keywords = new Set((quota429Rule?.keywords ?? []).map((keyword) => keyword.toLowerCase()));
const successBody200Keywords = new Set((successBody200Rule?.keywords ?? []).map((keyword) => keyword.toLowerCase()));
const serviceUnavailable503Keywords = new Set((serviceUnavailable503Rule?.keywords ?? []).map((keyword) => keyword.toLowerCase()));
for (const keyword of ["weekly limit", "less than 10% of your weekly limit left", "run /status for a breakdown"]) {
assertCondition(accountState403Keywords.has(keyword), "403 temporary-unschedulable rule must catch Codex weekly-limit account-state prompts", { keyword, accountState403Rule });
assertCondition(quota429Keywords.has(keyword), "429 temporary-unschedulable rule must catch Codex weekly-limit quota prompts", { keyword, quota429Rule });
}
if (successBody200Rule !== undefined) {
assertCondition(successBody200Keywords.has("less than 10% of your weekly limit left"), "200 temporary-unschedulable rule must document the weekly-limit success-body classifier phrase", successBody200Rule);
assertCondition(/reclassification/u.test(successBody200Rule.description ?? ""), "200 temporary-unschedulable rule must be documented as a reclassification signal, not a proven cooldown", successBody200Rule);
}
for (const keyword of ["model_not_found", "no available channel for model"]) {
assertCondition(serviceUnavailable503Keywords.has(keyword), "503 temporary-unschedulable rule must catch upstream model-routing failures", { keyword, serviceUnavailable503Rule });
}
@@ -668,6 +668,12 @@ export function defaultCodexTempUnschedulablePolicy(): CodexTempUnschedulablePol
return {
enabled: true,
rules: [
{
statusCode: 200,
keywords: ["less than 10% of your weekly limit left"],
durationMinutes: 120,
description: "Success-body account-state prompts require Sub2API 2xx body reclassification before they can cool accounts.",
},
{
statusCode: 401,
keywords: ["unauthorized", "invalid api key", "invalid_api_key", "authentication", "recovered upstream error"],