From c09ea32b50b3eb417efd446bee90fb1e0956baea Mon Sep 17 00:00:00 2001
From: Codex
Date: Tue, 9 Jun 2026 13:00:06 +0000
Subject: [PATCH] docs: record sub2api model-routing cooldown policy

---
 .agents/skills/unidesk-sub2api/SKILL.md | 5 +++--
 docs/reference/platform-infra.md        | 5 +++--
 2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/.agents/skills/unidesk-sub2api/SKILL.md b/.agents/skills/unidesk-sub2api/SKILL.md
index 77b3722d..583e8183 100644
--- a/.agents/skills/unidesk-sub2api/SKILL.md
+++ b/.agents/skills/unidesk-sub2api/SKILL.md
@@ -57,7 +57,7 @@ bun scripts/cli.ts platform-infra sub2api codex-pool validate
 - `pool.apiKeySecretName` / `pool.apiKeySecretKey`: location of the k3s Secret that holds the unified consumer API key; defaults to `platform-infra/sub2api-codex-pool-api-key.API_KEY`.
 - `pool.minOwnerBalanceUsd`: minimum balance for the pool key owner; sync/validate tops it up.
 - `pool.minOwnerConcurrency`: minimum concurrency for the owner of the unified consumer API key; sync/validate raises it to this floor. It keeps the shared key from triggering WS 1013 at the user-concurrency layer; do not mask that error by raising one provider's capacity.
-- `pool.defaultTempUnschedulable`: default account-level temporary-unschedulable rules; when the upstream returns capacity, rate-limit, overload, or authentication-state errors, they let Sub2API cool down the account and switch to another account in the same group.
+- `pool.defaultTempUnschedulable`: default account-level temporary-unschedulable rules; when the upstream returns capacity, rate-limit, overload, service-unavailable, stable model-routing, or authentication-state errors, they let Sub2API cool down the account and switch to another account in the same group.
 - `profiles.entries`: selects upstream profiles from `~/.codex/` on master and maps them to Sub2API accounts.
 - `profiles.entries[].capacity`: optional per-account concurrency override; when omitted, `pool.defaultAccountCapacity` applies. Concrete values are authoritative only in `config/platform-infra/sub2api-codex-pool.yaml`; the skill and the long-term reference describe the rule and do not repeat current values.
 - `profiles.entries[].loadFactor`: optional per-account Sub2API `load_factor` override; when omitted, `pool.defaultAccountLoadFactor` applies. Concrete values are authoritative only in `config/platform-infra/sub2api-codex-pool.yaml`; after any change, run `codex-pool sync --confirm` and `codex-pool validate`.
@@ -146,7 +146,8 @@ bun scripts/cli.ts platform-infra sub2api codex-pool configure-local --confirm
 - Upstream requires a Codex User-Agent: set `upstreamUserAgent` only on that profile, then run `sync --confirm`.
 - Upstream reports capacity/rate-limit/overload/Bad Gateway but the pool does not switch accounts, or requests repeatedly fail before recovering: first confirm in `codex-pool validate` that `tempUnschedulable.ok=true`, that the target account has `runtimeEnabled=true`, and that the rule count matches YAML; then check the account/upstream status in `validation.gatewayResponses.evidence.failovers`. On a mismatch, run `codex-pool sync --confirm`; do not hand-patch Sub2API credentials.
 - Codex reports account-state or soft-quota prompts such as weekly-limit, `less than 10% of your weekly limit left`, or `Run /status for a breakdown` and asks to switch accounts: put the stable body keywords into the 403 and 429 rules of `pool.defaultTempUnschedulable`, run `codex-pool sync --confirm`, then use `codex-pool validate` to confirm that every managed account's runtime 403/429 rules contain those keywords. Sub2API temporary-unschedulable rules match on HTTP status plus body keyword; if the text arrives as successful HTTP 200 content, file a separate response-classification capability issue, because a YAML cooldown rule alone cannot cover it.
-- Upstream errors fire repeatedly: the default error cooldown is tiered by severity; temporary problems can start at 10 minutes, gateway/service-unavailable/overload failures should cool down longer, and authentication/permission/quota/account-state failures use the longest cooldown. Generic wrapper text such as `Recovered upstream error ...`, `Bad Gateway`, and Codex-facing `Upstream request failed` should stay in the YAML cooldown policy. Concrete values are authoritative only in YAML; after any change, run `codex-pool sync --confirm` and `codex-pool validate`.
+- Upstream 503 response body contains `model_not_found`, `No available channel for model ...`, or similar stable model-routing failure text: put the stable body keywords into the 503 rule of `pool.defaultTempUnschedulable`, run `codex-pool sync --confirm`, then use `codex-pool validate` to confirm that the target account's runtime 503 rule contains those keywords; do not mask this error family by changing account membership, priority, capacity, loadFactor, WebSocket mode, or User-Agent.
+- Upstream errors fire repeatedly: the default error cooldown is tiered by severity; temporary problems can start at 10 minutes, gateway/service-unavailable/overload/model-routing failures should cool down longer, and authentication/permission/quota/account-state failures use the longest cooldown. Stable wrapper text such as `Recovered upstream error ...`, `Bad Gateway`, Codex-facing `Upstream request failed`, `model_not_found`, and `No available channel for model` should stay in the YAML cooldown policy. Concrete values are authoritative only in YAML; after any change, run `codex-pool sync --confirm` and `codex-pool validate`. For the long-term criteria, see `docs/reference/platform-infra.md`.
 - Codex loses context after auto compact: first confirm that the local `~/.codex/config.toml` has `supports_websockets = true` and `responses_websockets_v2 = true`, then check the WSv2 candidate in `codex-pool validate` and `transport=responses_websockets_v2` in the Sub2API logs.
 - Codex smoke shows reconnect/1013: this is an upstream concurrency/availability problem; handle it separately from HTTP-only compact context loss. Record session/log evidence and link it to the dedicated issue; do not override YAML capacity with manual runtime patches.

diff --git a/docs/reference/platform-infra.md b/docs/reference/platform-infra.md
index eaacbfe2..2a6563ce 100644
--- a/docs/reference/platform-infra.md
+++ b/docs/reference/platform-infra.md
@@ -26,13 +26,14 @@
 - `pool.groupName` names the Sub2API group that represents the pool.
 - `pool.apiKeySecretName` and `pool.apiKeySecretKey` name the k3s Secret that stores the single consumer API key.
 - `pool.minOwnerConcurrency` declares the minimum concurrency for the Sub2API user that owns the unified consumer API key. Keep it high enough to cover the declared account capacity set, so the shared key does not fail WS sessions at the user-concurrency layer. Do not compensate for owner-concurrency 1013 errors by pinning capacity to one provider.
-- `pool.defaultTempUnschedulable` declares Sub2API account-level temporary unschedulable rules. Keep 429/overload/capacity failures in this YAML policy so the scheduler can cool down a failing account and choose another candidate instead of hard-pinning one provider.
+- `pool.defaultTempUnschedulable` declares Sub2API account-level temporary unschedulable rules. Keep 429/overload/capacity, service-unavailable, and stable model-routing failures in this YAML policy so the scheduler can cool down a failing account and choose another candidate instead of hard-pinning one provider.
 - `profiles.entries` selects local Codex profile files from `~/.codex/` and maps them to Sub2API account names.
 - `profiles.entries[].capacity` optionally overrides `pool.defaultAccountCapacity` for one account. Capacity is a YAML-controlled routing input; concrete current values belong only in `config/platform-infra/sub2api-codex-pool.yaml` and runtime validation output, not in long-term reference prose. Code constants, Secrets, ad-hoc runtime patches, or stale tests must not override YAML source of truth.
 - `profiles.entries[].loadFactor` optionally overrides `pool.defaultAccountLoadFactor` for one account and is rendered to Sub2API `load_factor`. Treat it as routing policy: values belong in YAML and `codex-pool validate` output, not code constants, Secrets, or ad-hoc runtime patches.
 - Do not change account membership, priority, capacity, load factor, WebSocket mode, or other routing policy from inference alone. Unless the user explicitly asks for a configuration change, first preserve the current YAML, collect provenance and runtime evidence, and write the finding to the relevant issue or runbook before proposing a change.
 - `profiles.entries[].tempUnschedulable` may override the pool default for one account. The CLI renders it into Sub2API credentials as `temp_unschedulable_enabled` and `temp_unschedulable_rules`; rules match HTTP status plus response-body keywords and place only that account into a temporary unschedulable cooldown.
 - Codex account-state or quota prompts that stop a task and ask the operator to switch accounts belong in `pool.defaultTempUnschedulable`, not in account membership, priority, capacity, load factor, WebSocket mode, or `pool_mode`. Keep stable body phrases such as weekly-limit and `/status` prompts in both the 403 account-state rule and the 429 quota/rate-limit rule, then run `codex-pool sync --confirm` and `codex-pool validate`. The validation evidence must include runtime temporary-unschedulable alignment for each managed account, not only successful group-level `/v1/models` or `/v1/responses` smoke output.
+- Upstream model-routing failures that surface as 503 responses, such as `model_not_found` or "no available channel for model" wrappers, also belong in `pool.defaultTempUnschedulable`. They are not membership, priority, capacity, load factor, WebSocket mode, or User-Agent decisions by themselves. After adding stable body phrases, run `codex-pool sync --confirm` and `codex-pool validate`, and verify the affected account's runtime 503 rule includes the new keywords.
 - `profiles.entries[].openaiResponsesWebSocketsV2Mode` is the account-level Responses WebSocket v2 switch for OpenAI-compatible upstreams that require WebSocket transport. Allowed values are `off`, `ctx_pool`, and `passthrough`; omit the field unless that upstream needs it.
 - `profiles.entries[].upstreamUserAgent` is an optional account-level upstream request User-Agent override. Use it only for upstreams that require a Codex CLI compatible User-Agent; keep the value YAML-controlled and newline-free.
 - `publicExposure` controls the optional FRP bridge from master server to the G14 ClusterIP service.
@@ -44,7 +45,7 @@ When Codex startup repeatedly reports WebSocket reconnects or HTTPS fallback, pr

 Do not encode current availability assumptions in long-term reference prose. If an account needs a higher concurrency or load factor than the pool default, make that a deliberate YAML override and verify it with `codex-pool validate`; the reference document should describe the rule, not repeat the current numeric value.

-Do not enable Sub2API `pool_mode` for UniDesk-managed Codex accounts. `pool_mode` retries the same selected account path, while UniDesk's desired failover behavior is to mark the failing account temporarily unschedulable and let Sub2API choose another account from the group. `codex-pool validate` reports each managed account's temporary-unschedulable runtime alignment and should be used after `codex-pool sync --confirm`. Generic 502 bodies such as `Recovered upstream error 502`, `Bad Gateway`, and Codex-facing `Upstream request failed` must stay in the YAML cooldown policy so an intermittently bad account is cooled down instead of repeatedly adding latency at the next compact or Responses request. The Codex pool default error cooldown is severity-tiered: temporary signals can start at ten minutes, gateway/service/overload failures should cool down longer, and credential, permission, quota, or account-state failures should use the longest cooldown. Exact current values belong in YAML and runtime validation output.
+Do not enable Sub2API `pool_mode` for UniDesk-managed Codex accounts. `pool_mode` retries the same selected account path, while UniDesk's desired failover behavior is to mark the failing account temporarily unschedulable and let Sub2API choose another account from the group. `codex-pool validate` reports each managed account's temporary-unschedulable runtime alignment and should be used after `codex-pool sync --confirm`. Generic 502/503 bodies such as `Recovered upstream error 502`, `Bad Gateway`, Codex-facing `Upstream request failed`, and stable `model_not_found` / "no available channel for model" wrappers must stay in the YAML cooldown policy so an intermittently bad account is cooled down instead of repeatedly adding latency at the next compact or Responses request. The Codex pool default error cooldown is severity-tiered: temporary signals can start at ten minutes, gateway/service/overload/model-routing failures should cool down longer, and credential, permission, quota, or account-state failures should use the longest cooldown. Exact current values belong in YAML and runtime validation output.

 Sub2API temporary-unschedulable rules require both an HTTP status match and a response-body keyword match. Do not treat them as a general successful-response content filter. If an upstream returns a quota warning as normal HTTP 200 assistant content, track that as a separate response-classification capability issue instead of claiming the YAML cooldown policy has covered it.
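
Reviewer note: the matching rule this patch documents (HTTP status AND a response-body keyword) can be illustrated with a minimal TypeScript sketch. Everything here is an assumption made for illustration: the `TempUnschedulableRule` shape, the `matchRule` helper, the case-insensitive substring comparison, and the cooldown numbers are hypothetical and are not taken from Sub2API or the UniDesk CLI. Real values live only in the YAML.

```typescript
// Hypothetical shapes for illustration only; not the Sub2API or UniDesk schema.
interface TempUnschedulableRule {
  status: number;          // HTTP status the rule applies to
  keywords: string[];      // stable response-body phrases
  cooldownMinutes: number; // placeholder value; real values belong in YAML
}

interface UpstreamResponse {
  status: number;
  body: string;
}

// Returns the first rule whose status equals the response status AND whose
// keyword list has at least one phrase present in the body.
// Assumption: comparison is a case-insensitive substring check.
function matchRule(
  rules: TempUnschedulableRule[],
  res: UpstreamResponse,
): TempUnschedulableRule | undefined {
  const body = res.body.toLowerCase();
  return rules.find(
    (rule) =>
      rule.status === res.status &&
      rule.keywords.some((keyword) => body.includes(keyword.toLowerCase())),
  );
}

// Example policy with placeholder cooldowns (not the repository's values).
const rules: TempUnschedulableRule[] = [
  { status: 502, keywords: ["Bad Gateway", "Upstream request failed"], cooldownMinutes: 30 },
  { status: 503, keywords: ["model_not_found", "No available channel for model"], cooldownMinutes: 30 },
];

// A 503 carrying a model-routing phrase matches the 503 rule.
const routed = matchRule(rules, { status: 503, body: '{"error":"model_not_found"}' });

// The same phrase inside an HTTP 200 body matches nothing, which is why
// successful-response warnings need a separate classification capability.
const successBody = matchRule(rules, { status: 200, body: "model_not_found" });
```

Under these assumptions `routed` is the 503 rule and `successBody` is `undefined`, which mirrors the patch's statement that a cooldown rule cannot act on HTTP 200 content.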