fix: add compact timeout cooldown evidence

Codex
2026-06-10 07:42:21 +00:00
parent 9710a1a6f6
commit 8735b4103c
6 changed files with 85 additions and 14 deletions
+1 -1
@@ -35,7 +35,7 @@
 - Do not change account membership, priority, capacity, load factor, WebSocket mode, or other routing policy from inference alone. Unless the user explicitly asks for a configuration change, first preserve the current YAML, collect provenance and runtime evidence, and write the finding to the relevant issue or runbook before proposing a change.
 - `profiles.entries[].tempUnschedulable` may override the pool default for one account. The CLI renders it into Sub2API credentials as `temp_unschedulable_enabled` and `temp_unschedulable_rules`; rules match HTTP status plus response-body keywords and place only that account into a temporary unschedulable cooldown.
 - Codex account-state or quota prompts that stop a task and ask the operator to switch accounts belong in `pool.defaultTempUnschedulable`, not in account membership, priority, capacity, load factor, WebSocket mode, or `pool_mode`. Keep stable body phrases such as weekly-limit and `/status` prompts in both the 403 account-state rule and the 429 quota/rate-limit rule, then run `codex-pool sync --confirm` and `codex-pool validate`. The validation evidence must include runtime temporary-unschedulable alignment for each managed account, not only successful group-level `/v1/models` or `/v1/responses` smoke output.
-- Upstream model-routing failures that surface as 503 responses, such as `model_not_found` or "no available channel for model" wrappers, also belong in `pool.defaultTempUnschedulable`. Gateway timeout failures that surface as 504 responses, including `Gateway Timeout`, `Unknown error`, `Upstream request failed`, `context deadline exceeded`, `context canceled`, or recovered upstream-error wrappers, belong in the same YAML policy. They are not membership, priority, capacity, load factor, WebSocket mode, or User-Agent decisions by themselves. After adding stable body phrases, run `codex-pool sync --confirm` and `codex-pool validate`, and verify the affected account's runtime status-specific rule includes the new keywords.
+- Upstream model-routing failures that surface as 503 responses, such as `model_not_found` or "no available channel for model" wrappers, also belong in `pool.defaultTempUnschedulable`. Gateway and timeout failures that surface as 502, 504, or 524 responses, including `Gateway Timeout`, `Unknown error`, `Upstream request failed`, `context deadline exceeded`, `context canceled`, or recovered upstream-error wrappers, belong in the same YAML policy. This is especially important for compact requests, where an upstream Cloudflare 524 may eventually reach Codex as a 502/504 unknown-error wrapper after failover or client cancellation. They are not membership, priority, capacity, load factor, WebSocket mode, or User-Agent decisions by themselves. After adding stable body phrases, run `codex-pool sync --confirm` and `codex-pool validate`, and verify the affected account's runtime status-specific rule includes the new keywords.
 - `profiles.entries[].openaiResponsesWebSocketsV2Mode` is the account-level Responses WebSocket v2 switch for OpenAI-compatible upstreams that require WebSocket transport. Allowed values are `off`, `ctx_pool`, and `passthrough`; omit the field unless that upstream needs it.
 - `profiles.entries[].upstreamUserAgent` is an optional account-level upstream request User-Agent override. Use it only for upstreams that require a Codex CLI compatible User-Agent; keep the value YAML-controlled and newline-free.
 - `publicExposure` controls the optional FRP bridge from master server to the G14 ClusterIP service.
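
The status-plus-keyword matching described in the hunk above can be sketched roughly as follows. This is only an illustration, not the Sub2API or `codex-pool` implementation: the `Rule` class, its field names, the "any keyword matches" semantics, and the cooldown durations are all assumptions made for the example. Only the status codes and body phrases come from the changed text.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Rule:
    """Hypothetical shape of one temporary-unschedulable rule."""
    status: int                 # HTTP status the rule applies to
    keywords: Tuple[str, ...]   # stable response-body phrases
    cooldown_seconds: int       # made-up duration, for illustration only


def match_rule(rules: Sequence[Rule], status: int, body: str) -> Optional[Rule]:
    """Return the first rule whose status equals the response status and
    at least one of whose keywords occurs in the body (case-insensitive).
    Assumption: a single keyword hit is enough to trigger the rule."""
    lowered = body.lower()
    for rule in rules:
        if rule.status == status and any(k.lower() in lowered for k in rule.keywords):
            return rule
    return None


# Phrases listed in the changed bullet for gateway and timeout failures.
TIMEOUT_PHRASES = (
    "Gateway Timeout",
    "Unknown error",
    "Upstream request failed",
    "context deadline exceeded",
    "context canceled",
)

# Illustrative rule set: one rule per status named in the changed bullet.
RULES = [
    Rule(503, ("model_not_found", "no available channel for model"), 300),
    Rule(502, TIMEOUT_PHRASES, 120),
    Rule(504, TIMEOUT_PHRASES, 120),
    Rule(524, TIMEOUT_PHRASES, 120),
]
```

Under these assumptions, `match_rule(RULES, 504, "upstream: context deadline exceeded")` returns the 504 rule, while the same body with status 200 returns `None`, which mirrors the point that a rule is scoped to one status and cools down only the account that produced the response.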