docs: record Sub2API D601 proxy runbook (#844)
Co-authored-by: Codex <codex@noreply.local>
This commit is contained in:
@@ -59,6 +59,22 @@ bun scripts/cli.ts platform-infra sub2api validate --target G14
|
||||
- `status --full|--raw` 只在需要展开远端 stdout/stderr 或原始 JSON 时使用。
|
||||
- `validate` 是按需验收,不是连续可用性探针。对 standby target,`validate --target <id>` 验证预部署形态,不要求外置 DB 当前可连接;对 external-active target,必须验证外置 DB、ephemeral Redis、Sub2API service、YAML egress proxy 和目标级 public exposure。
|
||||
|
||||
## D601 Egress Proxy
|
||||
|
||||
D601 的目标级 `egressProxy` 完全由 `config/platform-infra/sub2api.yaml` 控制。当前成熟形态是 master Docker `shadowsocks-rust` 作为加密出站源,D601 k3s 内 `sing-box` 暴露 HTTP/mixed ClusterIP proxy 给 Sub2API 和按 YAML 启用的 sentinel 使用。不要把 endpoint、端口、密码、健康探针或镜像 tag 写进 skill;只以 YAML 和 `config/platform-infra/sub2api-master-egress-proxy.compose.yaml` 为准。
|
||||
|
||||
master 侧 proxy 由 UniDesk checkout 内的 compose 文件管理:
|
||||
|
||||
```bash
|
||||
docker compose -f config/platform-infra/sub2api-master-egress-proxy.compose.yaml up -d --force-recreate
|
||||
bun scripts/cli.ts platform-infra sub2api apply --target D601 --confirm
|
||||
bun scripts/cli.ts platform-infra sub2api validate --target D601
|
||||
bun scripts/cli.ts platform-infra sub2api codex-pool sync --target D601 --confirm
|
||||
bun scripts/cli.ts platform-infra sub2api codex-pool validate --target D601
|
||||
```
|
||||
|
||||
proxy secret/config 文件只允许放在受控 Secret/state 路径,输出只能披露路径、presence、fingerprint 或摘要,不能打印密码、完整订阅或生成配置。若 D601 到上游的 TLS/SNI 路径被 reset,不要用临时 JS 或简陋 HTTP CONNECT proxy 作为最终方案;通过 YAML/compose 更换或修复成熟加密 proxy source,再跑上面的 apply/validate/sync/validate 闭环。
|
||||
|
||||
## 镜像升级
|
||||
|
||||
1. 修改 `config/platform-infra/sub2api.yaml` 的 `image.repository`、`image.tag` 或 `pullPolicy`。
|
||||
@@ -142,7 +158,7 @@ Codex 启动时反复出现 WebSocket reconnect、HTTPS fallback、`websocket cl
|
||||
|
||||
## 受保护手动账号代理与分组绑定
|
||||
|
||||
Sub2API 管理 UI 的账号连接测试使用账号级 `ProxyID` / proxy URL 配置上游 HTTP transport;账号没有绑定 proxy 时会直接出站,即使 Sub2API Pod 已经有 `HTTP_PROXY` / `HTTPS_PROXY` 环境变量。看到 WebUI 账号测试连 `chatgpt.com` 超时、但 Pod 内显式走目标 proxy 可通时,先检查该账号是否属于 `manualAccounts.protected` 并声明了 `proxyBinding`。
|
||||
Sub2API 管理 UI 的账号连接测试使用账号级 `ProxyID` / proxy URL 配置上游 HTTP transport;账号没有绑定 proxy 时会直接出站,即使 Sub2API Pod 已经有 `HTTP_PROXY` / `HTTPS_PROXY` 环境变量。看到 WebUI 账号测试连 `chatgpt.com` 超时、但 Pod 内显式走目标 proxy 可通时,先检查该账号是否属于 `manualAccounts.protected` 并声明了 `proxyBinding`。如果同一账号用 `gpt-5.2-pro` 返回 ChatGPT OAuth 不支持 Codex 的模型能力错误,但默认/受支持模型能完成 `hi` 或 `/v1/responses` smoke,这不是代理失败;按模型映射/账号能力另行处理。
|
||||
|
||||
WebUI 账号连接测试也不经过统一消费 API key 的 pool group 选择器;账号测试正常不代表 PC Codex 客户端能选中该账号。看到 WebUI 账号测试正常、但 `/responses` 或 `/v1/responses` 以 `account-select-failed` / `no available accounts` 返回 503 时,先检查该手动账号是否声明了 `groupBinding.source: pool-group`,并通过 `sync --confirm` 加入当前 `pool.groupName`。
|
||||
|
||||
@@ -154,7 +170,7 @@ bun scripts/cli.ts platform-infra sub2api codex-pool sync --target D601 --confir
|
||||
bun scripts/cli.ts platform-infra sub2api codex-pool validate --target D601
|
||||
```
|
||||
|
||||
`sync` 输出应显示 `manualAccounts.ok=true`、`proxySync.ok=true`、`groupSync.ok=true`,且该账号的 proxy/group `bindingAligned=true`。`sentinel-probe --account <manual-account> --confirm` 对受保护手动账号必须继续拒绝,通常返回 `account-protected-manual`;不要为了测试而把该账号移入 `profiles.entries` 或取消保护。需要证明 WebUI 同款账号测试恢复时,用 Sub2API admin account test 原入口测最小 `hi` / `gpt-5.5`,并只记录 account id、proxy id、event types、HTTP status 和短 output preview,不记录 OAuth token 或 Secret 明文。
|
||||
`sync` 输出应显示 `manualAccounts.ok=true`、`proxySync.ok=true`、`groupSync.ok=true`,且该账号的 proxy/group `bindingAligned=true`。`sentinel-probe --account <manual-account> --confirm` 对受保护手动账号必须继续拒绝,通常返回 `account-protected-manual`;不要为了测试而把该账号移入 `profiles.entries` 或取消保护。需要证明 WebUI 同款账号测试恢复时,用 Sub2API admin account test 原入口测最小 `hi` 和默认/受支持模型,并只记录 account id、proxy id、event types、HTTP status 和短 output preview,不记录 OAuth token 或 Secret 明文。若指定模型返回 “model is not supported when using Codex with a ChatGPT account” 一类能力错误,先归因到模型能力/映射,而不是 proxy。
|
||||
|
||||
## 添加上游
|
||||
|
||||
@@ -233,7 +249,7 @@ bun scripts/cli.ts platform-infra sub2api codex-pool configure-local --confirm
|
||||
- 只加强监控、不让哨兵自动冻结账号时,把 YAML `sentinel.actions.enabled=false` 后 `codex-pool sync --confirm`。此时 marker probe 和 gateway failure monitor 仍记录 `would-freeze` / observe-only 证据,但不会通过 Sub2API admin 写 `schedulable=false`;`/responses/compact` 的 `codex.remote_compact.failed` 和 compact 上游 5xx failover 只作为 `gateway-compact-*` 观察事件记录,不作为哨兵自动切换触发器。
|
||||
- 单个 request id 报 502/503/中断/没有自动切号:第一步跑 `bun scripts/cli.ts platform-infra sub2api codex-pool trace --request-id <requestId>`。先看 `outcome`、`reason`、`FAILOVER`、`SELECT-FAILED`、`ACCOUNT SIGNALS` 和 `WINDOW STATS`;只有 trace 报表缺字段或需要审计原始日志时,才加 `--show-lines` 或 `--raw`。若 `reason=failover-attempted-no-candidate`,说明切号动作已发生,但 scheduler 在排除失败账号后没有可用候选;继续用 `sentinel-report` 和 `validate --full` 区分 sentinel quarantine、request-path temp-unschedulable、账号 status 或容量耗尽。
|
||||
- profile invalid:先修 `~/.codex/config.toml.<profile>` 的 `base_url`、`wire_api`、`model` 或 `auth.json.<profile>` 的 API key;不要在 YAML 中写密钥。
|
||||
- 手动 OAuth/API-key 账号的 WebUI account test 连 `chatgpt.com` 超时,但同一 Pod 显式 HTTP proxy 探针可通:不要只看 Pod `HTTP_PROXY` env,按“受保护手动账号代理与分组绑定”小节确认 `manualAccounts.protected[].proxyBinding`,跑 `codex-pool sync --target D601 --confirm` 后再用原账号测试复测。
|
||||
- 手动 OAuth/API-key 账号的 WebUI account test 连 `chatgpt.com` 超时,但同一 Pod 显式 HTTP proxy 探针可通:不要只看 Pod `HTTP_PROXY` env,按“受保护手动账号代理与分组绑定”小节确认 `manualAccounts.protected[].proxyBinding`,跑 `codex-pool sync --target D601 --confirm` 后再用原账号测试复测。若复测不再 reset/timeout,而是 `gpt-5.2-pro` 这类指定模型返回 ChatGPT OAuth Codex 不支持的能力错误,用默认/受支持模型或统一 key smoke 验证代理,不要把模型错误当作代理仍坏。
|
||||
- 手动 OAuth/API-key 账号 WebUI account test 正常,但 PC Codex 客户端通过统一 key 访问 `/responses` 返回 503 且 trace 是 `account-select-failed` / `no available accounts`:按“受保护手动账号代理与分组绑定”小节确认 `manualAccounts.protected[].groupBinding.source: pool-group`,跑 `codex-pool sync --target D601 --confirm` 后用 `codex-pool validate --target D601 --full` 复测统一 key。
|
||||
- Sub2API 卡在 `wait-postgres` / `wait-redis` 或服务内大量 `context deadline exceeded`:先跑 `sub2api status` 看 `networkPolicy.ok`,再跑 `sub2api validate` 看 `postgresCrossPodPgIsReady` / `redisCrossPodPing`;缺失或异常时用 `sub2api apply --confirm` 恢复受控 `NetworkPolicy/allow-all`,不要保留手工 iptables bypass 作为长期修复。
|
||||
- pool key 401:跑 `codex-pool sync --confirm` 重建 Sub2API key 与 k3s Secret 绑定,再跑 `codex-pool validate`。
|
||||
|
||||
@@ -121,7 +121,7 @@ For this failure class, the regression evidence must come from the real request
|
||||
|
||||
## Sub2API Account Test Semantics
|
||||
|
||||
Sub2API v0.1.136 has a separate management-plane account connection test. The admin WebUI account modal calls `POST /api/v1/admin/accounts/:id/test` with `model_id` and, for the admin account table modal, no OpenAI `mode`; the backend binds this to `AccountTestService.TestAccountConnection`, which normalizes an empty mode to `default`.
|
||||
Sub2API has a separate management-plane account connection test. The admin WebUI account modal calls `POST /api/v1/admin/accounts/:id/test` with `model_id` and, for the admin account table modal, no OpenAI `mode`; the backend binds this to `AccountTestService.TestAccountConnection`, which normalizes an empty mode to `default`.
|
||||
|
||||
For OpenAI API-key accounts in default mode, the test loads the account by id, applies `account.GetMappedModel(model_id)`, checks `openai_compat.ShouldUseResponsesAPI(account.Extra)`, and then builds an upstream URL from the account base URL with `/v1/responses`. It sends a direct upstream request through `httpUpstream.DoWithTLS` with `Content-Type: application/json` and `Authorization: Bearer <account-key>`. The request body is Responses API SSE, not a non-streaming JSON request: `model` is the mapped model, `input` is one user message whose text is `hi`, `stream` is `true`, and `instructions` is Sub2API's embedded OpenAI default instructions. For API-key accounts it does not set `store: false`, `max_output_tokens`, Codex CLI `User-Agent`, `OpenAI-Beta`, `Originator`, `Version`, `Session_ID`, or `Conversation_ID`; those Codex-like headers appear in other paths such as compact probing, not in the default account test.
|
||||
|
||||
@@ -129,7 +129,7 @@ The management test success criterion is transport and stream completion, not se
|
||||
|
||||
This management-plane test is also outside the normal consumer gateway scheduler. It fetches the account by id instead of listing only schedulable accounts, so `status=active` in the modal and a successful account test can coexist with `schedulable=false` in scheduler state. Because the test performs its own outbound `DoWithTLS` call, regular gateway access logs and usage logs may not contain the upstream account id/path/status evidence expected from ordinary `/v1/responses` traffic. When diagnosing account tests, use the management route semantics above or Sub2API source, not gateway access-log absence or an unrelated pool request as proof.
|
||||
|
||||
The management test uses Sub2API's account-level proxy selection, not the Pod environment as a fallback. In Sub2API v0.1.136 the upstream HTTP transport is configured from the account's `ProxyID` / proxy URL; an account with no proxy binding goes direct even if the Sub2API Pod has `HTTP_PROXY` or `HTTPS_PROXY` set. For protected manual accounts that need the target egress path, declare `manualAccounts.protected[].proxyBinding` in `config/platform-infra/sub2api-codex-pool.yaml` and reconcile it with `codex-pool sync --target <active> --confirm`; do not hand-patch the runtime account or infer proxy coverage from Pod env alone.
|
||||
The management test uses Sub2API's account-level proxy selection, not the Pod environment as a fallback. The upstream HTTP transport is configured from the account's `ProxyID` / proxy URL; an account with no proxy binding goes direct even if the Sub2API Pod has `HTTP_PROXY` or `HTTPS_PROXY` set. For protected manual accounts that need the target egress path, declare `manualAccounts.protected[].proxyBinding` in `config/platform-infra/sub2api-codex-pool.yaml` and reconcile it with `codex-pool sync --target <active> --confirm`; do not hand-patch the runtime account or infer proxy coverage from Pod env alone.
|
||||
|
||||
The management test is also not proof that the unified consumer key can select the account. A protected manual account must be attached to the pool group before ordinary `/responses` or `/v1/responses` traffic can use it. When that is intended, declare `manualAccounts.protected[].groupBinding.source: pool-group`; sync should add the account to the current `pool.groupName` without making it a YAML-managed profile or sentinel target.
|
||||
|
||||
@@ -163,7 +163,9 @@ The active Codex-pool request path follows the YAML-selected active target:
|
||||
|
||||
For the current D601 externally backed active target, client traffic reaches PK01 Caddy, PK01 forwards to the YAML-declared FRP remote port, D601 `sub2api-frpc` connects directly to PK01 `frps`, and FRP forwards to `sub2api.platform-infra.svc.cluster.local:8080` on D601. This path does not pass through the master server or the pikanode reverse proxy. `api.pikapython.com` must resolve to the YAML-declared PK01 public address before Caddy can obtain or renew the public certificate; when DNS is missing, PK01 local FRP probes and public-IP remote-port probes may prove the edge path, but they are not a substitute for final `https://api.pikapython.com` validation.
|
||||
|
||||
When target-level `egressProxy.enabled=true`, the D601 target renders an in-cluster HTTP(S) proxy client from the master VPN subscription source declared in YAML. The CLI injects the resulting proxy URL and `NO_PROXY` into Sub2API and, when requested by YAML, the Codex account sentinel. `platform-infra sub2api validate --target D601 --full` must prove the proxy Deployment/Service is ready and that an app pod can complete the YAML-declared health probe through the proxy. This target-level injection does not by itself bind manually created Sub2API accounts to that proxy; account tests and account-specific upstream transports still need a YAML-declared `manualAccounts.protected[].proxyBinding` when the account must avoid direct egress. Subscription contents and generated proxy configs are Secret material and must not be printed.
|
||||
When target-level `egressProxy.enabled=true`, the D601 target renders an in-cluster HTTP/mixed proxy client from the proxy source declared in YAML. The current mature external-egress shape is `sourceType: master-shadowsocks`: master Docker runs `shadowsocks-rust` from `config/platform-infra/sub2api-master-egress-proxy.compose.yaml`, while D601 runs `sing-box` to expose the ClusterIP proxy consumed by Sub2API and, when requested by YAML, the Codex account sentinel. A subscription-backed source is still just another YAML-declared source type; long-term prose must not duplicate the current endpoint, port, password, image tag, or health URL values from YAML/compose.
|
||||
|
||||
`platform-infra sub2api validate --target D601 --full` must prove the proxy Deployment/Service is ready and that an app pod can complete the YAML-declared health probe through the proxy. This target-level injection does not by itself bind manually created Sub2API accounts to that proxy; account tests and account-specific upstream transports still need a YAML-declared `manualAccounts.protected[].proxyBinding` when the account must avoid direct egress. Proxy credentials, subscription contents, and generated proxy configs are Secret material and must not be printed. If a direct D601-to-upstream TLS/SNI path is reset, do not leave a one-off plain HTTP CONNECT or JS proxy as the durable fix; use a mature encrypted proxy source, currently master `shadowsocks-rust` plus D601 `sing-box`, through YAML/compose.
|
||||
|
||||
Adding, removing, exposing, validating, and configuring local Codex consumers are daily operations covered by `$unidesk-sub2api`. The development rule is that ordinary pool membership changes stay YAML-only and do not add code or CI/CD. Code changes are only appropriate when UniDesk needs to render or validate a Sub2API capability that already exists upstream, such as account-level WebSocket mode or per-account upstream User-Agent. If Sub2API itself does not support a desired behavior, do not magic-patch it through UniDesk scripts, Kubernetes hotfixes, local forks, or hidden compatibility paths; either leave the behavior unsupported or pursue it upstream as an explicit Sub2API feature.
|
||||
|
||||
|
||||
Reference in New Issue
Block a user