diff --git a/.agents/skills/unidesk-sub2api/SKILL.md b/.agents/skills/unidesk-sub2api/SKILL.md index 216182bb..85883bb8 100644 --- a/.agents/skills/unidesk-sub2api/SKILL.md +++ b/.agents/skills/unidesk-sub2api/SKILL.md @@ -112,7 +112,7 @@ bun scripts/cli.ts platform-infra sub2api codex-pool cleanup-probes --target D60 - `profiles.entries[].tempUnschedulable`: 可选 per-account Sub2API 内置临时不可调度覆盖;只用于明确偏离 pool 默认规则,不用它给某个账号特殊优先级或临时绕过通用 failover。 - `profiles.entries[].openaiResponsesWebSocketsV2Mode`: 需要 Responses WebSocket v2 的上游才设置,值为 `off`、`ctx_pool` 或 `passthrough`。 - `profiles.entries[].upstreamUserAgent`: 少数要求 Codex CLI User-Agent 的上游才设置,不能含换行。 -- `manualAccounts.protected`: 已在 Sub2API 手动创建/维护、且必须排除在 UniDesk-managed Codex pool 和 sentinel 控制之外的账号。默认不得改 credentials/status/schedulable/groups/priority/capacity/loadFactor;只有显式声明 `proxyBinding` 时,`sync --confirm` 才允许把该账号的 `proxy_id` 对齐到 YAML 目标的 egress proxy。 +- `manualAccounts.protected`: 已在 Sub2API 手动创建/维护、且必须排除在 UniDesk-managed Codex pool credentials 和 sentinel 控制之外的账号。默认不得改 credentials/status/schedulable/priority/capacity/loadFactor;只有显式声明 `proxyBinding` 时,`sync --confirm` 才允许把该账号的 `proxy_id` 对齐到 YAML 目标的 egress proxy;只有显式声明 `groupBinding.source: pool-group` 时,才允许把该账号加入统一消费 API key 使用的 pool group。 - `sentinel.monitor.enabled`: 账号级 marker 哨兵监控开关;开启后 `codex-pool sync --confirm` 会在 `platform-infra` 创建/更新 k8s CronJob、ConfigMap、Secret、ServiceAccount、Role 和 RoleBinding。CronJob 直打 YAML-managed 上游账号的 OpenAI Responses `gpt-5.5`,用确定 marker 作为唯一健康标准,并在独立 state ConfigMap 中记录 token/cost 账本。 - `sentinel.actions.enabled`: 账号级哨兵冻结/恢复动作开关;当前 marker-only guard 要求开启。动作关闭时只记录 `would-freeze`,不会调用 Sub2API admin API 改 `schedulable`。动作开启后,只要不满足 marker match,不论是 HTTP 200 私货、4xx/5xx、非 JSON、连接错误还是空输出,都进入同一个冻结/恢复状态机。 - `sentinel.sdk.openaiPythonVersion`: 哨兵容器使用的 OpenAI Python SDK 固定版本;模型请求必须通过标准 SDK `responses.create`,不要手工拼 `/v1/responses` 请求体或手写响应解析。后续升级 SDK 只改 YAML 并 `sync --confirm`。 @@ -122,7 +122,7 @@ bun scripts/cli.ts platform-infra 
sub2api codex-pool cleanup-probes --target D60 - `sentinel.freeze`: 失败冻结 TTL 指数退避配置。当前口径是初始 1 分钟,失败后 `1m -> 2m -> 4m -> 8m -> 10m`,最大 10 分钟;失败 probe 基本不消耗有效输出 token,因此冻结窗口保持短周期。冻结到期后只做恢复 probe,通过才自动恢复,不能仅靠 TTL 到期解封。 - `sentinel.pricing`: 直打上游时哨兵自己的 token/cost 估算价格。因为 direct upstream probe 不经过 Sub2API 普通用量账本,哨兵必须自己记录全局与 per-account token/cost;这些账本只用于观察,不作为跳过探测的预算门禁。 -`sync --confirm` 会登录 Sub2API admin、创建/更新 group、创建/更新 YAML 中的 `unidesk-codex-*` accounts、创建/复用统一 API key Secret,并部署/更新哨兵资源;它不把既有 managed account 直接恢复为 `schedulable=true`。恢复只由哨兵在读取 Sub2API runtime `schedulable=false` 后触发 recovery probe,并在 marker 命中时执行。`sync` 默认不删除 YAML 中缺席的 managed account。只有明确退役上游时才使用 `sync --confirm --prune-removed` 删除缺席且 `extra.unidesk_managed=true` 的 `unidesk-codex-*` account。对 `manualAccounts.protected`,`sync` 只执行 YAML 显式允许的窄同步;当前允许项是从目标 `egressProxy` 创建/更新 Sub2API internal proxy 记录,并把受保护手动账号的 `proxy_id` 绑定过去,不接管该账号凭据、调度、分组或哨兵状态。 +`sync --confirm` 会登录 Sub2API admin、创建/更新 group、创建/更新 YAML 中的 `unidesk-codex-*` accounts、创建/复用统一 API key Secret,并部署/更新哨兵资源;它不把既有 managed account 直接恢复为 `schedulable=true`。恢复只由哨兵在读取 Sub2API runtime `schedulable=false` 后触发 recovery probe,并在 marker 命中时执行。`sync` 默认不删除 YAML 中缺席的 managed account。只有明确退役上游时才使用 `sync --confirm --prune-removed` 删除缺席且 `extra.unidesk_managed=true` 的 `unidesk-codex-*` account。对 `manualAccounts.protected`,`sync` 只执行 YAML 显式允许的窄同步;当前允许项是从目标 `egressProxy` 创建/更新 Sub2API internal proxy 记录并绑定 `proxy_id`,以及把受保护手动账号加入当前 `pool.groupName`。它仍不接管该账号凭据、status、schedulable、priority/capacity/loadFactor 或哨兵状态。 `sentinel-image status|build` 管理哨兵 Python 运行环境镜像。镜像由 YAML 的 `sentinel.image` 基础镜像和 `sentinel.sdk.openaiPythonVersion` 派生,发布到目标 runtime 的本地 registry;`build --confirm` 会先检查 registry tag,存在则快速复用,不存在才在目标 host 构建并 push。CronJob 启动时只校验 SDK 版本,不在运行时 `pip install`。 @@ -140,11 +140,13 @@ WebSocket v2 是账号能力集合,不是调度 pin。`openaiResponsesWebSocke Codex 启动时反复出现 WebSocket reconnect、HTTPS fallback、`websocket closed by server before response.completed`,或 Sub2API 日志出现 
`openai.websocket_proxy_failed` / `openai.websocket_account_select_failed` / 上游 WS handshake 4xx/5xx 时,先按运行证据定位具体 account 和 transport。若账号的 WSv2 握手失败,优先只在 YAML 中把该账号的 `openaiResponsesWebSocketsV2Mode` 收敛为 `off`;若没有任何 direct Codex WSv2 probe 通过,则同时把 `localCodex.supportsWebSockets` 与 `localCodex.responsesWebSocketsV2` 收敛为 `false`,再 `codex-pool sync --confirm`。不要顺手改 membership、priority、capacity、Secret 或代码 fallback。 -## 受保护手动账号代理绑定 +## 受保护手动账号代理与分组绑定 Sub2API 管理 UI 的账号连接测试使用账号级 `ProxyID` / proxy URL 配置上游 HTTP transport;账号没有绑定 proxy 时会直接出站,即使 Sub2API Pod 已经有 `HTTP_PROXY` / `HTTPS_PROXY` 环境变量。看到 WebUI 账号测试连 `chatgpt.com` 超时、但 Pod 内显式走目标 proxy 可通时,先检查该账号是否属于 `manualAccounts.protected` 并声明了 `proxyBinding`。 -受保护手动账号仍由人工在 Sub2API UI 维护 credentials/status 等字段;UniDesk 只允许通过 YAML 做代理窄绑定: +WebUI 账号连接测试也不经过统一消费 API key 的 pool group 选择器;账号测试正常不代表 PC Codex 客户端能选中该账号。看到 WebUI 账号测试正常、但 `/responses` 或 `/v1/responses` 以 `account-select-failed` / `no available accounts` 返回 503 时,先检查该手动账号是否声明了 `groupBinding.source: pool-group`,并通过 `sync --confirm` 加入当前 `pool.groupName`。 + +受保护手动账号仍由人工在 Sub2API UI 维护 credentials/status 等字段;UniDesk 只允许通过 YAML 做代理和分组窄绑定: ```bash bun scripts/cli.ts platform-infra sub2api codex-pool plan --target D601 @@ -152,7 +154,7 @@ bun scripts/cli.ts platform-infra sub2api codex-pool sync --target D601 --confir bun scripts/cli.ts platform-infra sub2api codex-pool validate --target D601 ``` -`sync` 输出应显示 `manualAccounts.ok=true`、`proxySync.ok=true` 和该账号 `bindingAligned=true`。`sentinel-probe --account --confirm` 对受保护手动账号必须继续拒绝,通常返回 `account-protected-manual`;不要为了测试而把该账号移入 `profiles.entries` 或取消保护。需要证明 WebUI 同款账号测试恢复时,用 Sub2API admin account test 原入口测最小 `hi` / `gpt-5.5`,并只记录 account id、proxy id、event types、HTTP status 和短 output preview,不记录 OAuth token 或 Secret 明文。 +`sync` 输出应显示 `manualAccounts.ok=true`、`proxySync.ok=true`、`groupSync.ok=true`,且该账号的 proxy/group `bindingAligned=true`。`sentinel-probe --account --confirm` 对受保护手动账号必须继续拒绝,通常返回 
`account-protected-manual`;不要为了测试而把该账号移入 `profiles.entries` 或取消保护。需要证明 WebUI 同款账号测试恢复时,用 Sub2API admin account test 原入口测最小 `hi` / `gpt-5.5`,并只记录 account id、proxy id、event types、HTTP status 和短 output preview,不记录 OAuth token 或 Secret 明文。 ## 添加上游 @@ -231,7 +233,8 @@ bun scripts/cli.ts platform-infra sub2api codex-pool configure-local --confirm - 只加强监控、不让哨兵自动冻结账号时,把 YAML `sentinel.actions.enabled=false` 后 `codex-pool sync --confirm`。此时 marker probe 和 gateway failure monitor 仍记录 `would-freeze` / observe-only 证据,但不会通过 Sub2API admin 写 `schedulable=false`;`/responses/compact` 的 `codex.remote_compact.failed` 和 compact 上游 5xx failover 只作为 `gateway-compact-*` 观察事件记录,不作为哨兵自动切换触发器。 - 单个 request id 报 502/503/中断/没有自动切号:第一步跑 `bun scripts/cli.ts platform-infra sub2api codex-pool trace --request-id `。先看 `outcome`、`reason`、`FAILOVER`、`SELECT-FAILED`、`ACCOUNT SIGNALS` 和 `WINDOW STATS`;只有 trace 报表缺字段或需要审计原始日志时,才加 `--show-lines` 或 `--raw`。若 `reason=failover-attempted-no-candidate`,说明切号动作已发生,但 scheduler 在排除失败账号后没有可用候选;继续用 `sentinel-report` 和 `validate --full` 区分 sentinel quarantine、request-path temp-unschedulable、账号 status 或容量耗尽。 - profile invalid:先修 `~/.codex/config.toml.` 的 `base_url`、`wire_api`、`model` 或 `auth.json.` 的 API key;不要在 YAML 中写密钥。 -- 手动 OAuth/API-key 账号的 WebUI account test 连 `chatgpt.com` 超时,但同一 Pod 显式 HTTP proxy 探针可通:不要只看 Pod `HTTP_PROXY` env,按“受保护手动账号代理绑定”小节确认 `manualAccounts.protected[].proxyBinding`,跑 `codex-pool sync --target D601 --confirm` 后再用原账号测试复测。 +- 手动 OAuth/API-key 账号的 WebUI account test 连 `chatgpt.com` 超时,但同一 Pod 显式 HTTP proxy 探针可通:不要只看 Pod `HTTP_PROXY` env,按“受保护手动账号代理与分组绑定”小节确认 `manualAccounts.protected[].proxyBinding`,跑 `codex-pool sync --target D601 --confirm` 后再用原账号测试复测。 +- 手动 OAuth/API-key 账号 WebUI account test 正常,但 PC Codex 客户端通过统一 key 访问 `/responses` 返回 503 且 trace 是 `account-select-failed` / `no available accounts`:按“受保护手动账号代理与分组绑定”小节确认 `manualAccounts.protected[].groupBinding.source: pool-group`,跑 `codex-pool sync --target D601 --confirm` 后用 
`codex-pool validate --target D601 --full` 复测统一 key。 - Sub2API 卡在 `wait-postgres` / `wait-redis` 或服务内大量 `context deadline exceeded`:先跑 `sub2api status` 看 `networkPolicy.ok`,再跑 `sub2api validate` 看 `postgresCrossPodPgIsReady` / `redisCrossPodPing`;缺失或异常时用 `sub2api apply --confirm` 恢复受控 `NetworkPolicy/allow-all`,不要保留手工 iptables bypass 作为长期修复。 - pool key 401:跑 `codex-pool sync --confirm` 重建 Sub2API key 与 k3s Secret 绑定,再跑 `codex-pool validate`。 - 运行中过去的验证探针残留:只用 `codex-pool cleanup-probes --confirm` 清理 `unidesk-probe-*` 临时资源;不要把真实 managed account 删除当作探针清理或可用性恢复。 diff --git a/config/platform-infra/sub2api-codex-pool.yaml b/config/platform-infra/sub2api-codex-pool.yaml index d6191622..133ae361 100644 --- a/config/platform-infra/sub2api-codex-pool.yaml +++ b/config/platform-infra/sub2api-codex-pool.yaml @@ -142,6 +142,9 @@ manualAccounts: enabled: true source: target-egress-proxy proxyName: platform-infra-sub2api-egress-proxy + groupBinding: + enabled: true + source: pool-group publicExposure: enabled: false proxyName: platform-infra-sub2api diff --git a/docs/reference/platform-infra.md b/docs/reference/platform-infra.md index b714ed3d..2a3ee850 100644 --- a/docs/reference/platform-infra.md +++ b/docs/reference/platform-infra.md @@ -99,7 +99,7 @@ - Codex account-state, quota prompts, model-routing failures, encrypted-content affinity failures, gateway wrappers, and timeout-like upstream errors must be handled by the generic temporary-unschedulable/failover path plus the external marker sentinel. Do not change membership, priority, capacity, load factor, WebSocket mode, `pool_mode`, or a specific provider's status merely to work around those errors. If a matching upstream failure still logs `openai.forward_failed` without `openai.upstream_failover_switching`, the missing fix is in Sub2API's HTTP `/responses` failover classification/error propagation, not in account pinning. 
- `profiles.entries[].openaiResponsesWebSocketsV2Mode` is the account-level Responses WebSocket v2 switch for OpenAI-compatible upstreams that require WebSocket transport. Allowed values are `off`, `ctx_pool`, and `passthrough`; omit the field unless that upstream needs it. - `profiles.entries[].upstreamUserAgent` is an optional account-level upstream request User-Agent override. Use it only for upstreams that require a Codex CLI compatible User-Agent; keep the value YAML-controlled and newline-free. -- `manualAccounts.protected` declares Sub2API accounts that were created or edited manually and must stay outside UniDesk-managed Codex pool credentials, scheduler policy, and sentinel control. The only allowed reconciliation for such an account is an explicitly declared narrow capability such as `proxyBinding`, which may align the account's Sub2API `proxy_id` to the YAML-selected target egress proxy. `codex-pool sync --confirm` must not rewrite protected account credentials, status, schedulability, groups, priority, capacity, load factor, or sentinel state, and `sentinel-probe --account ...` must refuse protected manual accounts. +- `manualAccounts.protected` declares Sub2API accounts that were created or edited manually and must stay outside UniDesk-managed Codex pool credentials, scheduler policy, and sentinel control. The only allowed reconciliation for such an account is an explicitly declared narrow capability such as `proxyBinding`, which may align the account's Sub2API `proxy_id` to the YAML-selected target egress proxy, or `groupBinding`, which may attach the account to the YAML-selected pool group so the unified consumer key can use it. `codex-pool sync --confirm` must not rewrite protected account credentials, status, schedulability, priority, capacity, load factor, or sentinel state, and `sentinel-probe --account ...` must refuse protected manual accounts. 
- `publicExposure` in `config/platform-infra/sub2api-codex-pool.yaml` controls the legacy Codex-pool public bridge from master server to the G14 ClusterIP service and should stay disabled unless that bridge is explicitly reintroduced. Target-level `publicExposure` in `config/platform-infra/sub2api.yaml` controls the active public edge such as D601-to-PK01. - `publicExposure.masterCaddy.responseHeaderTimeoutSeconds` controls the master Caddy `response_header_timeout` for the public Sub2API site. It must be long enough for Codex `/responses/compact` requests; otherwise Caddy can return a client-visible 504 before Sub2API finishes the upstream compact request, and that edge timeout is not an account-level upstream failure that Sub2API can use for temporary-unschedulable failover. The numeric value belongs only in `config/platform-infra/sub2api-codex-pool.yaml`; after changing it, use `codex-pool expose --confirm` to reload Caddy and verify the rendered `response_header_timeout`. Requests that were already in flight before the reload may still finish with the previous timeout, so post-change evidence should check only requests that started after the reload. - `publicExposure.masterCaddy.edgeRetry` controls the master Caddy reverse-proxy retry window for the public Sub2API site. This belongs at the edge because FRP remotePort listener loss, `connection refused`, EOF, or connection reset can happen before a request reaches Sub2API, so Sub2API account failover and sentinel logic cannot observe or recover that request. Keep retry scope narrow, especially for non-idempotent POST traffic: connection-attempt failures may be retried by the reverse proxy, while round-trip retry after an upstream connection was established should be limited by YAML `retryMatch` to paths that are safe to repeat, such as compact. 
Retry durations and intervals belong only in YAML; after changing them, run `codex-pool expose --confirm` and verify the rendered Caddyfile contains the expected `lb_try_duration`, `lb_try_interval`, and `lb_retry_match`. @@ -131,6 +131,8 @@ This management-plane test is also outside the normal consumer gateway scheduler The management test uses Sub2API's account-level proxy selection, not the Pod environment as a fallback. In Sub2API v0.1.136 the upstream HTTP transport is configured from the account's `ProxyID` / proxy URL; an account with no proxy binding goes direct even if the Sub2API Pod has `HTTP_PROXY` or `HTTPS_PROXY` set. For protected manual accounts that need the target egress path, declare `manualAccounts.protected[].proxyBinding` in `config/platform-infra/sub2api-codex-pool.yaml` and reconcile it with `codex-pool sync --target --confirm`; do not hand-patch the runtime account or infer proxy coverage from Pod env alone. +The management test is also not proof that the unified consumer key can select the account. A protected manual account must be attached to the pool group before ordinary `/responses` or `/v1/responses` traffic can use it. When that is intended, declare `manualAccounts.protected[].groupBinding.source: pool-group`; sync should add the account to the current `pool.groupName` without making it a YAML-managed profile or sentinel target. + An external account-level sentinel that wants parity with this WebUI path should reuse the same request shape as far as the standard OpenAI SDK allows: direct account credentials, Responses API, `stream=true`, no `store: false` for API-key accounts, no upstream `max_output_tokens` field, and success parsing based on the streaming events. A local stream delta collection limit is acceptable as a sentinel safety bound, but it should not change the upstream request body. 
The sentinel may replace the user text `hi` with a marker prompt, but it should not introduce extra request fields or Codex/compact headers merely for convenience. If a marker-only sentinel intentionally diverges from the management test shape, the divergence must be documented in probe output so a WebUI success and sentinel failure are not misread as operator error. ## Account Sentinel Marker Contract diff --git a/scripts/src/platform-infra-sub2api-codex.ts b/scripts/src/platform-infra-sub2api-codex.ts index 254bf9fa..a247f7b7 100644 --- a/scripts/src/platform-infra-sub2api-codex.ts +++ b/scripts/src/platform-infra-sub2api-codex.ts @@ -164,10 +164,16 @@ interface CodexPoolManualAccountProxyBinding { proxyName: string; } +interface CodexPoolManualAccountGroupBinding { + enabled: boolean; + source: "pool-group"; +} + interface CodexPoolManualAccountProtection { accountName: string; reason: string | null; proxyBinding: CodexPoolManualAccountProxyBinding | null; + groupBinding: CodexPoolManualAccountGroupBinding | null; } interface CodexPoolProfileConfig { @@ -703,11 +709,11 @@ function codexPoolPlan(options?: DisclosureOptions): Record { : runtimeTarget.publicBaseUrl === null ? "Public FRP exposure is disabled by YAML." 
: `Legacy Codex-pool FRP exposure is disabled by YAML; Codex consumers for target ${runtimeTarget.id} use target-level public exposure ${consumerBaseUrl}.`, - idempotency: "sync reuses the group, account names, and k3s Secret when they already exist; credentials are updated from the current local Codex files; managed accounts missing from YAML are preserved unless --prune-removed is explicitly provided.", + idempotency: "sync reuses the group, account names, and k3s Secret when they already exist; credentials are updated from the current local Codex files for YAML-managed profiles only; managed accounts missing from YAML are preserved unless --prune-removed is explicitly provided.", configPolicy: "UniDesk-owned durable configuration remains YAML-first; local ~/.codex files and runtime Secrets are not committed.", manualAccountProtection: pool.manualAccounts.protected.length === 0 ? "No manual Sub2API accounts are protected by YAML." - : `${pool.manualAccounts.protected.length} manual Sub2API account(s) are protected from UniDesk-managed sync, prune, sentinel probe, and sentinel freeze paths.`, + : `${pool.manualAccounts.protected.length} manual Sub2API account(s) are protected from UniDesk-managed credentials, prune, sentinel probe, and sentinel freeze paths; only explicitly declared proxy/group bindings are reconciled.`, }, next: ok ? { sync: `bun scripts/cli.ts platform-infra sub2api codex-pool sync${targetFlag(runtimeTarget)} --confirm` } @@ -1520,7 +1526,8 @@ function readManualAccountsConfig(value: unknown, defaults: CodexPoolManualAccou seen.add(normalized); const reason = isRecord(entry) ? readManualAccountReason(entry.reason, `${key}.reason`) : null; const proxyBinding = isRecord(entry) ? readManualAccountProxyBinding(entry.proxyBinding, `${key}.proxyBinding`) : null; - return { accountName, reason, proxyBinding }; + const groupBinding = isRecord(entry) ? 
readManualAccountGroupBinding(entry.groupBinding, `${key}.groupBinding`) : null; + return { accountName, reason, proxyBinding, groupBinding }; }); return { protected: protectedAccounts }; } @@ -1541,6 +1548,18 @@ function readManualAccountProxyBinding(value: unknown, key: string): CodexPoolMa }; } +function readManualAccountGroupBinding(value: unknown, key: string): CodexPoolManualAccountGroupBinding | null { + if (value === undefined || value === null) return null; + if (!isRecord(value)) throw new Error(`${codexPoolConfigPath}.${key} must be a YAML object`); + const enabled = value.enabled === undefined ? true : value.enabled === true; + const source = stringValue(value.source) ?? "pool-group"; + if (source !== "pool-group") throw new Error(`${codexPoolConfigPath}.${key}.source must be pool-group`); + return { + enabled, + source, + }; +} + function readManualAccountName(value: unknown, key: string): string | null { const text = stringValue(value)?.trim() ?? null; if (text === null || text.length === 0) return null; @@ -2051,7 +2070,7 @@ function codexPoolConfigSummary(pool: CodexPoolConfig): Record manualAccounts: { protectedCount: pool.manualAccounts.protected.length, protected: pool.manualAccounts.protected, - controlPolicy: "manual accounts are not created, updated, pruned, probed, or frozen by UniDesk codex-pool sync/sentinel", + controlPolicy: "manual accounts are not created, credential-updated, pruned, probed, or frozen by UniDesk codex-pool sync/sentinel; optional proxy_id and pool group membership bindings are narrow YAML-controlled exceptions", }, publicExposure: publicExposureSummary(pool), localCodex: pool.localCodex, @@ -2158,6 +2177,7 @@ function compactManualAccounts(block: unknown): Record | null { "inYamlProfiles", "runtimeMarkedUnideskManaged", "proxyBinding", + "groupBinding", "controlPolicy", ])); const proxySync = isRecord(block.proxySync) @@ -2181,11 +2201,31 @@ function compactManualAccounts(block: unknown): Record | null { valuesPrinted: 
false, } : undefined; + const groupSync = isRecord(block.groupSync) + ? { + ok: block.groupSync.ok, + itemCount: block.groupSync.itemCount, + items: recordArray(block.groupSync.items).map((item) => pickSummaryFields(item, [ + "accountName", + "accountId", + "enabled", + "ok", + "action", + "source", + "poolGroupName", + "poolGroupId", + "bindingAligned", + "controlPolicy", + ])), + valuesPrinted: false, + } + : undefined; return { ok: block.ok, protectedCount: block.protectedCount, items, proxySync, + groupSync, valuesPrinted: false, }; } @@ -4494,9 +4534,12 @@ def group_payload(): "rpm_limit": 0, } +def list_groups(token): + data = ensure_success(curl_api("GET", "/api/v1/admin/groups/all?platform=openai", bearer=token), "list groups") + return extract_items(data) + def ensure_group(token): - existing_data = ensure_success(curl_api("GET", "/api/v1/admin/groups/all?platform=openai", bearer=token), "list groups") - existing = next((item for item in extract_items(existing_data) if item.get("name") == POOL_GROUP_NAME), None) + existing = next((item for item in list_groups(token) if item.get("name") == POOL_GROUP_NAME), None) payload = group_payload() if existing is None: created = ensure_success(curl_api("POST", "/api/v1/admin/groups", bearer=token, payload=payload), "create group") @@ -4508,6 +4551,26 @@ def ensure_group(token): updated = ensure_success(curl_api("PUT", f"/api/v1/admin/groups/{group_id}", bearer=token, payload=payload), "update group") return updated if isinstance(updated, dict) else existing, "updated" +def list_accounts_for_group(token, group_id): + path = f"/api/v1/admin/accounts?group_id={group_id}&page=1&page_size=500&platform=openai" + data = ensure_success(curl_api("GET", path, bearer=token), f"list accounts for group {group_id}") + return extract_items(data) + +def account_group_ids(token, account): + if not isinstance(account, dict) or account.get("id") is None: + return [] + account_id = account.get("id") + account_name = account.get("name") 
+ ids = [] + for group in list_groups(token): + group_id = group.get("id") if isinstance(group, dict) else None + if group_id is None: + continue + members = list_accounts_for_group(token, group_id) + if any(item.get("id") == account_id or item.get("name") == account_name for item in members if isinstance(item, dict)): + ids.append(group_id) + return sorted(set(ids)) + def list_accounts(token): path = "/api/v1/admin/accounts?page=1&page_size=200&platform=openai&type=apikey&search=" + quote("unidesk-codex-") data = ensure_success(curl_api("GET", path, bearer=token), "list accounts") @@ -4689,7 +4752,119 @@ def ensure_manual_account_proxy_bindings(token): "valuesPrinted": False, } -def manual_account_protection_status(token): +def manual_group_binding_enabled(protection): + binding = protection.get("groupBinding") if isinstance(protection, dict) else None + if not isinstance(binding, dict) or binding.get("enabled") is not True: + return False + if binding.get("source") != "pool-group": + raise RuntimeError("manual account groupBinding source must be pool-group") + return True + +def manual_group_status(token, account, protection, group_id): + enabled = manual_group_binding_enabled(protection) + if not enabled: + return { + "enabled": False, + "ok": True, + "action": "not-configured", + "valuesPrinted": False, + } + group_accounts = list_accounts_for_group(token, group_id) + account_id = account.get("id") if isinstance(account, dict) else None + account_name = account.get("name") if isinstance(account, dict) else None + binding_aligned = any( + item.get("id") == account_id or item.get("name") == account_name + for item in group_accounts + if isinstance(item, dict) + ) + return { + "enabled": True, + "ok": binding_aligned, + "action": "validate", + "source": "pool-group", + "poolGroupName": POOL_GROUP_NAME, + "poolGroupId": group_id, + "bindingAligned": binding_aligned, + "valuesPrinted": False, + } + +def ensure_manual_account_group_bindings(token, group_id): + items 
= [] + for protection in MANUAL_ACCOUNT_PROTECTIONS: + if not isinstance(protection, dict): + continue + name = protection.get("accountName") + if not isinstance(name, str) or not name: + continue + if not manual_group_binding_enabled(protection): + items.append({ + "accountName": name, + "enabled": False, + "action": "not-configured", + "ok": True, + "valuesPrinted": False, + }) + continue + account = find_account_by_name(token, name) + if not isinstance(account, dict): + items.append({ + "accountName": name, + "enabled": True, + "action": "account-missing", + "ok": False, + "poolGroupName": POOL_GROUP_NAME, + "poolGroupId": group_id, + "valuesPrinted": False, + }) + continue + extra = account.get("extra") if isinstance(account.get("extra"), dict) else {} + if extra.get("unidesk_managed") is True: + items.append({ + "accountName": name, + "accountId": account.get("id"), + "enabled": True, + "action": "refused-managed-runtime-account", + "ok": False, + "poolGroupName": POOL_GROUP_NAME, + "poolGroupId": group_id, + "valuesPrinted": False, + }) + continue + existing_group_ids = account_group_ids(token, account) + desired_group_ids = sorted(set(existing_group_ids + [group_id])) + action = "unchanged" + if group_id not in existing_group_ids: + updated = ensure_success(curl_api("PUT", f"/api/v1/admin/accounts/{account['id']}", bearer=token, payload={"group_ids": desired_group_ids}), f"bind manual account group {name}") + account = updated if isinstance(updated, dict) else account + action = "bound" + binding_aligned = any( + item.get("id") == account.get("id") or item.get("name") == name + for item in list_accounts_for_group(token, group_id) + if isinstance(item, dict) + ) + items.append({ + "accountName": name, + "accountId": account.get("id"), + "enabled": True, + "ok": binding_aligned, + "action": action, + "source": "pool-group", + "poolGroupName": POOL_GROUP_NAME, + "poolGroupId": group_id, + "previousGroupIds": existing_group_ids, + "desiredGroupIds": 
desired_group_ids, + "bindingAligned": binding_aligned, + "controlPolicy": "manual-protected: only pool group membership is YAML-controlled; credentials/status/schedulable are untouched and sentinel does not probe it", + "valuesPrinted": False, + }) + return { + "ok": all(item.get("ok") is True for item in items), + "itemCount": len(items), + "items": items, + "valuesPrinted": False, + } + +def manual_account_protection_status(token, group_id=None): items = [] desired_names = set(EXPECTED_ACCOUNT_CAPACITIES.keys()) for protection in MANUAL_ACCOUNT_PROTECTIONS: @@ -4701,6 +4876,7 @@ def manual_account_protection_status(token): account = find_account_by_name(token, name) extra = account.get("extra") if isinstance(account, dict) and isinstance(account.get("extra"), dict) else {} proxy_status = manual_proxy_status(token, account, protection) + group_status = manual_group_status(token, account, protection, group_id) if group_id is not None else {"enabled": False, "ok": True, "action": "not-checked", "valuesPrinted": False} items.append({ "accountName": name, "reason": protection.get("reason") if isinstance(protection.get("reason"), str) else None, @@ -4711,8 +4887,9 @@ def manual_account_protection_status(token): "inYamlProfiles": name in desired_names, "runtimeMarkedUnideskManaged": extra.get("unidesk_managed") is True, "proxyBinding": proxy_status, - "ok": proxy_status.get("ok") is True, - "controlPolicy": "manual-protected: no create/update/prune/probe/freeze; optional proxy_id binding only when proxyBinding is configured", + "groupBinding": group_status, + "ok": proxy_status.get("ok") is True and group_status.get("ok") is True, + "controlPolicy": "manual-protected: no create/update/prune/probe/freeze; optional proxy_id and pool group membership binding only when configured", "valuesPrinted": False, }) return { @@ -6334,7 +6511,8 @@ def run_sync(): protected_frozen_names = active_sentinel_quarantine_names() account_results, pruned_account_results = 
ensure_accounts(token, profiles, group_id, prune_removed, protected_frozen_names, existing_accounts) manual_account_proxy_bindings = ensure_manual_account_proxy_bindings(token) - manual_account_protections = manual_account_protection_status(token) + manual_account_group_bindings = ensure_manual_account_group_bindings(token, group_id) + manual_account_protections = manual_account_protection_status(token, group_id) capacity_status = account_capacity_status(token) load_factor_status = account_load_factor_status(token) ws_v2_status = account_ws_v2_status(token) @@ -6352,7 +6530,7 @@ def run_sync(): sentinel_quality = ensure_sentinel_state_for_sync(account_results) sentinel_reassert = reassert_sentinel_freezes_after_sync(token) return { - "ok": gateway["ok"] is True and responses_smoke["ok"] is True and owner_concurrency["ok"] is True and capacity_status["ok"] is True and load_factor_status["ok"] is True and ws_v2_status["ok"] is True and temp_unschedulable_status["ok"] is True and manual_account_proxy_bindings.get("ok") is True and manual_account_protections.get("ok") is True and sentinel.get("ok") is True and sentinel_quality_prepare.get("ok") is True and sentinel_quality.get("ok") is True and sentinel_reassert.get("ok") is True and runtime_capabilities.get("ok") is True, + "ok": gateway["ok"] is True and responses_smoke["ok"] is True and owner_concurrency["ok"] is True and capacity_status["ok"] is True and load_factor_status["ok"] is True and ws_v2_status["ok"] is True and temp_unschedulable_status["ok"] is True and manual_account_proxy_bindings.get("ok") is True and manual_account_group_bindings.get("ok") is True and manual_account_protections.get("ok") is True and sentinel.get("ok") is True and sentinel_quality_prepare.get("ok") is True and sentinel_quality.get("ok") is True and sentinel_reassert.get("ok") is True and runtime_capabilities.get("ok") is True, "degraded": bool(responses_smoke.get("degraded")) or bool(compact_evidence.get("degraded")) or 
bool(responses_evidence.get("degraded")) or runtime_capabilities.get("ok") is not True, "mode": "sync", "namespace": NAMESPACE, @@ -6371,7 +6549,7 @@ def run_sync(): "processControl": {"schedulableRestore": "sentinel marker probe only; sync does not restore schedulable for existing accounts", "durableConfig": False}, "valuesPrinted": False, }, - "manualAccounts": {**manual_account_protections, "proxySync": manual_account_proxy_bindings}, + "manualAccounts": {**manual_account_protections, "proxySync": manual_account_proxy_bindings, "groupSync": manual_account_group_bindings}, "capacity": capacity_status, "loadFactor": load_factor_status, "webSocketsV2": ws_v2_status, @@ -6410,7 +6588,8 @@ def run_validate(): load_factor_status = account_load_factor_status(token) ws_v2_status = account_ws_v2_status(token) temp_unschedulable_status = account_temp_unschedulable_status(token) - manual_account_protections = manual_account_protection_status(token) + pool_group_id = key_item.get("group_id") if isinstance(key_item, dict) else None + manual_account_protections = manual_account_protection_status(token, pool_group_id) gateway = validate_gateway(api_key) responses_smoke = validate_gateway_responses(api_key) compact_evidence = recent_compact_gateway_evidence() @@ -6429,6 +6608,7 @@ def run_validate(): "secret": f"{NAMESPACE}/{POOL_API_KEY_SECRET_NAME}.{POOL_API_KEY_SECRET_KEY}", "sub2apiId": key_item.get("id") if isinstance(key_item, dict) else None, "userId": key_item.get("user_id") if isinstance(key_item, dict) else None, + "groupId": key_item.get("group_id") if isinstance(key_item, dict) else None, "keyPreview": api_key_preview(api_key), "valuesPrinted": False, },
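
Review note: the group binding added in `ensure_manual_account_group_bindings` appears to be additive. It unions the pool group into the account's existing memberships and sends the full list as `group_ids`, so a protected manual account should keep every group it already belonged to. The planning step can be sketched in isolation as below. This is a minimal sketch under assumptions, not the code in this change: `plan_group_binding` is a hypothetical helper name, and group ids are assumed to be plain integers.

```python
def plan_group_binding(existing_group_ids, pool_group_id):
    """Plan an additive pool-group binding for one protected manual account.

    Hypothetical helper for illustration only; it mirrors the union logic
    in the diff but does not call the Sub2API admin API.
    Returns (action, desired_group_ids).
    """
    # Union keeps all existing memberships and adds the pool group once.
    desired_group_ids = sorted(set(existing_group_ids) | {pool_group_id})
    # "bound" means a write would be needed; "unchanged" means no API call.
    action = "unchanged" if pool_group_id in existing_group_ids else "bound"
    return action, desired_group_ids


if __name__ == "__main__":
    # Account already in groups 3 and 1, pool group is 2: a write is planned.
    print(plan_group_binding([3, 1], 2))
    # Account already in the pool group: re-running sync plans no change.
    print(plan_group_binding([1, 2], 2))
```

Because the plan is a pure function of the current memberships, running it twice against the same runtime state yields `unchanged` the second time, which matches the idempotency that `sync --confirm` describes for other resources.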