fix: bind protected sub2api manual accounts to pool group

2026-06-14 14:55:28 +00:00
parent 16e7284bdb
commit d6638655cc
4 changed files with 208 additions and 20 deletions
@@ -112,7 +112,7 @@ bun scripts/cli.ts platform-infra sub2api codex-pool cleanup-probes --target D60
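 - `profiles.entries[].tempUnschedulable`: optional per-account override of Sub2API's built-in temporary-unschedulable rules; use it only to deviate explicitly from the pool default rules, not to give an account special priority or to temporarily bypass the generic failover.
 - `profiles.entries[].openaiResponsesWebSocketsV2Mode`: set only for upstreams that require Responses WebSocket v2; the value is `off`, `ctx_pool`, or `passthrough`.
 - `profiles.entries[].upstreamUserAgent`: set only for the few upstreams that require a Codex CLI User-Agent; it must not contain newlines.
-- `manualAccounts.protected`: accounts that were created/maintained manually in Sub2API and must be excluded from the UniDesk-managed Codex pool and sentinel control. By default, credentials/status/schedulable/groups/priority/capacity/loadFactor must not be changed; only when `proxyBinding` is explicitly declared may `sync --confirm` align the account's `proxy_id` to the YAML target's egress proxy.
+- `manualAccounts.protected`: accounts that were created/maintained manually in Sub2API and must be excluded from UniDesk-managed Codex pool credentials and sentinel control. By default, credentials/status/schedulable/priority/capacity/loadFactor must not be changed; only when `proxyBinding` is explicitly declared may `sync --confirm` align the account's `proxy_id` to the YAML target's egress proxy; only when `groupBinding.source: pool-group` is explicitly declared may the account be added to the pool group used by the unified consumer API key.
 - `sentinel.monitor.enabled`: switch for account-level marker sentinel monitoring; when enabled, `codex-pool sync --confirm` creates/updates a k8s CronJob, ConfigMap, Secret, ServiceAccount, Role, and RoleBinding in `platform-infra`. The CronJob calls OpenAI Responses `gpt-5.5` directly on YAML-managed upstream accounts, uses a deterministic marker as the sole health criterion, and records a token/cost ledger in a separate state ConfigMap.
 - `sentinel.actions.enabled`: switch for account-level sentinel freeze/recover actions; the current marker-only guard requires it to be enabled. With actions disabled, the sentinel only records `would-freeze` and does not call the Sub2API admin API to change `schedulable`. With actions enabled, anything that fails the marker match enters the same freeze/recover state machine, whether it is an HTTP 200 with injected content, a 4xx/5xx, non-JSON, a connection error, or empty output.
 - `sentinel.sdk.openaiPythonVersion`: pinned version of the OpenAI Python SDK used by the sentinel container; model requests must go through the standard SDK `responses.create`, not a hand-built `/v1/responses` request body or hand-written response parsing. To upgrade the SDK later, change only the YAML and run `sync --confirm`.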
@@ -122,7 +122,7 @@ bun scripts/cli.ts platform-infra sub2api codex-pool cleanup-probes --target D60
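 - `sentinel.freeze`: exponential-backoff configuration for the failure freeze TTL. The current policy starts at 1 minute and progresses `1m -> 2m -> 4m -> 8m -> 10m` on failures, capped at 10 minutes; a failed probe consumes almost no effective output tokens, so the freeze window stays short. When a freeze expires, only a recovery probe runs, and the account is restored automatically only if that probe passes; TTL expiry alone must not unfreeze it.
 - `sentinel.pricing`: the sentinel's own token/cost price estimates for direct upstream calls. Because direct upstream probes bypass Sub2API's regular usage ledger, the sentinel must record global and per-account token/cost itself; these ledgers are for observation only and are not a budget gate for skipping probes.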
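
A minimal sketch of the `sentinel.freeze` backoff described above, assuming the TTL doubles per consecutive failure and is clamped at the maximum. The function name `freeze_ttl_minutes` and its parameters are hypothetical illustrations, not the sentinel's actual code or YAML keys.

```python
def freeze_ttl_minutes(consecutive_failures, initial=1, maximum=10):
    """Return the freeze TTL in minutes for the n-th consecutive failure.

    Assumption: the TTL starts at `initial`, doubles on each further
    failure, and never exceeds `maximum`.
    """
    if consecutive_failures < 1:
        raise ValueError("consecutive_failures must be >= 1")
    return min(initial * 2 ** (consecutive_failures - 1), maximum)


# Failures 1..6 yield the documented 1m -> 2m -> 4m -> 8m -> 10m progression,
# after which the TTL stays at the 10-minute cap.
print([freeze_ttl_minutes(n) for n in range(1, 7)])  # [1, 2, 4, 8, 10, 10]
```

The sketch covers only the TTL arithmetic; per the text above, an expired TTL still requires a passing recovery probe before the account is restored.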

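-`sync --confirm` logs in to Sub2API admin, creates/updates the group, creates/updates the `unidesk-codex-*` accounts in YAML, creates/reuses the unified API key Secret, and deploys/updates sentinel resources; it does not directly restore an existing managed account to `schedulable=true`. Recovery is triggered only by the sentinel, which runs a recovery probe after reading Sub2API runtime `schedulable=false` and restores the account when the marker matches. By default `sync` does not delete managed accounts that are absent from YAML. Use `sync --confirm --prune-removed` only when explicitly retiring an upstream, to delete absent `unidesk-codex-*` accounts with `extra.unidesk_managed=true`. For `manualAccounts.protected`, `sync` performs only the narrow synchronization that YAML explicitly allows; the currently allowed item is creating/updating the Sub2API internal proxy record from the target `egressProxy` and binding the protected manual account's `proxy_id` to it, without taking over that account's credentials, scheduling, groups, or sentinel state.
+`sync --confirm` logs in to Sub2API admin, creates/updates the group, creates/updates the `unidesk-codex-*` accounts in YAML, creates/reuses the unified API key Secret, and deploys/updates sentinel resources; it does not directly restore an existing managed account to `schedulable=true`. Recovery is triggered only by the sentinel, which runs a recovery probe after reading Sub2API runtime `schedulable=false` and restores the account when the marker matches. By default `sync` does not delete managed accounts that are absent from YAML. Use `sync --confirm --prune-removed` only when explicitly retiring an upstream, to delete absent `unidesk-codex-*` accounts with `extra.unidesk_managed=true`. For `manualAccounts.protected`, `sync` performs only the narrow synchronization that YAML explicitly allows; the currently allowed items are creating/updating the Sub2API internal proxy record from the target `egressProxy` and binding `proxy_id`, and adding the protected manual account to the current `pool.groupName`. It still does not take over that account's credentials, status, schedulable, priority/capacity/loadFactor, or sentinel state.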

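 `sentinel-image status|build` manages the sentinel's Python runtime image. The image is derived from the YAML `sentinel.image` base image and `sentinel.sdk.openaiPythonVersion`, and is published to the target runtime's local registry; `build --confirm` first checks the registry tag, reuses it quickly if present, and otherwise builds on the target host and pushes. At startup the CronJob only verifies the SDK version and does not `pip install` at runtime.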

@@ -140,11 +140,13 @@ WebSocket v2 is an account capability set, not a scheduling pin. `openaiResponsesWebSocke

 When Codex startup repeatedly shows WebSocket reconnect, HTTPS fallback, or `websocket closed by server before response.completed`, or Sub2API logs show `openai.websocket_proxy_failed` / `openai.websocket_account_select_failed` / upstream WS handshake 4xx/5xx, first use runtime evidence to identify the specific account and transport. If an account's WSv2 handshake fails, prefer narrowing only that account's `openaiResponsesWebSocketsV2Mode` to `off` in YAML; if no direct Codex WSv2 probe passes at all, also set `localCodex.supportsWebSockets` and `localCodex.responsesWebSocketsV2` to `false`, then run `codex-pool sync --confirm`. Do not change membership, priority, capacity, Secret, or code fallback along the way.

-## Protected manual account proxy binding
+## Protected manual account proxy and group binding

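 The account connection test in the Sub2API admin UI configures the upstream HTTP transport from the account-level `ProxyID` / proxy URL; an account with no bound proxy goes out directly, even if the Sub2API Pod already has `HTTP_PROXY` / `HTTPS_PROXY` environment variables. When the WebUI account test times out connecting to `chatgpt.com` but an explicit request through the target proxy from inside the Pod succeeds, first check whether the account belongs to `manualAccounts.protected` and declares `proxyBinding`.

-Protected manual accounts still have credentials/status and similar fields maintained by hand in the Sub2API UI; UniDesk only allows a narrow proxy binding through YAML:
+The WebUI account connection test also does not go through the pool group selector used by the unified consumer API key; a passing account test does not mean the PC Codex client can select that account. When the WebUI account test passes but `/responses` or `/v1/responses` returns 503 with `account-select-failed` / `no available accounts`, first check whether the manual account declares `groupBinding.source: pool-group` and has been added to the current `pool.groupName` via `sync --confirm`.
+
+Protected manual accounts still have credentials/status and similar fields maintained by hand in the Sub2API UI; UniDesk only allows narrow proxy and group bindings through YAML: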

 ```bash
 bun scripts/cli.ts platform-infra sub2api codex-pool plan --target D601
@@ -152,7 +154,7 @@ bun scripts/cli.ts platform-infra sub2api codex-pool sync --target D601 --confir
 bun scripts/cli.ts platform-infra sub2api codex-pool validate --target D601
 ```

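-`sync` output should show `manualAccounts.ok=true`, `proxySync.ok=true`, and `bindingAligned=true` for the account. `sentinel-probe --account <manual-account> --confirm` must continue to refuse protected manual accounts, typically returning `account-protected-manual`; do not move the account into `profiles.entries` or remove its protection just to test it. To prove that the WebUI-equivalent account test has recovered, use the original Sub2API admin account test entry point with a minimal `hi` / `gpt-5.5` request, and record only the account id, proxy id, event types, HTTP status, and a short output preview, never the OAuth token or Secret plaintext.
+`sync` output should show `manualAccounts.ok=true`, `proxySync.ok=true`, `groupSync.ok=true`, and proxy/group `bindingAligned=true` for the account. `sentinel-probe --account <manual-account> --confirm` must continue to refuse protected manual accounts, typically returning `account-protected-manual`; do not move the account into `profiles.entries` or remove its protection just to test it. To prove that the WebUI-equivalent account test has recovered, use the original Sub2API admin account test entry point with a minimal `hi` / `gpt-5.5` request, and record only the account id, proxy id, event types, HTTP status, and a short output preview, never the OAuth token or Secret plaintext.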

 ## Adding an upstream

@@ -231,7 +233,8 @@ bun scripts/cli.ts platform-infra sub2api codex-pool configure-local --confirm
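 - To strengthen monitoring only, without letting the sentinel freeze accounts automatically, set YAML `sentinel.actions.enabled=false` and run `codex-pool sync --confirm`. The marker probe and gateway failure monitor then still record `would-freeze` / observe-only evidence but do not write `schedulable=false` through Sub2API admin; `codex.remote_compact.failed` on `/responses/compact` and compact upstream 5xx failover are recorded only as `gateway-compact-*` observation events and do not trigger automatic sentinel switching.
 - A single request id reports 502/503/interruption/no automatic account switch: first run `bun scripts/cli.ts platform-infra sub2api codex-pool trace --request-id <requestId>`. Look at `outcome`, `reason`, `FAILOVER`, `SELECT-FAILED`, `ACCOUNT SIGNALS`, and `WINDOW STATS` first; add `--show-lines` or `--raw` only when the trace report is missing fields or the raw logs need auditing. If `reason=failover-attempted-no-candidate`, the switch was attempted but the scheduler had no available candidate after excluding the failed account; continue with `sentinel-report` and `validate --full` to distinguish sentinel quarantine, request-path temp-unschedulable, account status, and capacity exhaustion.
 - profile invalid: first fix `base_url`, `wire_api`, or `model` in `~/.codex/config.toml.<profile>`, or the API key in `auth.json.<profile>`; do not write secrets into YAML.
-- The WebUI account test for a manual OAuth/API-key account times out connecting to `chatgpt.com`, but an explicit HTTP proxy probe from the same Pod succeeds: do not rely on the Pod `HTTP_PROXY` env alone; follow the "Protected manual account proxy binding" section to confirm `manualAccounts.protected[].proxyBinding`, run `codex-pool sync --target D601 --confirm`, then retest with the original account test.
+- The WebUI account test for a manual OAuth/API-key account times out connecting to `chatgpt.com`, but an explicit HTTP proxy probe from the same Pod succeeds: do not rely on the Pod `HTTP_PROXY` env alone; follow the "Protected manual account proxy and group binding" section to confirm `manualAccounts.protected[].proxyBinding`, run `codex-pool sync --target D601 --confirm`, then retest with the original account test.
+- The WebUI account test for a manual OAuth/API-key account passes, but the PC Codex client gets 503 from `/responses` through the unified key and the trace shows `account-select-failed` / `no available accounts`: follow the "Protected manual account proxy and group binding" section to confirm `manualAccounts.protected[].groupBinding.source: pool-group`, run `codex-pool sync --target D601 --confirm`, then retest the unified key with `codex-pool validate --target D601 --full`.
 - Sub2API is stuck in `wait-postgres` / `wait-redis`, or the service logs many `context deadline exceeded` errors: first run `sub2api status` and check `networkPolicy.ok`, then run `sub2api validate` and check `postgresCrossPodPgIsReady` / `redisCrossPodPing`; if either is missing or abnormal, restore the controlled `NetworkPolicy/allow-all` with `sub2api apply --confirm`, and do not keep a manual iptables bypass as a long-term fix.
 - pool key 401: run `codex-pool sync --confirm` to rebuild the binding between the Sub2API key and the k3s Secret, then run `codex-pool validate`.
 - Leftover validation probes from earlier runs: clean up `unidesk-probe-*` temporary resources only with `codex-pool cleanup-probes --confirm`; do not treat deleting a real managed account as probe cleanup or availability recovery.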
@@ -142,6 +142,9 @@ manualAccounts:
        enabled: true
        source: target-egress-proxy
        proxyName: platform-infra-sub2api-egress-proxy
+      groupBinding:
+        enabled: true
+        source: pool-group
 publicExposure:
  enabled: false
  proxyName: platform-infra-sub2api
@@ -99,7 +99,7 @@
 - Codex account-state, quota prompts, model-routing failures, encrypted-content affinity failures, gateway wrappers, and timeout-like upstream errors must be handled by the generic temporary-unschedulable/failover path plus the external marker sentinel. Do not change membership, priority, capacity, load factor, WebSocket mode, `pool_mode`, or a specific provider's status merely to work around those errors. If a matching upstream failure still logs `openai.forward_failed` without `openai.upstream_failover_switching`, the missing fix is in Sub2API's HTTP `/responses` failover classification/error propagation, not in account pinning.
 - `profiles.entries[].openaiResponsesWebSocketsV2Mode` is the account-level Responses WebSocket v2 switch for OpenAI-compatible upstreams that require WebSocket transport. Allowed values are `off`, `ctx_pool`, and `passthrough`; omit the field unless that upstream needs it.
 - `profiles.entries[].upstreamUserAgent` is an optional account-level upstream request User-Agent override. Use it only for upstreams that require a Codex CLI compatible User-Agent; keep the value YAML-controlled and newline-free.
- `manualAccounts.protected` declares Sub2API accounts that were created or edited manually and must stay outside UniDesk-managed Codex pool credentials, scheduler policy, and sentinel control. The only allowed reconciliation for such an account is an explicitly declared narrow capability such as `proxyBinding`, which may align the account's Sub2API `proxy_id` to the YAML-selected target egress proxy. `codex-pool sync --confirm` must not rewrite protected account credentials, status, schedulability, groups, priority, capacity, load factor, or sentinel state, and `sentinel-probe --account ...` must refuse protected manual accounts.
+- `manualAccounts.protected` declares Sub2API accounts that were created or edited manually and must stay outside UniDesk-managed Codex pool credentials, scheduler policy, and sentinel control. The only allowed reconciliation for such an account is an explicitly declared narrow capability such as `proxyBinding`, which may align the account's Sub2API `proxy_id` to the YAML-selected target egress proxy, or `groupBinding`, which may attach the account to the YAML-selected pool group so the unified consumer key can use it. `codex-pool sync --confirm` must not rewrite protected account credentials, status, schedulability, priority, capacity, load factor, or sentinel state, and `sentinel-probe --account ...` must refuse protected manual accounts.
 - `publicExposure` in `config/platform-infra/sub2api-codex-pool.yaml` controls the legacy Codex-pool public bridge from master server to the G14 ClusterIP service and should stay disabled unless that bridge is explicitly reintroduced. Target-level `publicExposure` in `config/platform-infra/sub2api.yaml` controls the active public edge such as D601-to-PK01.
 - `publicExposure.masterCaddy.responseHeaderTimeoutSeconds` controls the master Caddy `response_header_timeout` for the public Sub2API site. It must be long enough for Codex `/responses/compact` requests; otherwise Caddy can return a client-visible 504 before Sub2API finishes the upstream compact request, and that edge timeout is not an account-level upstream failure that Sub2API can use for temporary-unschedulable failover. The numeric value belongs only in `config/platform-infra/sub2api-codex-pool.yaml`; after changing it, use `codex-pool expose --confirm` to reload Caddy and verify the rendered `response_header_timeout`. Requests that were already in flight before the reload may still finish with the previous timeout, so post-change evidence should check only requests that started after the reload.
 - `publicExposure.masterCaddy.edgeRetry` controls the master Caddy reverse-proxy retry window for the public Sub2API site. This belongs at the edge because FRP remotePort listener loss, `connection refused`, EOF, or connection reset can happen before a request reaches Sub2API, so Sub2API account failover and sentinel logic cannot observe or recover that request. Keep retry scope narrow, especially for non-idempotent POST traffic: connection-attempt failures may be retried by the reverse proxy, while round-trip retry after an upstream connection was established should be limited by YAML `retryMatch` to paths that are safe to repeat, such as compact. Retry durations and intervals belong only in YAML; after changing them, run `codex-pool expose --confirm` and verify the rendered Caddyfile contains the expected `lb_try_duration`, `lb_try_interval`, and `lb_retry_match`.
@@ -131,6 +131,8 @@ This management-plane test is also outside the normal consumer gateway scheduler

 The management test uses Sub2API's account-level proxy selection, not the Pod environment as a fallback. In Sub2API v0.1.136 the upstream HTTP transport is configured from the account's `ProxyID` / proxy URL; an account with no proxy binding goes direct even if the Sub2API Pod has `HTTP_PROXY` or `HTTPS_PROXY` set. For protected manual accounts that need the target egress path, declare `manualAccounts.protected[].proxyBinding` in `config/platform-infra/sub2api-codex-pool.yaml` and reconcile it with `codex-pool sync --target <active> --confirm`; do not hand-patch the runtime account or infer proxy coverage from Pod env alone.

+The management test is also not proof that the unified consumer key can select the account. A protected manual account must be attached to the pool group before ordinary `/responses` or `/v1/responses` traffic can use it. When that is intended, declare `manualAccounts.protected[].groupBinding.source: pool-group`; sync should add the account to the current `pool.groupName` without making it a YAML-managed profile or sentinel target.
+
 An external account-level sentinel that wants parity with this WebUI path should reuse the same request shape as far as the standard OpenAI SDK allows: direct account credentials, Responses API, `stream=true`, no `store: false` for API-key accounts, no upstream `max_output_tokens` field, and success parsing based on the streaming events. A local stream delta collection limit is acceptable as a sentinel safety bound, but it should not change the upstream request body. The sentinel may replace the user text `hi` with a marker prompt, but it should not introduce extra request fields or Codex/compact headers merely for convenience. If a marker-only sentinel intentionally diverges from the management test shape, the divergence must be documented in probe output so a WebUI success and sentinel failure are not misread as operator error.

 ## Account Sentinel Marker Contract
@@ -164,10 +164,16 @@ interface CodexPoolManualAccountProxyBinding {
  proxyName: string;
 }

+interface CodexPoolManualAccountGroupBinding {
+  enabled: boolean;
+  source: "pool-group";
+}
+
 interface CodexPoolManualAccountProtection {
  accountName: string;
  reason: string | null;
  proxyBinding: CodexPoolManualAccountProxyBinding | null;
+  groupBinding: CodexPoolManualAccountGroupBinding | null;
 }

 interface CodexPoolProfileConfig {
@@ -703,11 +709,11 @@ function codexPoolPlan(options?: DisclosureOptions): Record<string, unknown> {
        : runtimeTarget.publicBaseUrl === null
          ? "Public FRP exposure is disabled by YAML."
          : `Legacy Codex-pool FRP exposure is disabled by YAML; Codex consumers for target ${runtimeTarget.id} use target-level public exposure ${consumerBaseUrl}.`,
-      idempotency: "sync reuses the group, account names, and k3s Secret when they already exist; credentials are updated from the current local Codex files; managed accounts missing from YAML are preserved unless --prune-removed is explicitly provided.",
+      idempotency: "sync reuses the group, account names, and k3s Secret when they already exist; credentials are updated from the current local Codex files for YAML-managed profiles only; managed accounts missing from YAML are preserved unless --prune-removed is explicitly provided.",
      configPolicy: "UniDesk-owned durable configuration remains YAML-first; local ~/.codex files and runtime Secrets are not committed.",
      manualAccountProtection: pool.manualAccounts.protected.length === 0
        ? "No manual Sub2API accounts are protected by YAML."
-        : `${pool.manualAccounts.protected.length} manual Sub2API account(s) are protected from UniDesk-managed sync, prune, sentinel probe, and sentinel freeze paths.`,
+        : `${pool.manualAccounts.protected.length} manual Sub2API account(s) are protected from UniDesk-managed credentials, prune, sentinel probe, and sentinel freeze paths; only explicitly declared proxy/group bindings are reconciled.`,
    },
    next: ok
      ? { sync: `bun scripts/cli.ts platform-infra sub2api codex-pool sync${targetFlag(runtimeTarget)} --confirm` }
@@ -1520,7 +1526,8 @@ function readManualAccountsConfig(value: unknown, defaults: CodexPoolManualAccou
    seen.add(normalized);
    const reason = isRecord(entry) ? readManualAccountReason(entry.reason, `${key}.reason`) : null;
    const proxyBinding = isRecord(entry) ? readManualAccountProxyBinding(entry.proxyBinding, `${key}.proxyBinding`) : null;
-    return { accountName, reason, proxyBinding };
+    const groupBinding = isRecord(entry) ? readManualAccountGroupBinding(entry.groupBinding, `${key}.groupBinding`) : null;
+    return { accountName, reason, proxyBinding, groupBinding };
  });
  return { protected: protectedAccounts };
 }
@@ -1541,6 +1548,18 @@ function readManualAccountProxyBinding(value: unknown, key: string): CodexPoolMa
  };
 }

+function readManualAccountGroupBinding(value: unknown, key: string): CodexPoolManualAccountGroupBinding | null {
+  if (value === undefined || value === null) return null;
+  if (!isRecord(value)) throw new Error(`${codexPoolConfigPath}.${key} must be a YAML object`);
+  const enabled = value.enabled === undefined ? true : value.enabled === true;
+  const source = stringValue(value.source) ?? "pool-group";
+  if (source !== "pool-group") throw new Error(`${codexPoolConfigPath}.${key}.source must be pool-group`);
+  return {
+    enabled,
+    source,
+  };
+}
+
 function readManualAccountName(value: unknown, key: string): string | null {
  const text = stringValue(value)?.trim() ?? null;
  if (text === null || text.length === 0) return null;
@@ -2051,7 +2070,7 @@ function codexPoolConfigSummary(pool: CodexPoolConfig): Record<string, unknown>
    manualAccounts: {
      protectedCount: pool.manualAccounts.protected.length,
      protected: pool.manualAccounts.protected,
-      controlPolicy: "manual accounts are not created, updated, pruned, probed, or frozen by UniDesk codex-pool sync/sentinel",
+      controlPolicy: "manual accounts are not created, credential-updated, pruned, probed, or frozen by UniDesk codex-pool sync/sentinel; optional proxy_id and pool group membership bindings are narrow YAML-controlled exceptions",
    },
    publicExposure: publicExposureSummary(pool),
    localCodex: pool.localCodex,
@@ -2158,6 +2177,7 @@ function compactManualAccounts(block: unknown): Record<string, unknown> | null {
    "inYamlProfiles",
    "runtimeMarkedUnideskManaged",
    "proxyBinding",
+    "groupBinding",
    "controlPolicy",
  ]));
  const proxySync = isRecord(block.proxySync)
@@ -2181,11 +2201,31 @@ function compactManualAccounts(block: unknown): Record<string, unknown> | null {
        valuesPrinted: false,
      }
    : undefined;
+  const groupSync = isRecord(block.groupSync)
+    ? {
+        ok: block.groupSync.ok,
+        itemCount: block.groupSync.itemCount,
+        items: recordArray(block.groupSync.items).map((item) => pickSummaryFields(item, [
+          "accountName",
+          "accountId",
+          "enabled",
+          "ok",
+          "action",
+          "source",
+          "poolGroupName",
+          "poolGroupId",
+          "bindingAligned",
+          "controlPolicy",
+        ])),
+        valuesPrinted: false,
+      }
+    : undefined;
  return {
    ok: block.ok,
    protectedCount: block.protectedCount,
    items,
    proxySync,
+    groupSync,
    valuesPrinted: false,
  };
 }
@@ -4494,9 +4534,12 @@ def group_payload():
        "rpm_limit": 0,
    }

+def list_groups(token):
+    data = ensure_success(curl_api("GET", "/api/v1/admin/groups/all?platform=openai", bearer=token), "list groups")
+    return extract_items(data)
+
 def ensure_group(token):
-    existing_data = ensure_success(curl_api("GET", "/api/v1/admin/groups/all?platform=openai", bearer=token), "list groups")
-    existing = next((item for item in extract_items(existing_data) if item.get("name") == POOL_GROUP_NAME), None)
+    existing = next((item for item in list_groups(token) if item.get("name") == POOL_GROUP_NAME), None)
    payload = group_payload()
    if existing is None:
        created = ensure_success(curl_api("POST", "/api/v1/admin/groups", bearer=token, payload=payload), "create group")
@@ -4508,6 +4551,26 @@ def ensure_group(token):
    updated = ensure_success(curl_api("PUT", f"/api/v1/admin/groups/{group_id}", bearer=token, payload=payload), "update group")
    return updated if isinstance(updated, dict) else existing, "updated"

+def list_accounts_for_group(token, group_id):
+    path = f"/api/v1/admin/accounts?group_id={group_id}&page=1&page_size=500&platform=openai"
+    data = ensure_success(curl_api("GET", path, bearer=token), f"list accounts for group {group_id}")
+    return extract_items(data)
+
+def account_group_ids(token, account):
+    if not isinstance(account, dict) or account.get("id") is None:
+        return []
+    account_id = account.get("id")
+    account_name = account.get("name")
+    ids = []
+    for group in list_groups(token):
+        group_id = group.get("id") if isinstance(group, dict) else None
+        if group_id is None:
+            continue
+        members = list_accounts_for_group(token, group_id)
+        if any(item.get("id") == account_id or item.get("name") == account_name for item in members if isinstance(item, dict)):
+            ids.append(group_id)
+    return sorted(set(ids))
+
 def list_accounts(token):
    path = "/api/v1/admin/accounts?page=1&page_size=200&platform=openai&type=apikey&search=" + quote("unidesk-codex-")
    data = ensure_success(curl_api("GET", path, bearer=token), "list accounts")
@@ -4689,7 +4752,119 @@ def ensure_manual_account_proxy_bindings(token):
        "valuesPrinted": False,
    }

-def manual_account_protection_status(token):
+def manual_group_binding_enabled(protection):
+    binding = protection.get("groupBinding") if isinstance(protection, dict) else None
+    if not isinstance(binding, dict) or binding.get("enabled") is not True:
+        return False
+    if binding.get("source") != "pool-group":
+        raise RuntimeError("manual account groupBinding source must be pool-group")
+    return True
+
+def manual_group_status(token, account, protection, group_id):
+    enabled = manual_group_binding_enabled(protection)
+    if not enabled:
+        return {
+            "enabled": False,
+            "ok": True,
+            "action": "not-configured",
+            "valuesPrinted": False,
+        }
+    group_accounts = list_accounts_for_group(token, group_id)
+    account_id = account.get("id") if isinstance(account, dict) else None
+    account_name = account.get("name") if isinstance(account, dict) else None
+    binding_aligned = any(
+        item.get("id") == account_id or item.get("name") == account_name
+        for item in group_accounts
+        if isinstance(item, dict)
+    )
+    return {
+        "enabled": True,
+        "ok": binding_aligned,
+        "action": "validate",
+        "source": "pool-group",
+        "poolGroupName": POOL_GROUP_NAME,
+        "poolGroupId": group_id,
+        "bindingAligned": binding_aligned,
+        "valuesPrinted": False,
+    }
+
+def ensure_manual_account_group_bindings(token, group_id):
+    items = []
+    for protection in MANUAL_ACCOUNT_PROTECTIONS:
+        if not isinstance(protection, dict):
+            continue
+        name = protection.get("accountName")
+        if not isinstance(name, str) or not name:
+            continue
+        if not manual_group_binding_enabled(protection):
+            items.append({
+                "accountName": name,
+                "enabled": False,
+                "action": "not-configured",
+                "ok": True,
+                "valuesPrinted": False,
+            })
+            continue
+        account = find_account_by_name(token, name)
+        if not isinstance(account, dict):
+            items.append({
+                "accountName": name,
+                "enabled": True,
+                "action": "account-missing",
+                "ok": False,
+                "poolGroupName": POOL_GROUP_NAME,
+                "poolGroupId": group_id,
+                "valuesPrinted": False,
+            })
+            continue
+        extra = account.get("extra") if isinstance(account.get("extra"), dict) else {}
+        if extra.get("unidesk_managed") is True:
+            items.append({
+                "accountName": name,
+                "accountId": account.get("id"),
+                "enabled": True,
+                "action": "refused-managed-runtime-account",
+                "ok": False,
+                "poolGroupName": POOL_GROUP_NAME,
+                "poolGroupId": group_id,
+                "valuesPrinted": False,
+            })
+            continue
+        existing_group_ids = account_group_ids(token, account)
+        desired_group_ids = sorted(set(existing_group_ids + [group_id]))
+        action = "unchanged"
+        if group_id not in existing_group_ids:
+            updated = ensure_success(curl_api("PUT", f"/api/v1/admin/accounts/{account['id']}", bearer=token, payload={"group_ids": desired_group_ids}), f"bind manual account group {name}")
+            account = updated if isinstance(updated, dict) else account
+            action = "bound"
+        binding_aligned = any(
+            item.get("id") == account.get("id") or item.get("name") == name
+            for item in list_accounts_for_group(token, group_id)
+            if isinstance(item, dict)
+        )
+        items.append({
+            "accountName": name,
+            "accountId": account.get("id"),
+            "enabled": True,
+            "ok": binding_aligned,
+            "action": action,
+            "source": "pool-group",
+            "poolGroupName": POOL_GROUP_NAME,
+            "poolGroupId": group_id,
+            "previousGroupIds": existing_group_ids,
+            "desiredGroupIds": desired_group_ids,
+            "bindingAligned": binding_aligned,
+            "controlPolicy": "manual-protected: only pool group membership is YAML-controlled; credentials/status/schedulable are untouched and sentinel does not probe it",
+            "valuesPrinted": False,
+        })
+    return {
+        "ok": all(item.get("ok") is True for item in items),
+        "itemCount": len(items),
+        "items": items,
+        "valuesPrinted": False,
+    }
+
+def manual_account_protection_status(token, group_id=None):
    items = []
    desired_names = set(EXPECTED_ACCOUNT_CAPACITIES.keys())
    for protection in MANUAL_ACCOUNT_PROTECTIONS:
@@ -4701,6 +4876,7 @@ def manual_account_protection_status(token):
        account = find_account_by_name(token, name)
        extra = account.get("extra") if isinstance(account, dict) and isinstance(account.get("extra"), dict) else {}
        proxy_status = manual_proxy_status(token, account, protection)
+        group_status = manual_group_status(token, account, protection, group_id) if group_id is not None else {"enabled": False, "ok": True, "action": "not-checked", "valuesPrinted": False}
        items.append({
            "accountName": name,
            "reason": protection.get("reason") if isinstance(protection.get("reason"), str) else None,
@@ -4711,8 +4887,9 @@ def manual_account_protection_status(token):
            "inYamlProfiles": name in desired_names,
            "runtimeMarkedUnideskManaged": extra.get("unidesk_managed") is True,
            "proxyBinding": proxy_status,
-            "ok": proxy_status.get("ok") is True,
-            "controlPolicy": "manual-protected: no create/update/prune/probe/freeze; optional proxy_id binding only when proxyBinding is configured",
+            "groupBinding": group_status,
+            "ok": proxy_status.get("ok") is True and group_status.get("ok") is True,
+            "controlPolicy": "manual-protected: no create/update/prune/probe/freeze; optional proxy_id and pool group membership binding only when configured",
            "valuesPrinted": False,
        })
    return {
@@ -6334,7 +6511,8 @@ def run_sync():
    protected_frozen_names = active_sentinel_quarantine_names()
    account_results, pruned_account_results = ensure_accounts(token, profiles, group_id, prune_removed, protected_frozen_names, existing_accounts)
    manual_account_proxy_bindings = ensure_manual_account_proxy_bindings(token)
-    manual_account_protections = manual_account_protection_status(token)
+    manual_account_group_bindings = ensure_manual_account_group_bindings(token, group_id)
+    manual_account_protections = manual_account_protection_status(token, group_id)
    capacity_status = account_capacity_status(token)
    load_factor_status = account_load_factor_status(token)
    ws_v2_status = account_ws_v2_status(token)
@@ -6352,7 +6530,7 @@ def run_sync():
    sentinel_quality = ensure_sentinel_state_for_sync(account_results)
    sentinel_reassert = reassert_sentinel_freezes_after_sync(token)
    return {
-        "ok": gateway["ok"] is True and responses_smoke["ok"] is True and owner_concurrency["ok"] is True and capacity_status["ok"] is True and load_factor_status["ok"] is True and ws_v2_status["ok"] is True and temp_unschedulable_status["ok"] is True and manual_account_proxy_bindings.get("ok") is True and manual_account_protections.get("ok") is True and sentinel.get("ok") is True and sentinel_quality_prepare.get("ok") is True and sentinel_quality.get("ok") is True and sentinel_reassert.get("ok") is True and runtime_capabilities.get("ok") is True,
+        "ok": gateway["ok"] is True and responses_smoke["ok"] is True and owner_concurrency["ok"] is True and capacity_status["ok"] is True and load_factor_status["ok"] is True and ws_v2_status["ok"] is True and temp_unschedulable_status["ok"] is True and manual_account_proxy_bindings.get("ok") is True and manual_account_group_bindings.get("ok") is True and manual_account_protections.get("ok") is True and sentinel.get("ok") is True and sentinel_quality_prepare.get("ok") is True and sentinel_quality.get("ok") is True and sentinel_reassert.get("ok") is True and runtime_capabilities.get("ok") is True,
        "degraded": bool(responses_smoke.get("degraded")) or bool(compact_evidence.get("degraded")) or bool(responses_evidence.get("degraded")) or runtime_capabilities.get("ok") is not True,
        "mode": "sync",
        "namespace": NAMESPACE,
@@ -6371,7 +6549,7 @@ def run_sync():
            "processControl": {"schedulableRestore": "sentinel marker probe only; sync does not restore schedulable for existing accounts", "durableConfig": False},
            "valuesPrinted": False,
        },
-        "manualAccounts": {**manual_account_protections, "proxySync": manual_account_proxy_bindings},
+        "manualAccounts": {**manual_account_protections, "proxySync": manual_account_proxy_bindings, "groupSync": manual_account_group_bindings},
        "capacity": capacity_status,
        "loadFactor": load_factor_status,
        "webSocketsV2": ws_v2_status,
@@ -6410,7 +6588,8 @@ def run_validate():
    load_factor_status = account_load_factor_status(token)
    ws_v2_status = account_ws_v2_status(token)
    temp_unschedulable_status = account_temp_unschedulable_status(token)
-    manual_account_protections = manual_account_protection_status(token)
+    pool_group_id = key_item.get("group_id") if isinstance(key_item, dict) else None
+    manual_account_protections = manual_account_protection_status(token, pool_group_id)
    gateway = validate_gateway(api_key)
    responses_smoke = validate_gateway_responses(api_key)
    compact_evidence = recent_compact_gateway_evidence()
@@ -6429,6 +6608,7 @@ def run_validate():
            "secret": f"{NAMESPACE}/{POOL_API_KEY_SECRET_NAME}.{POOL_API_KEY_SECRET_KEY}",
            "sub2apiId": key_item.get("id") if isinstance(key_item, dict) else None,
            "userId": key_item.get("user_id") if isinstance(key_item, dict) else None,
+            "groupId": key_item.get("group_id") if isinstance(key_item, dict) else None,
            "keyPreview": api_key_preview(api_key),
            "valuesPrinted": False,
        },
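
The group binding in this commit is additive: `ensure_manual_account_group_bindings` computes the union of the account's existing group ids and the pool group id, and only issues an update when the pool group is missing. A minimal standalone sketch of that merge step follows; the helper name `merge_group_ids` is hypothetical and is not part of the commit.

```python
def merge_group_ids(existing_group_ids, pool_group_id):
    """Return (desired_group_ids, action) for an additive group binding.

    Existing memberships are preserved; the pool group is added only if
    it is not already present, so repeated syncs leave the list unchanged.
    """
    desired = sorted(set(existing_group_ids) | {pool_group_id})
    action = "unchanged" if pool_group_id in existing_group_ids else "bound"
    return desired, action


# First sync adds the pool group (id 5) next to the manual memberships.
print(merge_group_ids([7, 3], 5))  # ([3, 5, 7], 'bound')
# A second sync finds the binding already aligned and changes nothing.
print(merge_group_ids([3, 5], 5))  # ([3, 5], 'unchanged')
```

Because the desired list is a union rather than a replacement, a protected manual account keeps any groups an operator assigned by hand in the Sub2API UI.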