fix: bind protected sub2api manual accounts to pool group
@@ -112,7 +112,7 @@ bun scripts/cli.ts platform-infra sub2api codex-pool cleanup-probes --target D60
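- `profiles.entries[].tempUnschedulable`: optional per-account override of Sub2API's built-in temporary-unschedulable rules. Use it only to deviate explicitly from the pool defaults; do not use it to give an account special priority or to temporarily bypass the generic failover.
- `profiles.entries[].openaiResponsesWebSocketsV2Mode`: set only for upstreams that need Responses WebSocket v2; the value is `off`, `ctx_pool`, or `passthrough`.
- `profiles.entries[].upstreamUserAgent`: set only for the few upstreams that require a Codex CLI User-Agent; the value must not contain newlines.
- `manualAccounts.protected`: accounts that were created or are maintained manually in Sub2API and must stay outside the UniDesk-managed Codex pool and sentinel control. By default, credentials/status/schedulable/groups/priority/capacity/loadFactor must not be changed; only when `proxyBinding` is explicitly declared may `sync --confirm` align the account's `proxy_id` to the egress proxy of the YAML target.
- `manualAccounts.protected`: accounts that were created or are maintained manually in Sub2API and must stay outside UniDesk-managed Codex pool credentials and sentinel control. By default, credentials/status/schedulable/priority/capacity/loadFactor must not be changed; only when `proxyBinding` is explicitly declared may `sync --confirm` align the account's `proxy_id` to the egress proxy of the YAML target; only when `groupBinding.source: pool-group` is explicitly declared may the account be added to the pool group used by the unified consumer API key.
- `sentinel.monitor.enabled`: switch for account-level marker sentinel monitoring. When enabled, `codex-pool sync --confirm` creates/updates the k8s CronJob, ConfigMap, Secret, ServiceAccount, Role, and RoleBinding in `platform-infra`. The CronJob calls OpenAI Responses `gpt-5.5` directly on YAML-managed upstream accounts, uses a deterministic marker as the sole health criterion, and records a token/cost ledger in a separate state ConfigMap.
- `sentinel.actions.enabled`: switch for account-level sentinel freeze/recovery actions; the current marker-only guard requires it to be enabled. With actions disabled, the sentinel only records `would-freeze` and does not call the Sub2API admin API to change `schedulable`. With actions enabled, any result that fails the marker match, whether an HTTP 200 with off-marker content, 4xx/5xx, non-JSON, a connection error, or empty output, enters the same freeze/recovery state machine.
- `sentinel.sdk.openaiPythonVersion`: pinned version of the OpenAI Python SDK used by the sentinel container. Model requests must go through the standard SDK `responses.create`; do not hand-assemble `/v1/responses` request bodies or hand-write response parsing. To upgrade the SDK later, change only the YAML and run `sync --confirm`.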
@@ -122,7 +122,7 @@ bun scripts/cli.ts platform-infra sub2api codex-pool cleanup-probes --target D60
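- `sentinel.freeze`: exponential-backoff configuration for the failure freeze TTL. The current policy starts at 1 minute and progresses `1m -> 2m -> 4m -> 8m -> 10m` on successive failures, capped at 10 minutes. Failed probes consume almost no billable output tokens, so the freeze window stays short. When a freeze expires, only a recovery probe runs, and the account is restored automatically only if that probe passes; TTL expiry alone never unfreezes an account.
- `sentinel.pricing`: prices the sentinel uses to estimate its own token/cost when probing upstreams directly. Because direct upstream probes bypass Sub2API's ordinary usage ledger, the sentinel must record global and per-account token/cost itself; these ledgers are for observation only and are not a budget gate for skipping probes.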
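
The `sentinel.freeze` progression above can be sketched as a capped doubling. This is only an illustration of the documented `1m -> 2m -> 4m -> 8m -> 10m` schedule; the function name and the `initial` / `cap` parameters are hypothetical and are not the real YAML keys or the sentinel's implementation.

```python
def freeze_ttl_minutes(consecutive_failures: int, initial: int = 1, cap: int = 10) -> int:
    """Return the freeze TTL in minutes for the Nth consecutive failure (1-based).

    Hypothetical helper: `initial` and `cap` mirror the documented 1-minute start
    and 10-minute maximum, not actual configuration field names.
    """
    if consecutive_failures < 1:
        raise ValueError("consecutive_failures must be >= 1")
    # Double the TTL on each further failure, never exceeding the cap.
    return min(cap, initial * 2 ** (consecutive_failures - 1))


# TTLs for the first six consecutive failures.
schedule = [freeze_ttl_minutes(n) for n in range(1, 7)]
print(schedule)  # [1, 2, 4, 8, 10, 10]
```

Note that the sketch covers only the TTL length; per the text above, an expired TTL still requires a passing recovery probe before the account is restored.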
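
`sync --confirm` logs in to the Sub2API admin, creates/updates the group, creates/updates the `unidesk-codex-*` accounts in YAML, creates/reuses the unified API key Secret, and deploys/updates the sentinel resources; it does not directly restore existing managed accounts to `schedulable=true`. Recovery is triggered only by the sentinel, which runs a recovery probe after reading Sub2API runtime `schedulable=false` and restores the account when the marker matches. By default `sync` does not delete managed accounts that are absent from YAML. Use `sync --confirm --prune-removed` only when an upstream is explicitly retired; it deletes absent `unidesk-codex-*` accounts with `extra.unidesk_managed=true`. For `manualAccounts.protected`, `sync` performs only the narrow sync explicitly allowed by YAML; the currently allowed item is creating/updating the Sub2API internal proxy record from the target `egressProxy` and binding the protected manual account's `proxy_id` to it, without taking over the account's credentials, scheduling, groups, or sentinel state.
`sync --confirm` logs in to the Sub2API admin, creates/updates the group, creates/updates the `unidesk-codex-*` accounts in YAML, creates/reuses the unified API key Secret, and deploys/updates the sentinel resources; it does not directly restore existing managed accounts to `schedulable=true`. Recovery is triggered only by the sentinel, which runs a recovery probe after reading Sub2API runtime `schedulable=false` and restores the account when the marker matches. By default `sync` does not delete managed accounts that are absent from YAML. Use `sync --confirm --prune-removed` only when an upstream is explicitly retired; it deletes absent `unidesk-codex-*` accounts with `extra.unidesk_managed=true`. For `manualAccounts.protected`, `sync` performs only the narrow sync explicitly allowed by YAML; the currently allowed items are creating/updating the Sub2API internal proxy record from the target `egressProxy` and binding `proxy_id`, and adding the protected manual account to the current `pool.groupName`. It still does not take over the account's credentials, status, schedulable, priority/capacity/loadFactor, or sentinel state.

`sentinel-image status|build` manages the sentinel's Python runtime image. The image is derived from the YAML `sentinel.image` base image and `sentinel.sdk.openaiPythonVersion` and is published to the local registry of the target runtime; `build --confirm` first checks the registry tag, reuses it if it exists, and otherwise builds on the target host and pushes. At startup the CronJob only verifies the SDK version and does not `pip install` at runtime.
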
@@ -140,11 +140,13 @@ WebSocket v2 是账号能力集合,不是调度 pin。`openaiResponsesWebSocke
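
When Codex startup repeatedly shows WebSocket reconnects, HTTPS fallback, or `websocket closed by server before response.completed`, or the Sub2API logs show `openai.websocket_proxy_failed` / `openai.websocket_account_select_failed` / upstream WS handshake 4xx/5xx, first use runtime evidence to identify the specific account and transport. If an account's WSv2 handshake fails, prefer narrowing only that account's `openaiResponsesWebSocketsV2Mode` to `off` in YAML; if no direct Codex WSv2 probe passes at all, also set `localCodex.supportsWebSockets` and `localCodex.responsesWebSocketsV2` to `false`, then run `codex-pool sync --confirm`. Do not change membership, priority, capacity, Secret, or code fallback along the way.

## Protected manual account proxy binding
## Protected manual account proxy and group binding

The account connection test in the Sub2API admin UI configures the upstream HTTP transport from the account-level `ProxyID` / proxy URL; an account with no proxy binding goes out directly, even if the Sub2API Pod already has `HTTP_PROXY` / `HTTPS_PROXY` environment variables. When the WebUI account test times out connecting to `chatgpt.com` but an explicit request through the target proxy from inside the Pod succeeds, first check whether the account is listed in `manualAccounts.protected` and declares `proxyBinding`.

Protected manual accounts still have their credentials/status and similar fields maintained by hand in the Sub2API UI; UniDesk only allows a narrow proxy binding through YAML:
The WebUI account connection test also bypasses the pool group selector used by the unified consumer API key; a passing account test does not mean the PC Codex client can select the account. When the WebUI account test passes but `/responses` or `/v1/responses` returns 503 with `account-select-failed` / `no available accounts`, first check whether the manual account declares `groupBinding.source: pool-group` and has been added to the current `pool.groupName` through `sync --confirm`.

Protected manual accounts still have their credentials/status and similar fields maintained by hand in the Sub2API UI; UniDesk only allows narrow proxy and group bindings through YAML:
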
```bash
bun scripts/cli.ts platform-infra sub2api codex-pool plan --target D601
@@ -152,7 +154,7 @@ bun scripts/cli.ts platform-infra sub2api codex-pool sync --target D601 --confir
bun scripts/cli.ts platform-infra sub2api codex-pool validate --target D601
```
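
The `sync` output should show `manualAccounts.ok=true`, `proxySync.ok=true`, and `bindingAligned=true` for the account. `sentinel-probe --account <manual-account> --confirm` must keep refusing protected manual accounts, usually returning `account-protected-manual`; do not move the account into `profiles.entries` or remove its protection just to test it. To prove that the WebUI-equivalent account test has recovered, use the original Sub2API admin account test entry point with a minimal `hi` / `gpt-5.5` request, and record only the account id, proxy id, event types, HTTP status, and a short output preview; never record OAuth tokens or Secret plaintext.
The `sync` output should show `manualAccounts.ok=true`, `proxySync.ok=true`, and `groupSync.ok=true`, with the account's proxy/group `bindingAligned=true`. `sentinel-probe --account <manual-account> --confirm` must keep refusing protected manual accounts, usually returning `account-protected-manual`; do not move the account into `profiles.entries` or remove its protection just to test it. To prove that the WebUI-equivalent account test has recovered, use the original Sub2API admin account test entry point with a minimal `hi` / `gpt-5.5` request, and record only the account id, proxy id, event types, HTTP status, and a short output preview; never record OAuth tokens or Secret plaintext.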
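
As a rough illustration of the check described above, the following sketch inspects a parsed `sync` result for the three `ok` flags and for `bindingAligned` on every enabled binding. The nesting (`manualAccounts.proxySync` / `manualAccounts.groupSync`, each with `items`) follows the summary shape visible in this commit; the exact shape of `proxySync` items and the helper name are assumptions.

```python
from typing import Any


def manual_bindings_aligned(sync_output: dict[str, Any]) -> bool:
    """Hypothetical helper: True only if all manual-account flags look healthy."""
    manual = sync_output.get("manualAccounts")
    if not isinstance(manual, dict) or manual.get("ok") is not True:
        return False
    for key in ("proxySync", "groupSync"):
        block = manual.get(key)
        if not isinstance(block, dict) or block.get("ok") is not True:
            return False
        for item in block.get("items", []):
            # Entries without a declared binding report enabled=False and are skipped.
            if isinstance(item, dict) and item.get("enabled") is True and item.get("bindingAligned") is not True:
                return False
    return True


# Made-up sample data for illustration only; not real command output.
sample = {
    "manualAccounts": {
        "ok": True,
        "proxySync": {"ok": True, "items": [{"enabled": True, "bindingAligned": True}]},
        "groupSync": {"ok": True, "items": [{"enabled": True, "bindingAligned": True}]},
    }
}
print(manual_bindings_aligned(sample))  # True
```

A check like this only confirms the reported flags; it does not replace retesting through the original Sub2API admin account test.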

## Adding an upstream

@@ -231,7 +233,8 @@ bun scripts/cli.ts platform-infra sub2api codex-pool configure-local --confirm
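- To strengthen monitoring only, without letting the sentinel freeze accounts automatically, set YAML `sentinel.actions.enabled=false` and run `codex-pool sync --confirm`. The marker probe and gateway failure monitor still record `would-freeze` / observe-only evidence but do not write `schedulable=false` through the Sub2API admin; `codex.remote_compact.failed` on `/responses/compact` and compact upstream 5xx failover are recorded only as `gateway-compact-*` observation events and do not trigger automatic sentinel switching.
- A single request id reports 502/503/interruption/no automatic account switch: first run `bun scripts/cli.ts platform-infra sub2api codex-pool trace --request-id <requestId>`. Look at `outcome`, `reason`, `FAILOVER`, `SELECT-FAILED`, `ACCOUNT SIGNALS`, and `WINDOW STATS` first; add `--show-lines` or `--raw` only when the trace report is missing fields or the raw logs need auditing. If `reason=failover-attempted-no-candidate`, the switch was attempted but the scheduler had no available candidate after excluding the failed account; continue with `sentinel-report` and `validate --full` to distinguish sentinel quarantine, request-path temp-unschedulable, account status, and capacity exhaustion.
- profile invalid: first fix `base_url`, `wire_api`, or `model` in `~/.codex/config.toml.<profile>`, or the API key in `auth.json.<profile>`; do not write secrets into YAML.
- WebUI account test of a manual OAuth/API-key account times out connecting to `chatgpt.com`, but an explicit HTTP proxy probe from the same Pod succeeds: do not rely on the Pod `HTTP_PROXY` env alone; follow the "Protected manual account proxy binding" section to confirm `manualAccounts.protected[].proxyBinding`, run `codex-pool sync --target D601 --confirm`, then retest with the original account test.
- WebUI account test of a manual OAuth/API-key account times out connecting to `chatgpt.com`, but an explicit HTTP proxy probe from the same Pod succeeds: do not rely on the Pod `HTTP_PROXY` env alone; follow the "Protected manual account proxy and group binding" section to confirm `manualAccounts.protected[].proxyBinding`, run `codex-pool sync --target D601 --confirm`, then retest with the original account test.
- WebUI account test of a manual OAuth/API-key account passes, but the PC Codex client gets 503 from `/responses` through the unified key and the trace shows `account-select-failed` / `no available accounts`: follow the "Protected manual account proxy and group binding" section to confirm `manualAccounts.protected[].groupBinding.source: pool-group`, run `codex-pool sync --target D601 --confirm`, then retest the unified key with `codex-pool validate --target D601 --full`.
- Sub2API is stuck at `wait-postgres` / `wait-redis`, or the service logs many `context deadline exceeded` errors: first run `sub2api status` and check `networkPolicy.ok`, then run `sub2api validate` and check `postgresCrossPodPgIsReady` / `redisCrossPodPing`; if they are missing or abnormal, restore the controlled `NetworkPolicy/allow-all` with `sub2api apply --confirm`, and do not keep a manual iptables bypass as a long-term fix.
- pool key 401: run `codex-pool sync --confirm` to rebuild the binding between the Sub2API key and the k3s Secret, then run `codex-pool validate`.
- Leftover validation probes from earlier runs: clean up `unidesk-probe-*` temporary resources only with `codex-pool cleanup-probes --confirm`; do not treat deleting real managed accounts as probe cleanup or availability recovery.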
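
The two manual-account entries in the list above distinguish which binding to inspect first. A minimal sketch of that triage, assuming the operator has already gathered the four observations as plain values; the function and parameter names are hypothetical and are not part of the `codex-pool` CLI.

```python
from typing import Optional


def suggest_binding_check(
    webui_test_ok: bool,
    pod_proxy_probe_ok: bool,
    gateway_status: Optional[int],
    trace_reason: Optional[str],
) -> Optional[str]:
    """Return the YAML field to check first, or None if neither symptom matches."""
    select_failures = {"account-select-failed", "no available accounts"}
    # WebUI test fails while an explicit proxy probe from the Pod works:
    # the account is probably missing its proxy binding.
    if not webui_test_ok and pod_proxy_probe_ok:
        return "manualAccounts.protected[].proxyBinding"
    # WebUI test passes but the unified key gets 503 with a selection failure:
    # the account is probably not in the pool group.
    if webui_test_ok and gateway_status == 503 and trace_reason in select_failures:
        return "manualAccounts.protected[].groupBinding"
    return None


print(suggest_binding_check(True, True, 503, "account-select-failed"))
```

In both cases the follow-up remains the one given in the list: confirm the YAML declaration, run `codex-pool sync --target D601 --confirm`, and retest.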
|
||||
@@ -142,6 +142,9 @@ manualAccounts:
        enabled: true
        source: target-egress-proxy
        proxyName: platform-infra-sub2api-egress-proxy
      groupBinding:
        enabled: true
        source: pool-group
publicExposure:
  enabled: false
  proxyName: platform-infra-sub2api
|
||||
@@ -99,7 +99,7 @@
|
||||
- Codex account-state, quota prompts, model-routing failures, encrypted-content affinity failures, gateway wrappers, and timeout-like upstream errors must be handled by the generic temporary-unschedulable/failover path plus the external marker sentinel. Do not change membership, priority, capacity, load factor, WebSocket mode, `pool_mode`, or a specific provider's status merely to work around those errors. If a matching upstream failure still logs `openai.forward_failed` without `openai.upstream_failover_switching`, the missing fix is in Sub2API's HTTP `/responses` failover classification/error propagation, not in account pinning.
|
||||
- `profiles.entries[].openaiResponsesWebSocketsV2Mode` is the account-level Responses WebSocket v2 switch for OpenAI-compatible upstreams that require WebSocket transport. Allowed values are `off`, `ctx_pool`, and `passthrough`; omit the field unless that upstream needs it.
|
||||
- `profiles.entries[].upstreamUserAgent` is an optional account-level upstream request User-Agent override. Use it only for upstreams that require a Codex CLI compatible User-Agent; keep the value YAML-controlled and newline-free.
|
||||
- `manualAccounts.protected` declares Sub2API accounts that were created or edited manually and must stay outside UniDesk-managed Codex pool credentials, scheduler policy, and sentinel control. The only allowed reconciliation for such an account is an explicitly declared narrow capability such as `proxyBinding`, which may align the account's Sub2API `proxy_id` to the YAML-selected target egress proxy. `codex-pool sync --confirm` must not rewrite protected account credentials, status, schedulability, groups, priority, capacity, load factor, or sentinel state, and `sentinel-probe --account ...` must refuse protected manual accounts.
|
||||
- `manualAccounts.protected` declares Sub2API accounts that were created or edited manually and must stay outside UniDesk-managed Codex pool credentials, scheduler policy, and sentinel control. The only allowed reconciliation for such an account is an explicitly declared narrow capability such as `proxyBinding`, which may align the account's Sub2API `proxy_id` to the YAML-selected target egress proxy, or `groupBinding`, which may attach the account to the YAML-selected pool group so the unified consumer key can use it. `codex-pool sync --confirm` must not rewrite protected account credentials, status, schedulability, priority, capacity, load factor, or sentinel state, and `sentinel-probe --account ...` must refuse protected manual accounts.
|
||||
- `publicExposure` in `config/platform-infra/sub2api-codex-pool.yaml` controls the legacy Codex-pool public bridge from master server to the G14 ClusterIP service and should stay disabled unless that bridge is explicitly reintroduced. Target-level `publicExposure` in `config/platform-infra/sub2api.yaml` controls the active public edge such as D601-to-PK01.
|
||||
- `publicExposure.masterCaddy.responseHeaderTimeoutSeconds` controls the master Caddy `response_header_timeout` for the public Sub2API site. It must be long enough for Codex `/responses/compact` requests; otherwise Caddy can return a client-visible 504 before Sub2API finishes the upstream compact request, and that edge timeout is not an account-level upstream failure that Sub2API can use for temporary-unschedulable failover. The numeric value belongs only in `config/platform-infra/sub2api-codex-pool.yaml`; after changing it, use `codex-pool expose --confirm` to reload Caddy and verify the rendered `response_header_timeout`. Requests that were already in flight before the reload may still finish with the previous timeout, so post-change evidence should check only requests that started after the reload.
|
||||
- `publicExposure.masterCaddy.edgeRetry` controls the master Caddy reverse-proxy retry window for the public Sub2API site. This belongs at the edge because FRP remotePort listener loss, `connection refused`, EOF, or connection reset can happen before a request reaches Sub2API, so Sub2API account failover and sentinel logic cannot observe or recover that request. Keep retry scope narrow, especially for non-idempotent POST traffic: connection-attempt failures may be retried by the reverse proxy, while round-trip retry after an upstream connection was established should be limited by YAML `retryMatch` to paths that are safe to repeat, such as compact. Retry durations and intervals belong only in YAML; after changing them, run `codex-pool expose --confirm` and verify the rendered Caddyfile contains the expected `lb_try_duration`, `lb_try_interval`, and `lb_retry_match`.
|
||||
@@ -131,6 +131,8 @@ This management-plane test is also outside the normal consumer gateway scheduler
|
||||
|
||||
The management test uses Sub2API's account-level proxy selection, not the Pod environment as a fallback. In Sub2API v0.1.136 the upstream HTTP transport is configured from the account's `ProxyID` / proxy URL; an account with no proxy binding goes direct even if the Sub2API Pod has `HTTP_PROXY` or `HTTPS_PROXY` set. For protected manual accounts that need the target egress path, declare `manualAccounts.protected[].proxyBinding` in `config/platform-infra/sub2api-codex-pool.yaml` and reconcile it with `codex-pool sync --target <active> --confirm`; do not hand-patch the runtime account or infer proxy coverage from Pod env alone.
|
||||
|
||||
The management test is also not proof that the unified consumer key can select the account. A protected manual account must be attached to the pool group before ordinary `/responses` or `/v1/responses` traffic can use it. When that is intended, declare `manualAccounts.protected[].groupBinding.source: pool-group`; sync should add the account to the current `pool.groupName` without making it a YAML-managed profile or sentinel target.
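
The additive nature of this binding can be sketched as follows: the pool group is unioned into the account's existing group ids, so memberships the operator set by hand are preserved and a repeated sync is a no-op. This mirrors the `desired_group_ids` computation in this commit's sync script, but the standalone function name and signature here are hypothetical.

```python
def plan_group_binding(existing_group_ids: list[int], pool_group_id: int) -> tuple[list[int], str]:
    """Hypothetical helper: compute the desired group ids and the sync action."""
    # Union with the pool group; never drop groups the account already belongs to.
    desired = sorted(set(existing_group_ids + [pool_group_id]))
    action = "unchanged" if pool_group_id in existing_group_ids else "bound"
    return desired, action


print(plan_group_binding([3, 1], 2))  # ([1, 2, 3], 'bound')
print(plan_group_binding([1, 2], 2))  # ([1, 2], 'unchanged')
```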
|
||||
|
||||
An external account-level sentinel that wants parity with this WebUI path should reuse the same request shape as far as the standard OpenAI SDK allows: direct account credentials, Responses API, `stream=true`, no `store: false` for API-key accounts, no upstream `max_output_tokens` field, and success parsing based on the streaming events. A local stream delta collection limit is acceptable as a sentinel safety bound, but it should not change the upstream request body. The sentinel may replace the user text `hi` with a marker prompt, but it should not introduce extra request fields or Codex/compact headers merely for convenience. If a marker-only sentinel intentionally diverges from the management test shape, the divergence must be documented in probe output so a WebUI success and sentinel failure are not misread as operator error.
|
||||
|
||||
## Account Sentinel Marker Contract
|
||||
|
||||
@@ -164,10 +164,16 @@ interface CodexPoolManualAccountProxyBinding {
  proxyName: string;
}

interface CodexPoolManualAccountGroupBinding {
  enabled: boolean;
  source: "pool-group";
}

interface CodexPoolManualAccountProtection {
  accountName: string;
  reason: string | null;
  proxyBinding: CodexPoolManualAccountProxyBinding | null;
  groupBinding: CodexPoolManualAccountGroupBinding | null;
}

interface CodexPoolProfileConfig {
@@ -703,11 +709,11 @@ function codexPoolPlan(options?: DisclosureOptions): Record<string, unknown> {
|
||||
: runtimeTarget.publicBaseUrl === null
|
||||
? "Public FRP exposure is disabled by YAML."
|
||||
: `Legacy Codex-pool FRP exposure is disabled by YAML; Codex consumers for target ${runtimeTarget.id} use target-level public exposure ${consumerBaseUrl}.`,
|
||||
idempotency: "sync reuses the group, account names, and k3s Secret when they already exist; credentials are updated from the current local Codex files; managed accounts missing from YAML are preserved unless --prune-removed is explicitly provided.",
|
||||
idempotency: "sync reuses the group, account names, and k3s Secret when they already exist; credentials are updated from the current local Codex files for YAML-managed profiles only; managed accounts missing from YAML are preserved unless --prune-removed is explicitly provided.",
|
||||
configPolicy: "UniDesk-owned durable configuration remains YAML-first; local ~/.codex files and runtime Secrets are not committed.",
|
||||
manualAccountProtection: pool.manualAccounts.protected.length === 0
|
||||
? "No manual Sub2API accounts are protected by YAML."
|
||||
: `${pool.manualAccounts.protected.length} manual Sub2API account(s) are protected from UniDesk-managed sync, prune, sentinel probe, and sentinel freeze paths.`,
|
||||
: `${pool.manualAccounts.protected.length} manual Sub2API account(s) are protected from UniDesk-managed credentials, prune, sentinel probe, and sentinel freeze paths; only explicitly declared proxy/group bindings are reconciled.`,
|
||||
},
|
||||
next: ok
|
||||
? { sync: `bun scripts/cli.ts platform-infra sub2api codex-pool sync${targetFlag(runtimeTarget)} --confirm` }
|
||||
@@ -1520,7 +1526,8 @@ function readManualAccountsConfig(value: unknown, defaults: CodexPoolManualAccou
|
||||
seen.add(normalized);
|
||||
const reason = isRecord(entry) ? readManualAccountReason(entry.reason, `${key}.reason`) : null;
|
||||
const proxyBinding = isRecord(entry) ? readManualAccountProxyBinding(entry.proxyBinding, `${key}.proxyBinding`) : null;
|
||||
return { accountName, reason, proxyBinding };
|
||||
const groupBinding = isRecord(entry) ? readManualAccountGroupBinding(entry.groupBinding, `${key}.groupBinding`) : null;
|
||||
return { accountName, reason, proxyBinding, groupBinding };
|
||||
});
|
||||
return { protected: protectedAccounts };
|
||||
}
|
||||
@@ -1541,6 +1548,18 @@ function readManualAccountProxyBinding(value: unknown, key: string): CodexPoolMa
  };
}

function readManualAccountGroupBinding(value: unknown, key: string): CodexPoolManualAccountGroupBinding | null {
  if (value === undefined || value === null) return null;
  if (!isRecord(value)) throw new Error(`${codexPoolConfigPath}.${key} must be a YAML object`);
  const enabled = value.enabled === undefined ? true : value.enabled === true;
  const source = stringValue(value.source) ?? "pool-group";
  if (source !== "pool-group") throw new Error(`${codexPoolConfigPath}.${key}.source must be pool-group`);
  return {
    enabled,
    source,
  };
}

function readManualAccountName(value: unknown, key: string): string | null {
  const text = stringValue(value)?.trim() ?? null;
  if (text === null || text.length === 0) return null;
@@ -2051,7 +2070,7 @@ function codexPoolConfigSummary(pool: CodexPoolConfig): Record<string, unknown>
|
||||
manualAccounts: {
|
||||
protectedCount: pool.manualAccounts.protected.length,
|
||||
protected: pool.manualAccounts.protected,
|
||||
controlPolicy: "manual accounts are not created, updated, pruned, probed, or frozen by UniDesk codex-pool sync/sentinel",
|
||||
controlPolicy: "manual accounts are not created, credential-updated, pruned, probed, or frozen by UniDesk codex-pool sync/sentinel; optional proxy_id and pool group membership bindings are narrow YAML-controlled exceptions",
|
||||
},
|
||||
publicExposure: publicExposureSummary(pool),
|
||||
localCodex: pool.localCodex,
|
||||
@@ -2158,6 +2177,7 @@ function compactManualAccounts(block: unknown): Record<string, unknown> | null {
|
||||
"inYamlProfiles",
|
||||
"runtimeMarkedUnideskManaged",
|
||||
"proxyBinding",
|
||||
"groupBinding",
|
||||
"controlPolicy",
|
||||
]));
|
||||
const proxySync = isRecord(block.proxySync)
|
||||
@@ -2181,11 +2201,31 @@ function compactManualAccounts(block: unknown): Record<string, unknown> | null {
|
||||
valuesPrinted: false,
|
||||
}
|
||||
: undefined;
|
||||
const groupSync = isRecord(block.groupSync)
|
||||
? {
|
||||
ok: block.groupSync.ok,
|
||||
itemCount: block.groupSync.itemCount,
|
||||
items: recordArray(block.groupSync.items).map((item) => pickSummaryFields(item, [
|
||||
"accountName",
|
||||
"accountId",
|
||||
"enabled",
|
||||
"ok",
|
||||
"action",
|
||||
"source",
|
||||
"poolGroupName",
|
||||
"poolGroupId",
|
||||
"bindingAligned",
|
||||
"controlPolicy",
|
||||
])),
|
||||
valuesPrinted: false,
|
||||
}
|
||||
: undefined;
|
||||
return {
|
||||
ok: block.ok,
|
||||
protectedCount: block.protectedCount,
|
||||
items,
|
||||
proxySync,
|
||||
groupSync,
|
||||
valuesPrinted: false,
|
||||
};
|
||||
}
|
||||
@@ -4494,9 +4534,12 @@ def group_payload():
|
||||
"rpm_limit": 0,
|
||||
}
|
||||
|
||||
def list_groups(token):
|
||||
data = ensure_success(curl_api("GET", "/api/v1/admin/groups/all?platform=openai", bearer=token), "list groups")
|
||||
return extract_items(data)
|
||||
|
||||
def ensure_group(token):
|
||||
existing_data = ensure_success(curl_api("GET", "/api/v1/admin/groups/all?platform=openai", bearer=token), "list groups")
|
||||
existing = next((item for item in extract_items(existing_data) if item.get("name") == POOL_GROUP_NAME), None)
|
||||
existing = next((item for item in list_groups(token) if item.get("name") == POOL_GROUP_NAME), None)
|
||||
payload = group_payload()
|
||||
if existing is None:
|
||||
created = ensure_success(curl_api("POST", "/api/v1/admin/groups", bearer=token, payload=payload), "create group")
|
||||
@@ -4508,6 +4551,26 @@ def ensure_group(token):
|
||||
updated = ensure_success(curl_api("PUT", f"/api/v1/admin/groups/{group_id}", bearer=token, payload=payload), "update group")
|
||||
return updated if isinstance(updated, dict) else existing, "updated"

def list_accounts_for_group(token, group_id):
    path = f"/api/v1/admin/accounts?group_id={group_id}&page=1&page_size=500&platform=openai"
    data = ensure_success(curl_api("GET", path, bearer=token), f"list accounts for group {group_id}")
    return extract_items(data)

def account_group_ids(token, account):
    if not isinstance(account, dict) or account.get("id") is None:
        return []
    account_id = account.get("id")
    account_name = account.get("name")
    ids = []
    for group in list_groups(token):
        group_id = group.get("id") if isinstance(group, dict) else None
        if group_id is None:
            continue
        members = list_accounts_for_group(token, group_id)
        if any(item.get("id") == account_id or item.get("name") == account_name for item in members if isinstance(item, dict)):
            ids.append(group_id)
    return sorted(set(ids))

def list_accounts(token):
    path = "/api/v1/admin/accounts?page=1&page_size=200&platform=openai&type=apikey&search=" + quote("unidesk-codex-")
    data = ensure_success(curl_api("GET", path, bearer=token), "list accounts")
@@ -4689,7 +4752,119 @@ def ensure_manual_account_proxy_bindings(token):
|
||||
"valuesPrinted": False,
|
||||
}
|
||||
|
||||
def manual_account_protection_status(token):
def manual_group_binding_enabled(protection):
    binding = protection.get("groupBinding") if isinstance(protection, dict) else None
    if not isinstance(binding, dict) or binding.get("enabled") is not True:
        return False
    if binding.get("source") != "pool-group":
        raise RuntimeError("manual account groupBinding source must be pool-group")
    return True

def manual_group_status(token, account, protection, group_id):
    enabled = manual_group_binding_enabled(protection)
    if not enabled:
        return {
            "enabled": False,
            "ok": True,
            "action": "not-configured",
            "valuesPrinted": False,
        }
    group_accounts = list_accounts_for_group(token, group_id)
    account_id = account.get("id") if isinstance(account, dict) else None
    account_name = account.get("name") if isinstance(account, dict) else None
    binding_aligned = any(
        item.get("id") == account_id or item.get("name") == account_name
        for item in group_accounts
        if isinstance(item, dict)
    )
    return {
        "enabled": True,
        "ok": binding_aligned,
        "action": "validate",
        "source": "pool-group",
        "poolGroupName": POOL_GROUP_NAME,
        "poolGroupId": group_id,
        "bindingAligned": binding_aligned,
        "valuesPrinted": False,
    }

def ensure_manual_account_group_bindings(token, group_id):
    items = []
    for protection in MANUAL_ACCOUNT_PROTECTIONS:
        if not isinstance(protection, dict):
            continue
        name = protection.get("accountName")
        if not isinstance(name, str) or not name:
            continue
        if not manual_group_binding_enabled(protection):
            items.append({
                "accountName": name,
                "enabled": False,
                "action": "not-configured",
                "ok": True,
                "valuesPrinted": False,
            })
            continue
        account = find_account_by_name(token, name)
        if not isinstance(account, dict):
            items.append({
                "accountName": name,
                "enabled": True,
                "action": "account-missing",
                "ok": False,
                "poolGroupName": POOL_GROUP_NAME,
                "poolGroupId": group_id,
                "valuesPrinted": False,
            })
            continue
        extra = account.get("extra") if isinstance(account.get("extra"), dict) else {}
        if extra.get("unidesk_managed") is True:
            items.append({
                "accountName": name,
                "accountId": account.get("id"),
                "enabled": True,
                "action": "refused-managed-runtime-account",
                "ok": False,
                "poolGroupName": POOL_GROUP_NAME,
                "poolGroupId": group_id,
                "valuesPrinted": False,
            })
            continue
        existing_group_ids = account_group_ids(token, account)
        desired_group_ids = sorted(set(existing_group_ids + [group_id]))
        action = "unchanged"
        if group_id not in existing_group_ids:
            updated = ensure_success(curl_api("PUT", f"/api/v1/admin/accounts/{account['id']}", bearer=token, payload={"group_ids": desired_group_ids}), f"bind manual account group {name}")
            account = updated if isinstance(updated, dict) else account
            action = "bound"
        binding_aligned = any(
            item.get("id") == account.get("id") or item.get("name") == name
            for item in list_accounts_for_group(token, group_id)
            if isinstance(item, dict)
        )
        items.append({
            "accountName": name,
            "accountId": account.get("id"),
            "enabled": True,
            "ok": binding_aligned,
            "action": action,
            "source": "pool-group",
            "poolGroupName": POOL_GROUP_NAME,
            "poolGroupId": group_id,
            "previousGroupIds": existing_group_ids,
            "desiredGroupIds": desired_group_ids,
            "bindingAligned": binding_aligned,
            "controlPolicy": "manual-protected: only pool group membership is YAML-controlled; credentials/status/schedulable are untouched and sentinel does not probe it",
            "valuesPrinted": False,
        })
    return {
        "ok": all(item.get("ok") is True for item in items),
        "itemCount": len(items),
        "items": items,
        "valuesPrinted": False,
    }

def manual_account_protection_status(token, group_id=None):
    items = []
    desired_names = set(EXPECTED_ACCOUNT_CAPACITIES.keys())
    for protection in MANUAL_ACCOUNT_PROTECTIONS:
@@ -4701,6 +4876,7 @@ def manual_account_protection_status(token):
|
||||
account = find_account_by_name(token, name)
|
||||
extra = account.get("extra") if isinstance(account, dict) and isinstance(account.get("extra"), dict) else {}
|
||||
proxy_status = manual_proxy_status(token, account, protection)
|
||||
group_status = manual_group_status(token, account, protection, group_id) if group_id is not None else {"enabled": False, "ok": True, "action": "not-checked", "valuesPrinted": False}
|
||||
items.append({
|
||||
"accountName": name,
|
||||
"reason": protection.get("reason") if isinstance(protection.get("reason"), str) else None,
|
||||
@@ -4711,8 +4887,9 @@ def manual_account_protection_status(token):
|
||||
"inYamlProfiles": name in desired_names,
|
||||
"runtimeMarkedUnideskManaged": extra.get("unidesk_managed") is True,
|
||||
"proxyBinding": proxy_status,
|
||||
"ok": proxy_status.get("ok") is True,
|
||||
"controlPolicy": "manual-protected: no create/update/prune/probe/freeze; optional proxy_id binding only when proxyBinding is configured",
|
||||
"groupBinding": group_status,
|
||||
"ok": proxy_status.get("ok") is True and group_status.get("ok") is True,
|
||||
"controlPolicy": "manual-protected: no create/update/prune/probe/freeze; optional proxy_id and pool group membership binding only when configured",
|
||||
"valuesPrinted": False,
|
||||
})
|
||||
return {
|
||||
@@ -6334,7 +6511,8 @@ def run_sync():
|
||||
protected_frozen_names = active_sentinel_quarantine_names()
|
||||
account_results, pruned_account_results = ensure_accounts(token, profiles, group_id, prune_removed, protected_frozen_names, existing_accounts)
|
||||
manual_account_proxy_bindings = ensure_manual_account_proxy_bindings(token)
|
||||
manual_account_protections = manual_account_protection_status(token)
|
||||
manual_account_group_bindings = ensure_manual_account_group_bindings(token, group_id)
|
||||
manual_account_protections = manual_account_protection_status(token, group_id)
|
||||
capacity_status = account_capacity_status(token)
|
||||
load_factor_status = account_load_factor_status(token)
|
||||
ws_v2_status = account_ws_v2_status(token)
|
||||
@@ -6352,7 +6530,7 @@ def run_sync():
|
||||
sentinel_quality = ensure_sentinel_state_for_sync(account_results)
|
||||
sentinel_reassert = reassert_sentinel_freezes_after_sync(token)
|
||||
return {
|
||||
"ok": gateway["ok"] is True and responses_smoke["ok"] is True and owner_concurrency["ok"] is True and capacity_status["ok"] is True and load_factor_status["ok"] is True and ws_v2_status["ok"] is True and temp_unschedulable_status["ok"] is True and manual_account_proxy_bindings.get("ok") is True and manual_account_protections.get("ok") is True and sentinel.get("ok") is True and sentinel_quality_prepare.get("ok") is True and sentinel_quality.get("ok") is True and sentinel_reassert.get("ok") is True and runtime_capabilities.get("ok") is True,
|
||||
"ok": gateway["ok"] is True and responses_smoke["ok"] is True and owner_concurrency["ok"] is True and capacity_status["ok"] is True and load_factor_status["ok"] is True and ws_v2_status["ok"] is True and temp_unschedulable_status["ok"] is True and manual_account_proxy_bindings.get("ok") is True and manual_account_group_bindings.get("ok") is True and manual_account_protections.get("ok") is True and sentinel.get("ok") is True and sentinel_quality_prepare.get("ok") is True and sentinel_quality.get("ok") is True and sentinel_reassert.get("ok") is True and runtime_capabilities.get("ok") is True,
|
||||
"degraded": bool(responses_smoke.get("degraded")) or bool(compact_evidence.get("degraded")) or bool(responses_evidence.get("degraded")) or runtime_capabilities.get("ok") is not True,
|
||||
"mode": "sync",
|
||||
"namespace": NAMESPACE,
|
||||
@@ -6371,7 +6549,7 @@ def run_sync():
|
||||
"processControl": {"schedulableRestore": "sentinel marker probe only; sync does not restore schedulable for existing accounts", "durableConfig": False},
|
||||
"valuesPrinted": False,
|
||||
},
|
||||
"manualAccounts": {**manual_account_protections, "proxySync": manual_account_proxy_bindings},
|
||||
"manualAccounts": {**manual_account_protections, "proxySync": manual_account_proxy_bindings, "groupSync": manual_account_group_bindings},
|
||||
"capacity": capacity_status,
|
||||
"loadFactor": load_factor_status,
|
||||
"webSocketsV2": ws_v2_status,
|
||||
@@ -6410,7 +6588,8 @@ def run_validate():
|
||||
load_factor_status = account_load_factor_status(token)
|
||||
ws_v2_status = account_ws_v2_status(token)
|
||||
temp_unschedulable_status = account_temp_unschedulable_status(token)
|
||||
manual_account_protections = manual_account_protection_status(token)
|
||||
pool_group_id = key_item.get("group_id") if isinstance(key_item, dict) else None
|
||||
manual_account_protections = manual_account_protection_status(token, pool_group_id)
|
||||
gateway = validate_gateway(api_key)
|
||||
responses_smoke = validate_gateway_responses(api_key)
|
||||
compact_evidence = recent_compact_gateway_evidence()
|
||||
@@ -6429,6 +6608,7 @@ def run_validate():
|
||||
"secret": f"{NAMESPACE}/{POOL_API_KEY_SECRET_NAME}.{POOL_API_KEY_SECRET_KEY}",
|
||||
"sub2apiId": key_item.get("id") if isinstance(key_item, dict) else None,
|
||||
"userId": key_item.get("user_id") if isinstance(key_item, dict) else None,
|
||||
"groupId": key_item.get("group_id") if isinstance(key_item, dict) else None,
|
||||
"keyPreview": api_key_preview(api_key),
|
||||
"valuesPrinted": False,
|
||||
},
|
||||
|
||||