Merge pull request #1423 from pikasTech/fix/pk01-codex-pool-sync
支持 PK01 Codex pool sync
This commit is contained in:
@@ -1,6 +1,8 @@
|
||||
## Codex Pool
|
||||
|
||||
当前 codex-pool sync/validate/report/trace 适配器主要覆盖 k3s target。若 YAML 默认 target 是 PK01 host-Docker,不要直接把无 `--target` 的 codex-pool 命令当成验收入口;先使用 `sub2api status --target PK01`、`sub2api validate --target PK01` 和最小 public `/v1/responses` smoke。host-Docker codex-pool adapter 补齐前,k3s 账号池操作必须显式选择 k3s target:
|
||||
`codex-pool plan|sync|validate` 同时覆盖 k3s target 和 PK01 host-Docker target。PK01 host-Docker 的 `sync --confirm` 通过 Sub2API admin API 对齐 group、YAML-managed accounts、统一消费 API key、capacity/loadFactor、WebSocket v2 标记和内置 `temp_unschedulable` 规则;统一消费 key 写入 YAML 声明的 `targets[PK01].hostDocker.envPath`,不创建 k8s Secret,不部署 sentinel 资源,也不触发 `sub2api apply`、Docker compose、Caddy reload 或容器重启。`sentinel-report`、`sentinel-probe`、`sentinel-image` 和部分 `trace` 能力仍以 k3s target 为主;需要这些能力时显式选择对应 k3s target。
|
||||
|
||||
k3s 账号池操作示例:
|
||||
|
||||
```bash
|
||||
bun scripts/cli.ts platform-infra sub2api codex-pool plan --target D601
|
||||
@@ -17,7 +19,7 @@ bun scripts/cli.ts platform-infra sub2api codex-pool cleanup-probes --target D60
|
||||
`config/platform-infra/sub2api-codex-pool.yaml` 控制:
|
||||
|
||||
- `pool.groupName`: Sub2API group 名称。
|
||||
- `pool.apiKeySecretName` / `pool.apiKeySecretKey`: 统一消费 API key 的 k3s Secret 位置,默认 `platform-infra/sub2api-codex-pool-api-key.API_KEY`。
|
||||
- `pool.apiKeySecretName` / `pool.apiKeySecretKey`: 统一消费 API key 的 key 名。k3s target 写入对应 namespace 下的 k3s Secret;PK01 host-Docker target 写入 `config/platform-infra/sub2api.yaml` 中 `targets[PK01].hostDocker.envPath` 声明的 env 文件。
|
||||
- `pool.minOwnerBalanceUsd`: pool key owner 最低余额,sync/validate 会补齐。
|
||||
- `pool.minOwnerConcurrency`: 可选统一消费 API key owner 最低并发;省略时 CLI 自动使用所有已解析账号 capacity 的总和,sync/validate 会补齐。显式 YAML 值只作为 override,仍必须不小于账号 capacity 总和;未显式写 `profiles.entries[].capacity` 的账号会使用 `pool.defaultAccountCapacity` 参与求和,不要用提高某个 provider capacity 来掩盖用户并发层 WS 1013。
|
||||
- `pool.defaultTempUnschedulable`: Sub2API 内置请求路径临时不可调度开关和 YAML 规则列表。当前要求是按 YAML 开启通用规则;sync 把 `temp_unschedulable_enabled` / `temp_unschedulable_rules` 渲染到 managed accounts,让匹配的 400/5xx/超时/模型路由/加密内容错误短暂冷却当前账号并触发同组 failover。
|
||||
@@ -38,7 +40,7 @@ bun scripts/cli.ts platform-infra sub2api codex-pool cleanup-probes --target D60
|
||||
- `manualAccounts.protected`: 已在 Sub2API 手动创建/维护、且必须排除在 UniDesk-managed Codex pool credentials 和 sentinel 控制之外的账号。默认不得改 credentials/status/schedulable/priority/capacity/loadFactor;只有显式声明 `proxyBinding` 时,`sync --confirm` 才允许把该账号的 `proxy_id` 对齐到 YAML 目标的 egress proxy;只有显式声明 `groupBinding.source: pool-group` 时,才允许把该账号加入统一消费 API key 使用的 pool group。`targetIds` 可选;省略表示所有 target 都保护该账号,设置后只在匹配 target 上纳入 proxy/group 窄同步和 sentinel-probe 拒绝列表,避免 PK01-only 手动账号漂移卡住 JD01 pool。
|
||||
- Sentinel 配置、marker-only 判定、镜像、report/probe 和远端 job/poll 边界见 [sentinel.md](sentinel.md)。
|
||||
|
||||
对已支持的 k3s target,`sync --confirm` 会登录 Sub2API admin、创建/更新 group、创建/更新 YAML 中的 `unidesk-codex-*` accounts、创建/复用统一 API key Secret,并部署/更新哨兵资源;它不把既有 managed account 直接恢复为 `schedulable=true`。恢复只由哨兵在读取 Sub2API runtime `schedulable=false` 后触发 recovery probe,并在 marker 命中时执行。`sync` 默认不删除 YAML 中缺席的 managed account。只有明确退役上游时才使用 `sync --confirm --prune-removed` 删除缺席且 `extra.unidesk_managed=true` 的 `unidesk-codex-*` account。对 `manualAccounts.protected`,`sync` 只执行 YAML 显式允许的窄同步;当前允许项是从目标 `egressProxy` 创建/更新 Sub2API internal proxy 记录并绑定 `proxy_id`,以及把受保护手动账号加入当前 `pool.groupName`。它仍不接管该账号凭据、status、schedulable、priority/capacity/loadFactor 或哨兵状态。PK01 host-Docker target 在 codex-pool adapter 补齐前不具备这条完整 sync 路径。
|
||||
`sync --confirm` 会登录 Sub2API admin、创建/更新 group、创建/更新 YAML 中的 `unidesk-codex-*` accounts,并创建/复用统一 API key。k3s target 还会写入统一 API key Secret 并部署/更新哨兵资源;PK01 host-Docker target 只写 Sub2API runtime 和 host-Docker env 文件。`sync` 不把既有 managed account 直接恢复为 `schedulable=true`。恢复只由哨兵在读取 Sub2API runtime `schedulable=false` 后触发 recovery probe,并在 marker 命中时执行。`sync` 默认不删除 YAML 中缺席的 managed account。只有明确退役上游时才使用 `sync --confirm --prune-removed` 删除缺席且 `extra.unidesk_managed=true` 的 `unidesk-codex-*` account。对 `manualAccounts.protected`,`sync` 只执行 YAML 显式允许的窄同步;当前允许项是从目标 `egressProxy` 创建/更新 Sub2API internal proxy 记录并绑定 `proxy_id`,以及把受保护手动账号加入当前 `pool.groupName`。它仍不接管该账号凭据、status、schedulable、priority/capacity/loadFactor 或哨兵状态。若受保护手动账号在运行面缺失,`sync`/`validate` 会报告 manual account drift;不要自动创建、删除、接管或从 YAML 移除该账号。
|
||||
|
||||
`trace --request-id <requestId>` 是只读 request 追溯报表,不触发 probe、不修改账号。默认输出请求开始/最终状态、failover、`account_select_failed`、窗口内 `account_temp_unschedulable`、admin schedulable 写入计数和当前账号快照;`reason=failover-attempted-no-candidate` 表示 Sub2API 已进入自动切号,但排除当前失败账号后没有可用候选。需要机器处理时使用 `--raw`,需要原始匹配行时加 `--show-lines`。
|
||||
|
||||
|
||||
@@ -7,7 +7,7 @@ bun scripts/cli.ts platform-infra sub2api codex-pool configure-local --confirm
|
||||
|
||||
`configure-local --confirm` 会:
|
||||
|
||||
- 从 `platform-infra/<apiKeySecretName>.<apiKeySecretKey>` 读取统一 API key。
|
||||
- 从 active target 的统一 API key 位置读取 key:k3s target 读取 `platform-infra/<apiKeySecretName>.<apiKeySecretKey>`,PK01 host-Docker target 读取 YAML `hostDocker.envPath` 中的 `<apiKeySecretKey>`。
|
||||
- 把当前 `~/.codex/config.toml` 和 `~/.codex/auth.json` 备份为 `.<backupSuffix>`,默认 `.pre-sub2api`。
|
||||
- 重写默认 `~/.codex` 消费端,固定指向 `https://sub2api.74-48-78-17.nip.io/`,provider 名称和 wire API 来自 `localCodex`。
|
||||
- 按 `localCodex.modelContextWindow` / `localCodex.modelAutoCompactTokenLimit` 写入 `model_context_window` / `model_auto_compact_token_limit`,用于统一控制 Codex auto compact 触发窗口,避免 GPT-5.5 消费端生成过大的 `/responses/compact` 长请求。
|
||||
|
||||
@@ -63,7 +63,7 @@ bun scripts/cli.ts platform-infra sub2api status --target PK01
|
||||
bun scripts/cli.ts platform-infra sub2api validate --target PK01
|
||||
```
|
||||
|
||||
PK01 没有 k3s control plane。当前 `codex-pool sync`、`codex-pool validate`、`sentinel-report` 和 `trace` 的部分实现仍依赖 k8s/kubectl 远端脚本;在 PK01 host-Docker target 上看到 `kubectl` 缺失时,应归类为 CLI host-Docker adapter 缺口,不要误判为 Sub2API app、Caddy、上游或账号池故障。正式修复应补 host-Docker 版 codex-pool sync/validate/report/trace;临时排障只能做只读 admin API、DB join 表和最小公网 `/v1/responses` smoke,并且不得打印 admin password、API key 或账号凭据。
|
||||
PK01 没有 k3s control plane。`codex-pool sync --target PK01 --confirm` 和 `codex-pool validate --target PK01` 走 host-Docker adapter:通过本机 Sub2API admin API 和 YAML `hostDocker.envPath` 对齐账号池,不使用 k8s Secret/CronJob,也不重启容器。`sentinel-report`、`sentinel-probe`、`sentinel-image` 和部分 `trace` 能力仍可能依赖 k8s/kubectl;在这些命令上看到 `kubectl` 缺失时,应归类为 CLI host-Docker adapter 缺口,不要误判为 Sub2API app、Caddy、上游或账号池故障。临时排障只能做只读 admin API、DB join 表和最小公网 `/v1/responses` smoke,并且不得打印 admin password、API key 或账号凭据。
|
||||
|
||||
PK01 host-Docker apply 仍必须由 `platform-infra sub2api apply --target PK01 --confirm` 受控执行。若 dry-run 或 apply 输出显示 `docker compose is absent; apply will use raw docker run fallback`,这表示 CLI 选择了 host-Docker fallback,不是裸手工 Docker 操作;只要 YAML image、env、ports、Caddy managed block 和 `status/validate` 最终对齐,可作为受控滚动升级证据。不要改用手工 `docker run`、手工 compose 文件或直接编辑 PK01 Caddyfile。
|
||||
|
||||
|
||||
@@ -91,7 +91,7 @@
|
||||
`config/platform-infra/sub2api-codex-pool.yaml` controls the Codex-facing OpenAI-compatible pool:
|
||||
|
||||
- `pool.groupName` names the Sub2API group that represents the pool.
|
||||
- `pool.apiKeySecretName` and `pool.apiKeySecretKey` name the k3s Secret that stores the single consumer API key.
|
||||
- `pool.apiKeySecretName` and `pool.apiKeySecretKey` name the single consumer API key. k3s targets store it in a k3s Secret; PK01 host-Docker stores the same key in the YAML-declared `hostDocker.envPath`.
|
||||
- `pool.minOwnerConcurrency` is optional; when omitted, the CLI automatically uses the sum of all resolved account capacities as the minimum concurrency for the Sub2API user that owns the unified consumer API key. A YAML value is only an explicit override and must still be at least that capacity sum, so the shared key does not fail requests or WS sessions at the user-concurrency layer. "Resolved" means each account's explicit `profiles.entries[].capacity` or, when omitted, `pool.defaultAccountCapacity`. Do not compensate for owner-concurrency 1013 errors by pinning capacity to one provider.
|
||||
- `pool.defaultTempUnschedulable` is the Sub2API built-in request-path temporary-unschedulable switch plus its YAML rule list. When enabled, `codex-pool sync --confirm` renders `temp_unschedulable_enabled` and `temp_unschedulable_rules` into every managed account unless an account-level override says otherwise. This is the generic same-request recovery path for selected-account upstream failures: a matching upstream error briefly cools the selected account so Sub2API's existing failover loop can select another account in the same group.
|
||||
- The built-in temporary-unschedulable configuration and external `sentinel.*` configuration are separate control surfaces. `pool.defaultTempUnschedulable` handles near-real-time request-path cooling and failover; `sentinel.*` handles account-level marker health, quarantine, restore, and probe cadence. Changing one surface must not silently rewrite the other surface's cadence, marker semantics, quarantine state, or rule list.
|
||||
@@ -99,7 +99,7 @@
|
||||
- Codex accounts selected by YAML do not declare `schedulable` as durable configuration. `codex-pool sync --confirm` must not restore existing account schedulability merely because YAML selects the account or sentinel state lacks an active quarantine. Existing `schedulable=false` is runtime state: the sentinel first reads Sub2API's actual account state, schedules a recovery probe for unschedulable managed accounts, and restores `schedulable=true` only after the marker probe matches.
|
||||
- `codex-pool sync --confirm` preserves UniDesk-managed accounts that are absent from YAML by default; explicit upstream retirement requires `codex-pool sync --confirm --prune-removed`. This keeps account deletion out of the normal availability-recovery path and prevents temporary YAML edits from becoming destructive runtime changes.
|
||||
- `profiles.entries` selects local Codex profile files from `~/.codex/` and maps them to Sub2API account names.
|
||||
- The unsuffixed master `~/.codex/config.toml` and `~/.codex/auth.json` are reserved for the unified Sub2API consumer. `config.toml` must keep the YAML-selected consumer base URL written by `codex-pool configure-local --target <active> --confirm`, and `auth.json` must contain the unified pool API key from `pool.apiKeySecretName` / `pool.apiKeySecretKey` on that active target. Do not replace these two files with direct upstream account credentials.
|
||||
- The unsuffixed master `~/.codex/config.toml` and `~/.codex/auth.json` are reserved for the unified Sub2API consumer. `config.toml` must keep the YAML-selected consumer base URL written by `codex-pool configure-local --target <active> --confirm`, and `auth.json` must contain the unified pool API key from the active target's `pool.apiKeySecretName` / `pool.apiKeySecretKey` location. Do not replace these two files with direct upstream account credentials.
|
||||
- Additional upstream accounts must use suffixed local profile files such as `config.toml.<profile>` and `auth.json.<profile>`, then be declared through `profiles.entries` in `config/platform-infra/sub2api-codex-pool.yaml`.
|
||||
- `profiles.entries[].capacity` optionally overrides `pool.defaultAccountCapacity` for one account. Capacity is a YAML-controlled routing input; concrete current values belong only in `config/platform-infra/sub2api-codex-pool.yaml` and runtime validation output, not in long-term reference prose. Code constants, Secrets, ad-hoc runtime patches, or stale tests must not override YAML source of truth.
|
||||
- `profiles.entries[].loadFactor` optionally overrides `pool.defaultAccountLoadFactor` for one account and is rendered to Sub2API `load_factor`. Treat it as routing policy: values belong in YAML and `codex-pool validate` output, not code constants, Secrets, or ad-hoc runtime patches.
|
||||
|
||||
@@ -59,7 +59,9 @@ export function codexPoolPlan(options?: DisclosureOptions): Record<string, unkno
|
||||
decision: {
|
||||
accountType: "openai/apikey",
|
||||
grouping: `All discovered Codex profiles are bound to one Sub2API group named ${pool.groupName}.`,
|
||||
unifiedApiKey: `The client-facing API_KEY is controlled by k3s Secret ${runtimeTarget.namespace}/${pool.apiKeySecretName}.${pool.apiKeySecretKey}.`,
|
||||
unifiedApiKey: runtimeTarget.runtimeMode === "host-docker"
|
||||
? `The client-facing API_KEY is controlled by host-Docker env source ${runtimeTarget.hostDockerEnvPath}.${pool.apiKeySecretKey}.`
|
||||
: `The client-facing API_KEY is controlled by k3s Secret ${runtimeTarget.namespace}/${pool.apiKeySecretName}.${pool.apiKeySecretKey}.`,
|
||||
sentinel: pool.sentinel.monitor.enabled
|
||||
? `Account sentinel is enabled as k8s CronJob ${runtimeTarget.namespace}/${pool.sentinel.cronJobName}; actions.enabled=${pool.sentinel.actions.enabled}.`
|
||||
: "Account sentinel monitoring is disabled by YAML.",
|
||||
@@ -73,13 +75,7 @@ export function codexPoolPlan(options?: DisclosureOptions): Record<string, unkno
|
||||
: `${pool.manualAccounts.protected.length} manual Sub2API account(s) are protected from UniDesk-managed credentials, prune, sentinel probe, and sentinel freeze paths; only explicitly declared proxy/group bindings are reconciled.`,
|
||||
},
|
||||
next: ok
|
||||
? runtimeTarget.runtimeMode === "host-docker"
|
||||
? {
|
||||
expose: `bun scripts/cli.ts platform-infra sub2api codex-pool expose${targetFlag(runtimeTarget)} --confirm`,
|
||||
validate: `bun scripts/cli.ts platform-infra sub2api validate${targetFlag(runtimeTarget)}`,
|
||||
note: "PK01 host-Docker target does not run the k3s codex-pool sync path.",
|
||||
}
|
||||
: { sync: `bun scripts/cli.ts platform-infra sub2api codex-pool sync${targetFlag(runtimeTarget)} --confirm` }
|
||||
? { sync: `bun scripts/cli.ts platform-infra sub2api codex-pool sync${targetFlag(runtimeTarget)} --confirm` }
|
||||
: { fix: "Ensure every discovered config.toml profile has a base_url and either auth.json OPENAI_API_KEY or the configured env_key present in this shell." },
|
||||
};
|
||||
}
|
||||
@@ -89,24 +85,6 @@ export async function codexPoolSync(config: UniDeskConfig, options: SyncOptions)
|
||||
const runtimeTarget = codexPoolRuntimeTarget(options.targetId);
|
||||
const profiles = collectCodexProfiles();
|
||||
const planOk = profiles.length > 0 && profiles.every((profile) => profile.ok);
|
||||
if (runtimeTarget.runtimeMode === "host-docker" && options.confirm) {
|
||||
return {
|
||||
ok: false,
|
||||
action: "platform-infra-sub2api-codex-pool-sync",
|
||||
mode: "blocked-host-docker-sync-unsupported",
|
||||
target: poolTarget(pool, runtimeTarget),
|
||||
reason: "PK01 host-Docker target does not run the k3s codex-pool sync path; Sub2API runtime is controlled by platform-infra sub2api apply/validate.",
|
||||
local: {
|
||||
profileCount: profiles.length,
|
||||
invalidProfiles: profiles.filter((profile) => !profile.ok).map(compactProfile),
|
||||
valuesPrinted: false,
|
||||
},
|
||||
next: {
|
||||
expose: `bun scripts/cli.ts platform-infra sub2api codex-pool expose${targetFlag(runtimeTarget)} --confirm`,
|
||||
validate: `bun scripts/cli.ts platform-infra sub2api validate${targetFlag(runtimeTarget)}`,
|
||||
},
|
||||
};
|
||||
}
|
||||
if (!options.confirm || !planOk) {
|
||||
const plan = {
|
||||
...codexPoolPlan(options),
|
||||
|
||||
@@ -28,6 +28,59 @@ import { codexPoolRuntimeTarget } from "./runtime-target";
|
||||
import { sub2apiConfigPath } from "./types";
|
||||
|
||||
export async function fetchPoolApiKey(config: UniDeskConfig, pool: CodexPoolConfig, target = codexPoolRuntimeTarget()): Promise<{ apiKey: string | null; error: string | null }> {
|
||||
if (target.runtimeMode === "host-docker") {
|
||||
const envPath = target.hostDockerEnvPath;
|
||||
if (envPath === null) return { apiKey: null, error: "host-docker envPath missing" };
|
||||
const result = await capture(config, target.route, ["sh"], `
|
||||
set -u
|
||||
python3 - <<'PY'
|
||||
import base64
|
||||
import json
|
||||
import subprocess
|
||||
path = ${JSON.stringify(envPath)}
|
||||
key = ${JSON.stringify(pool.apiKeySecretKey)}
|
||||
values = {}
|
||||
try:
|
||||
with open(path, "r", encoding="utf-8") as handle:
|
||||
lines = handle.read().splitlines()
|
||||
except FileNotFoundError:
|
||||
print(json.dumps({"ok": False, "error": "env-source-missing", "path": path, "valuesPrinted": False}))
|
||||
raise SystemExit(1)
|
||||
except PermissionError:
|
||||
proc = subprocess.run(["sudo", "-n", "cat", path], stdout=subprocess.PIPE, stderr=subprocess.PIPE)
|
||||
if proc.returncode != 0:
|
||||
print(json.dumps({"ok": False, "error": "env-source-unreadable", "path": path, "stderrTail": proc.stderr.decode("utf-8", errors="replace")[-500:], "valuesPrinted": False}))
|
||||
raise SystemExit(1)
|
||||
lines = proc.stdout.decode("utf-8", errors="replace").splitlines()
|
||||
for line in lines:
|
||||
stripped = line.strip()
|
||||
if not stripped or stripped.startswith("#") or "=" not in stripped:
|
||||
continue
|
||||
current_key, value = stripped.split("=", 1)
|
||||
current_key = current_key.strip()
|
||||
value = value.strip()
|
||||
if len(value) >= 2 and value[0] == value[-1] and value[0] in ("'", '"'):
|
||||
value = value[1:-1]
|
||||
values[current_key] = value
|
||||
value = values.get(key)
|
||||
if not value:
|
||||
print(json.dumps({"ok": False, "error": "api-key-missing", "path": path, "key": key, "valuesPrinted": False}))
|
||||
raise SystemExit(1)
|
||||
print(json.dumps({"ok": True, "apiKeyB64": base64.b64encode(value.encode()).decode(), "path": path, "key": key, "valuesPrinted": False}))
|
||||
PY
|
||||
`);
|
||||
if (result.exitCode !== 0) return { apiKey: null, error: `read host pool API key source failed: ${result.stderr.slice(-1000) || result.stdout.slice(-1000)}` };
|
||||
const parsed = parseJsonOutput(result.stdout);
|
||||
if (!isRecord(parsed) || parsed.ok !== true || typeof parsed.apiKeyB64 !== "string") {
|
||||
return { apiKey: null, error: `${envPath}.${pool.apiKeySecretKey} missing` };
|
||||
}
|
||||
try {
|
||||
const apiKey = Buffer.from(parsed.apiKeyB64, "base64").toString("utf8");
|
||||
return apiKey.length > 0 ? { apiKey, error: null } : { apiKey: null, error: "decoded API key is empty" };
|
||||
} catch (error) {
|
||||
return { apiKey: null, error: error instanceof Error ? error.message : String(error) };
|
||||
}
|
||||
}
|
||||
const result = await capture(config, target.route, ["sh"], `
|
||||
set -u
|
||||
kubectl -n ${target.namespace} get secret ${pool.apiKeySecretName} -o json
|
||||
|
||||
@@ -329,6 +329,7 @@ export interface Sub2ApiRuntimeConfig {
|
||||
defaultTargetId: string;
|
||||
appSecretName: string;
|
||||
secretsRoot: string;
|
||||
appSourceRef: string;
|
||||
sentinelEnabledOnTargets: string[];
|
||||
targets: Record<string, unknown>[];
|
||||
}
|
||||
|
||||
@@ -41,7 +41,9 @@ export function poolTarget(pool = readCodexPoolConfig(), target = codexPoolRunti
|
||||
configPath: codexPoolConfigPath,
|
||||
groupName: pool.groupName,
|
||||
apiKeyName: pool.apiKeyName,
|
||||
apiKeySecret: `${target.namespace}/${pool.apiKeySecretName}.${pool.apiKeySecretKey}`,
|
||||
apiKeySecret: target.runtimeMode === "host-docker"
|
||||
? `${target.hostDockerEnvPath}.${pool.apiKeySecretKey}`
|
||||
: `${target.namespace}/${pool.apiKeySecretName}.${pool.apiKeySecretKey}`,
|
||||
publicExposure: targetPublicExposureSummary(target),
|
||||
sentinelImageBuild: {
|
||||
source: `${sub2apiConfigPath}.targets[${target.id}].codexPool.sentinelImageBuild`,
|
||||
|
||||
@@ -27,12 +27,14 @@ import { resolvedManualAccountProtections } from "./public-exposure";
|
||||
import { fieldManager } from "./types";
|
||||
|
||||
export function remotePythonScript(mode: "sync" | "validate" | "trace" | "cleanup-probes" | "sentinel-probe", encodedPayload: string, pool: CodexPoolConfig, target: CodexPoolRuntimeTarget): string {
|
||||
const hostDockerEnvPath = target.runtimeMode === "host-docker" ? target.hostDockerEnvPath : null;
|
||||
return `
|
||||
set -u
|
||||
python3 - <<'PY'
|
||||
import base64
|
||||
import hashlib
|
||||
import json
|
||||
import os
|
||||
import re
|
||||
import secrets
|
||||
import string
|
||||
@@ -43,9 +45,13 @@ from datetime import datetime, timezone, timedelta
|
||||
from urllib.parse import quote
|
||||
|
||||
TARGET_ID = ${JSON.stringify(target.id)}
|
||||
RUNTIME_MODE = ${JSON.stringify(target.runtimeMode)}
|
||||
NAMESPACE = ${JSON.stringify(target.namespace)}
|
||||
SERVICE_NAME = ${JSON.stringify(target.serviceName)}
|
||||
SERVICE_DNS = ${JSON.stringify(target.serviceDns)}
|
||||
HOST_DOCKER_APP_PORT = ${JSON.stringify(target.hostDockerAppPort)}
|
||||
HOST_DOCKER_ENV_PATH = ${JSON.stringify(hostDockerEnvPath)}
|
||||
HOST_DOCKER_APP_CONTAINER = "sub2api-app"
|
||||
FIELD_MANAGER = "${fieldManager}"
|
||||
APP_SECRET_NAME = ${JSON.stringify(target.appSecretName)}
|
||||
POOL_GROUP_NAME = "${pool.groupName}"
|
||||
@@ -80,6 +86,107 @@ def text(data, limit=4000):
|
||||
data = data.decode("utf-8", errors="replace")
|
||||
return data[-limit:]
|
||||
|
||||
def read_host_env():
|
||||
if RUNTIME_MODE != "host-docker":
|
||||
return {}
|
||||
if not isinstance(HOST_DOCKER_ENV_PATH, str) or not HOST_DOCKER_ENV_PATH:
|
||||
raise RuntimeError("host-docker env source path missing")
|
||||
values = {}
|
||||
lines = read_host_env_lines()
|
||||
for line in lines:
|
||||
stripped = line.strip()
|
||||
if not stripped or stripped.startswith("#") or "=" not in stripped:
|
||||
continue
|
||||
key, value = stripped.split("=", 1)
|
||||
key = key.strip()
|
||||
value = value.strip()
|
||||
if len(value) >= 2 and value[0] == value[-1] and value[0] in ("'", '"'):
|
||||
value = value[1:-1]
|
||||
if key:
|
||||
values[key] = value
|
||||
return values
|
||||
|
||||
def read_host_env_lines():
|
||||
try:
|
||||
with open(HOST_DOCKER_ENV_PATH, "r", encoding="utf-8") as handle:
|
||||
return handle.read().splitlines()
|
||||
except FileNotFoundError:
|
||||
raise RuntimeError(f"host-docker env source missing: {HOST_DOCKER_ENV_PATH}")
|
||||
except PermissionError:
|
||||
proc = run(["sudo", "-n", "cat", HOST_DOCKER_ENV_PATH])
|
||||
if proc.returncode != 0:
|
||||
raise RuntimeError("read host-docker env source failed: " + text(proc.stderr, 1000))
|
||||
return proc.stdout.decode("utf-8", errors="replace").splitlines()
|
||||
|
||||
def write_host_env_value(key, value):
|
||||
if RUNTIME_MODE != "host-docker":
|
||||
raise RuntimeError("write_host_env_value is only valid for host-docker")
|
||||
if not isinstance(HOST_DOCKER_ENV_PATH, str) or not HOST_DOCKER_ENV_PATH:
|
||||
raise RuntimeError("host-docker env source path missing")
|
||||
if not re.match(r"^[A-Za-z_][A-Za-z0-9_]*$", key):
|
||||
raise RuntimeError(f"unsupported env key: {key}")
|
||||
os.makedirs(os.path.dirname(HOST_DOCKER_ENV_PATH), exist_ok=True)
|
||||
try:
|
||||
lines = read_host_env_lines()
|
||||
except RuntimeError as exc:
|
||||
if "missing" not in str(exc):
|
||||
raise
|
||||
lines = []
|
||||
next_lines = []
|
||||
replaced = False
|
||||
for line in lines:
|
||||
stripped = line.strip()
|
||||
if stripped.startswith("#") or "=" not in stripped:
|
||||
next_lines.append(line)
|
||||
continue
|
||||
current_key = stripped.split("=", 1)[0].strip()
|
||||
if current_key == key:
|
||||
next_lines.append(f"{key}={value}")
|
||||
replaced = True
|
||||
else:
|
||||
next_lines.append(line)
|
||||
if not replaced:
|
||||
next_lines.append(f"{key}={value}")
|
||||
content = "\\n".join(next_lines).rstrip() + "\\n"
|
||||
tmp_path = HOST_DOCKER_ENV_PATH + ".tmp"
|
||||
try:
|
||||
with open(tmp_path, "w", encoding="utf-8") as handle:
|
||||
handle.write(content)
|
||||
os.chmod(tmp_path, 0o600)
|
||||
os.replace(tmp_path, HOST_DOCKER_ENV_PATH)
|
||||
except PermissionError:
|
||||
try:
|
||||
os.unlink(tmp_path)
|
||||
except Exception:
|
||||
pass
|
||||
script = r'''
|
||||
set -eu
|
||||
path="$1"
|
||||
dir="$(dirname "$path")"
|
||||
mkdir -p "$dir"
|
||||
tmp="$path.tmp.$$"
|
||||
umask 077
|
||||
cat > "$tmp"
|
||||
mv "$tmp" "$path"
|
||||
chmod 600 "$path"
|
||||
'''
|
||||
proc = run(["sudo", "-n", "sh", "-c", script, "sh", HOST_DOCKER_ENV_PATH], content.encode("utf-8"))
|
||||
if proc.returncode != 0:
|
||||
raise RuntimeError("write host-docker env source failed: " + text(proc.stderr, 1000))
|
||||
return "updated" if replaced else "created"
|
||||
|
||||
def docker(args):
|
||||
proc = run(["docker", *args])
|
||||
if proc.returncode == 0:
|
||||
return proc
|
||||
sudo_proc = run(["sudo", "-n", "docker", *args])
|
||||
return sudo_proc if sudo_proc.returncode == 0 else proc
|
||||
|
||||
def runtime_logs(since, tail):
|
||||
if RUNTIME_MODE == "host-docker":
|
||||
return docker(["logs", f"--since={since}", f"--tail={tail}", HOST_DOCKER_APP_CONTAINER])
|
||||
return kubectl(["-n", NAMESPACE, "logs", "deployment/sub2api", f"--since={since}", f"--tail={tail}"])
|
||||
|
||||
def kubectl(args, input_obj=None):
|
||||
if isinstance(input_obj, str):
|
||||
input_bytes = input_obj.encode("utf-8")
|
||||
@@ -98,17 +205,23 @@ def kube_json(args, label):
|
||||
return json.loads(raw.decode("utf-8"))
|
||||
|
||||
def decode_secret_value(name, key):
|
||||
if RUNTIME_MODE == "host-docker":
|
||||
return read_host_env().get(key)
|
||||
data = kube_json(["-n", NAMESPACE, "get", "secret", name], f"secret/{name}").get("data") or {}
|
||||
if key not in data:
|
||||
return None
|
||||
return base64.b64decode(data[key]).decode("utf-8")
|
||||
|
||||
def get_config_value(name, key):
|
||||
if RUNTIME_MODE == "host-docker":
|
||||
return read_host_env().get(key)
|
||||
data = kube_json(["-n", NAMESPACE, "get", "configmap", name], f"configmap/{name}").get("data") or {}
|
||||
value = data.get(key)
|
||||
return value if isinstance(value, str) and value else None
|
||||
|
||||
def select_app_pod():
|
||||
if RUNTIME_MODE == "host-docker":
|
||||
return HOST_DOCKER_APP_CONTAINER
|
||||
pods = kube_json(["-n", NAMESPACE, "get", "pods", "-l", "app.kubernetes.io/name=sub2api"], "sub2api pods").get("items") or []
|
||||
for pod in pods:
|
||||
status = pod.get("status") or {}
|
||||
@@ -241,10 +354,15 @@ else
|
||||
fi
|
||||
fi
|
||||
'''
|
||||
proc = run([
|
||||
"kubectl", "-n", NAMESPACE, "exec", "-i", APP_POD,
|
||||
"--", "sh", "-c", script, "sh", method, f"http://127.0.0.1:8080{path}", bearer or "",
|
||||
], body)
|
||||
if RUNTIME_MODE == "host-docker":
|
||||
if not isinstance(HOST_DOCKER_APP_PORT, int):
|
||||
raise RuntimeError("host-docker app port missing")
|
||||
proc = run(["sh", "-c", script, "sh", method, f"http://127.0.0.1:{HOST_DOCKER_APP_PORT}{path}", bearer or ""], body)
|
||||
else:
|
||||
proc = run([
|
||||
"kubectl", "-n", NAMESPACE, "exec", "-i", APP_POD,
|
||||
"--", "sh", "-c", script, "sh", method, f"http://127.0.0.1:8080{path}", bearer or "",
|
||||
], body)
|
||||
return parse_curl_output(proc)
|
||||
|
||||
def envelope_data(parsed):
|
||||
@@ -996,6 +1114,9 @@ def ensure_api_key_secret(group_id, token):
|
||||
secret_action = "reused-existing-sub2api-key"
|
||||
else:
|
||||
secret_action = "created"
|
||||
if RUNTIME_MODE == "host-docker":
|
||||
env_action = "kept-existing" if existing else write_host_env_value(POOL_API_KEY_SECRET_KEY, api_key)
|
||||
return api_key, secret_action, f"host-docker-env:{env_action};source={HOST_DOCKER_ENV_PATH};key={POOL_API_KEY_SECRET_KEY};valuesPrinted=false"
|
||||
manifest = {
|
||||
"apiVersion": "v1",
|
||||
"kind": "Secret",
|
||||
@@ -1022,6 +1143,11 @@ def ensure_api_key_secret(group_id, token):
|
||||
raise RuntimeError(f"apply API key secret failed: {text(proc.stderr, 1000)}")
|
||||
return api_key, secret_action, text(proc.stdout, 1000)
|
||||
|
||||
def pool_api_key_secret_location():
|
||||
if RUNTIME_MODE == "host-docker":
|
||||
return f"{HOST_DOCKER_ENV_PATH}.{POOL_API_KEY_SECRET_KEY}"
|
||||
return f"{NAMESPACE}/{POOL_API_KEY_SECRET_NAME}.{POOL_API_KEY_SECRET_KEY}"
|
||||
|
||||
def apply_sentinel_manifest(manifest):
|
||||
if not TARGET_SENTINEL_ENABLED:
|
||||
return {
|
||||
@@ -1190,6 +1316,8 @@ def parse_epoch_z(value):
|
||||
return None
|
||||
|
||||
def sentinel_state_object():
|
||||
if not TARGET_SENTINEL_ENABLED:
|
||||
return None, None
|
||||
state_name = SENTINEL_CONFIG.get("stateConfigMapName")
|
||||
if not state_name:
|
||||
return None, None
|
||||
@@ -1205,6 +1333,8 @@ def sentinel_state_object():
|
||||
return obj, None
|
||||
|
||||
def active_sentinel_quarantine_names():
|
||||
if not TARGET_SENTINEL_ENABLED:
|
||||
return set()
|
||||
_, state = sentinel_state_object()
|
||||
if not isinstance(state, dict):
|
||||
return set()
|
||||
@@ -1668,7 +1798,7 @@ def response_output_preview(parsed):
|
||||
return "\\n".join(parts)[:240]
|
||||
|
||||
def request_log_evidence(request_id):
|
||||
proc = kubectl(["-n", NAMESPACE, "logs", "deployment/sub2api", "--since=5m", "--tail=800"])
|
||||
proc = runtime_logs("5m", 800)
|
||||
stdout = proc.stdout.decode("utf-8", errors="replace")
|
||||
lines = [line for line in stdout.splitlines() if request_id in line]
|
||||
failovers = []
|
||||
@@ -1705,7 +1835,7 @@ def request_log_evidence(request_id):
|
||||
}
|
||||
|
||||
def recent_compact_gateway_evidence():
|
||||
proc = kubectl(["-n", NAMESPACE, "logs", "deployment/sub2api", "--since=6h", "--tail=2500"])
|
||||
proc = runtime_logs("6h", 2500)
|
||||
stdout = proc.stdout.decode("utf-8", errors="replace")
|
||||
failures = []
|
||||
successes = []
|
||||
@@ -1830,7 +1960,7 @@ def failover_budget_exhausted_evidence(failovers, final_errors):
|
||||
return exhausted
|
||||
|
||||
def recent_responses_gateway_evidence():
|
||||
proc = kubectl(["-n", NAMESPACE, "logs", "deployment/sub2api", "--since=6h", "--tail=2500"])
|
||||
proc = runtime_logs("6h", 2500)
|
||||
stdout = proc.stdout.decode("utf-8", errors="replace")
|
||||
failovers = []
|
||||
forward_failures = []
|
||||
@@ -1936,6 +2066,7 @@ def validate_gateway_responses(api_key):
|
||||
set -eu
|
||||
token="$1"
|
||||
request_id="$2"
|
||||
url="$3"
|
||||
tmp="$(mktemp)"
|
||||
trap 'rm -f "$tmp"' EXIT
|
||||
cat > "$tmp"
|
||||
@@ -1945,13 +2076,18 @@ curl -sS -w '\\n__HTTP_CODE__:%{http_code}' -X POST \
|
||||
-H "X-Request-ID: $request_id" \
|
||||
-H "OpenAI-Client-Request-ID: $request_id" \
|
||||
--data-binary @"$tmp" \
|
||||
http://127.0.0.1:8080/v1/responses
|
||||
"$url"
|
||||
'''
|
||||
started = time.time()
|
||||
proc = run([
|
||||
"kubectl", "-n", NAMESPACE, "exec", "-i", APP_POD,
|
||||
"--", "sh", "-c", script, "sh", api_key, request_id,
|
||||
], body)
|
||||
if RUNTIME_MODE == "host-docker":
|
||||
if not isinstance(HOST_DOCKER_APP_PORT, int):
|
||||
raise RuntimeError("host-docker app port missing")
|
||||
proc = run(["sh", "-c", script, "sh", api_key, request_id, f"http://127.0.0.1:{HOST_DOCKER_APP_PORT}/v1/responses"], body)
|
||||
else:
|
||||
proc = run([
|
||||
"kubectl", "-n", NAMESPACE, "exec", "-i", APP_POD,
|
||||
"--", "sh", "-c", script, "sh", api_key, request_id, "http://127.0.0.1:8080/v1/responses",
|
||||
], body)
|
||||
resp = parse_curl_output(proc)
|
||||
evidence = request_log_evidence(request_id)
|
||||
parsed = resp.get("json")
|
||||
@@ -2123,6 +2259,31 @@ def validate_runtime_capabilities(token):
|
||||
}
|
||||
|
||||
def app_pod_runtime_image():
|
||||
if RUNTIME_MODE == "host-docker":
|
||||
proc = docker(["inspect", HOST_DOCKER_APP_CONTAINER])
|
||||
if proc.returncode != 0:
|
||||
return {
|
||||
"container": HOST_DOCKER_APP_CONTAINER,
|
||||
"error": text(proc.stderr, 1000) or text(proc.stdout, 1000),
|
||||
}
|
||||
try:
|
||||
data = json.loads(proc.stdout.decode("utf-8"))
|
||||
item = data[0] if isinstance(data, list) and data else {}
|
||||
except Exception as exc:
|
||||
return {"container": HOST_DOCKER_APP_CONTAINER, "error": str(exc)}
|
||||
state = item.get("State") if isinstance(item, dict) and isinstance(item.get("State"), dict) else {}
|
||||
health = state.get("Health") if isinstance(state.get("Health"), dict) else {}
|
||||
config = item.get("Config") if isinstance(item, dict) and isinstance(item.get("Config"), dict) else {}
|
||||
return {
|
||||
"container": HOST_DOCKER_APP_CONTAINER,
|
||||
"id": (item.get("Id") or "")[:12] if isinstance(item.get("Id"), str) else None,
|
||||
"image": config.get("Image"),
|
||||
"imageID": item.get("Image"),
|
||||
"ready": state.get("Running") is True and (not health or health.get("Status") in (None, "healthy")),
|
||||
"restartCount": item.get("RestartCount"),
|
||||
"startedAt": state.get("StartedAt"),
|
||||
"health": health.get("Status"),
|
||||
}
|
||||
try:
|
||||
pod = kube_json(["-n", NAMESPACE, "get", "pod", APP_POD], f"pod/{APP_POD}")
|
||||
spec_containers = ((pod.get("spec") or {}).get("containers") or []) if isinstance(pod, dict) else []
|
||||
@@ -2474,7 +2635,7 @@ def run_sync():
|
||||
"tempUnschedulable": temp_unschedulable_status,
|
||||
"apiKey": {
|
||||
"name": POOL_API_KEY_NAME,
|
||||
"secret": f"{NAMESPACE}/{POOL_API_KEY_SECRET_NAME}.{POOL_API_KEY_SECRET_KEY}",
|
||||
"secret": pool_api_key_secret_location(),
|
||||
"secretAction": secret_action,
|
||||
"secretApply": secret_apply_stdout,
|
||||
"sub2apiAction": api_key_result["action"],
|
||||
@@ -2523,7 +2684,7 @@ def run_validate():
|
||||
"appPod": APP_POD,
|
||||
"admin": {"email": admin_email, "tokenPrinted": False, "compliance": admin_compliance},
|
||||
"apiKey": {
|
||||
"secret": f"{NAMESPACE}/{POOL_API_KEY_SECRET_NAME}.{POOL_API_KEY_SECRET_KEY}",
|
||||
"secret": pool_api_key_secret_location(),
|
||||
"sub2apiId": key_item.get("id") if isinstance(key_item, dict) else None,
|
||||
"userId": key_item.get("user_id") if isinstance(key_item, dict) else None,
|
||||
"groupId": key_item.get("group_id") if isinstance(key_item, dict) else None,
|
||||
|
||||
@@ -41,6 +41,8 @@ export function readSub2ApiRuntimeConfig(): Sub2ApiRuntimeConfig {
|
||||
const secrets = runtime !== null && isRecord(runtime.secrets) ? runtime.secrets : null;
|
||||
const secretsRoot = secrets === null ? null : stringValue(secrets.root);
|
||||
if (secretsRoot === null || !secretsRoot.startsWith("/")) throw new Error(`${sub2apiConfigPath}.runtime.secrets.root must be an absolute path`);
|
||||
const appSourceRef = secrets === null ? null : stringValue(secrets.appSourceRef);
|
||||
if (appSourceRef === null || !/^[A-Za-z0-9_./-]+$/u.test(appSourceRef)) throw new Error(`${sub2apiConfigPath}.runtime.secrets.appSourceRef has an unsupported format`);
|
||||
const sentinel = runtime !== null && isRecord(runtime.sentinel) ? runtime.sentinel : null;
|
||||
const enabledOnTargets = Array.isArray(sentinel?.enabledOnTargets)
|
||||
? sentinel.enabledOnTargets.map((entry) => stringValue(entry)).filter((entry): entry is string => entry !== null && entry.length > 0)
|
||||
@@ -50,6 +52,7 @@ export function readSub2ApiRuntimeConfig(): Sub2ApiRuntimeConfig {
|
||||
defaultTargetId,
|
||||
appSecretName,
|
||||
secretsRoot,
|
||||
appSourceRef,
|
||||
sentinelEnabledOnTargets: enabledOnTargets,
|
||||
targets: parsed.targets,
|
||||
};
|
||||
@@ -99,9 +102,13 @@ export function codexPoolRuntimeTarget(targetId?: string): CodexPoolRuntimeTarge
|
||||
if (publicExposure !== null && publicExposure.enabled) publicBaseUrl = publicExposure.publicBaseUrl;
|
||||
const hostDocker = runtimeMode === "host-docker" && isRecord(raw.hostDocker) ? raw.hostDocker : null;
|
||||
const hostDockerAppPort = hostDocker === null ? null : numberValue(hostDocker.appPort);
|
||||
const hostDockerEnvPath = hostDocker === null ? null : stringValue(hostDocker.envPath);
|
||||
if (runtimeMode === "host-docker" && (hostDockerAppPort === null || !Number.isInteger(hostDockerAppPort) || hostDockerAppPort < 1 || hostDockerAppPort > 65535)) {
|
||||
throw new Error(`${sub2apiConfigPath}.targets[${id}].hostDocker.appPort must be an integer TCP port when runtimeMode=host-docker`);
|
||||
}
|
||||
if (runtimeMode === "host-docker" && (hostDockerEnvPath === null || !hostDockerEnvPath.startsWith("/"))) {
|
||||
throw new Error(`${sub2apiConfigPath}.targets[${id}].hostDocker.envPath must be an absolute path when runtimeMode=host-docker`);
|
||||
}
|
||||
|
||||
return {
|
||||
id,
|
||||
@@ -114,6 +121,9 @@ export function codexPoolRuntimeTarget(targetId?: string): CodexPoolRuntimeTarge
|
||||
publicExposure,
|
||||
appSecretName: runtimeConfig.appSecretName,
|
||||
secretsRoot: runtimeConfig.secretsRoot,
|
||||
appSourceRef: runtimeConfig.appSourceRef,
|
||||
hostDockerAppPort,
|
||||
hostDockerEnvPath,
|
||||
sentinelEnabled,
|
||||
sentinelImageBuild,
|
||||
egressProxy,
|
||||
|
||||
@@ -88,6 +88,9 @@ export interface CodexPoolRuntimeTarget {
|
||||
publicExposure: CodexPoolRuntimePublicExposure | null;
|
||||
appSecretName: string;
|
||||
secretsRoot: string;
|
||||
appSourceRef: string;
|
||||
hostDockerAppPort: number | null;
|
||||
hostDockerEnvPath: string | null;
|
||||
sentinelEnabled: boolean;
|
||||
sentinelImageBuild: {
|
||||
baseImageCachePolicy: "pull" | "local-if-present";
|
||||
|
||||
Reference in New Issue
Block a user