feat: activate D601 sub2api external edge

Codex
2026-06-12 07:56:46 +00:00
parent f5a6610fe9
commit c23d9b5ea0
8 changed files with 2375 additions and 164 deletions
+18 -6
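@@ -5,7 +5,7 @@ description: UniDesk Sub2API platform operations skill. User mentions Sub2API, sub2api
# UniDesk Sub2API
UniDesk operates Sub2API in the k3s `platform-infra` namespace. G14 is the default active runtime; D601 serves only as a standby predeploy target under the same YAML/CLI control, and while the external DB is not ready both the application and the local Redis cache stay at replicas=0. Day-to-day operations go through the UniDesk CLI exclusively; do not write Kubernetes resources directly or call the Sub2API admin API by hand.
UniDesk operates Sub2API in the k3s `platform-infra` namespace. G14 is the default active runtime; D601 is controlled by the same YAML/CLI and can either stay as a standby predeploy or, once the external DB, image, FRP, and egress proxy prerequisites are ready, run as an external-active target. Day-to-day operations go through the UniDesk CLI exclusively; do not write Kubernetes resources directly or call the Sub2API admin API by hand.
**Fixed entrypoint**: `cd /root/unidesk && bun scripts/cli.ts platform-infra sub2api ...`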
@@ -33,8 +33,8 @@ bun scripts/cli.ts platform-infra sub2api codex-pool trace --request-id <request
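- The configuration source of truth is YAML: `config/platform-infra/sub2api.yaml` and `config/platform-infra/sub2api-codex-pool.yaml`.
- Business policy and concrete values are governed by YAML only. To adjust the value of an existing field, change only the YAML and run `plan` / `sync --confirm` / `validate`; do not automatically add hard-coded values in code, hard schema ranges, contract tests, unit tests, or long-lived reference docs. Config validation checks only format, types, required fields, and renderability; it does not judge whether a numeric policy is "reasonable".
- If `agents/*.yaml` exists under this skill directory, it serves only as display and invocation metadata for the skill/agent, not as Sub2API or Codex pool runtime configuration; do not maintain a second copy of account, capacity, priority, endpoint, or Secret configuration in the skill directory.
- Runtime targets are declared in `config/platform-infra/sub2api.yaml`; by default `G14:k3s` is the active target and `D601:k3s` is the standby predeploy target. The master server is only the control host and a consumer; it does not deploy Sub2API/PostgreSQL/Redis.
- D601 standby does not deploy a local PostgreSQL and does not run a second sentinel or FRP management entrypoint; while the external DB endpoint/Secret is not ready, only the namespace, NetworkPolicy, Service, and replicas=0 Sub2API/Redis Deployments may be predeployed. After Redis is activated, only ephemeral cache is allowed.
- Runtime targets are declared in `config/platform-infra/sub2api.yaml`; by default `G14:k3s` is the active target, and `D601:k3s` can be declared by YAML as either a standby predeploy or an external-active target. The master server is only the control host and a consumer; it does not deploy Sub2API/PostgreSQL/Redis.
- D601 standby does not deploy a local PostgreSQL and does not run a sentinel or FRP management entrypoint; while the external DB endpoint/Secret is not ready, only the namespace, NetworkPolicy, Service, and replicas=0 Sub2API/Redis Deployments may be predeployed. After Redis is activated, only ephemeral cache is allowed. D601 external-active still does not deploy a local PostgreSQL: it must connect directly to the external DB declared in YAML, use a local ephemeral Redis, and run frpc, the egress proxy, and the target-level sentinel only when YAML enables them.
- Secrets, `~/.codex/config.toml*`, and `~/.codex/auth.json*` are runtime inputs or local state and are not committed.
- The default `~/.codex/config.toml` and `~/.codex/auth.json` are used only as the unified Sub2API consumer; `config.toml` must point to `https://sub2api.74-48-78-17.nip.io/`, and `auth.json` must use the unified pool API key. A new upstream account must not overwrite these two default files; add `config.toml.<profile>` / `auth.json.<profile>` instead and declare them in YAML.
- Output may contain only the Secret path, length, and preview/fingerprint; printing a full API key, admin password, JWT secret, or TOTP key is forbidden.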
@@ -57,7 +57,7 @@ bun scripts/cli.ts platform-infra sub2api validate --target D601
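- `plan` reads `config/platform-infra/sub2api.yaml`, renders `src/components/platform-infra/sub2api/sub2api.k8s.yaml`, checks that there is no Ingress/NodePort/LoadBalancer/hostPort/hostNetwork/resource limits, and requires `NetworkPolicy/allow-all` to be created under manifest control.
- `apply --confirm` creates an asynchronous job by default; poll with the returned `job status` command, then run `status` and `validate`.
- Use `status --full|--raw` only when you need to expand remote stdout/stderr or the raw JSON.
- `validate` is on-demand acceptance, not a continuous availability probe. For D601 standby, `validate --target D601` verifies the predeploy shape and does not require the external DB to be reachable at that moment.
- `validate` is on-demand acceptance, not a continuous availability probe. For D601 standby, `validate --target D601` verifies the predeploy shape and does not require the external DB to be reachable at that moment; for D601 external-active, it must verify the external DB, the ephemeral Redis, the Sub2API service, the YAML egress proxy, and the target-level public exposure.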
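The mode-dependent scope of `validate --target D601` described in the last bullet can be pictured as a small lookup. This is only an illustrative sketch: `TargetMode`, `requiredChecks`, and the check labels are hypothetical names chosen here, not the CLI's real types or output.

```typescript
// Hypothetical names for illustration; the real CLI may model this differently.
type TargetMode = "standby" | "external-active";

// Returns the checks that validation is expected to cover for a given mode.
function requiredChecks(mode: TargetMode): string[] {
  if (mode === "standby") {
    // Standby only proves the predeploy shape; the external DB may be unreachable.
    return ["predeploy-shape"];
  }
  // External-active must prove every runtime dependency named in the bullet above.
  return [
    "external-db",
    "ephemeral-redis",
    "sub2api-service",
    "egress-proxy",
    "public-exposure",
  ];
}
```

The point of keeping the two lists separate is that a passing standby validation says nothing about database reachability.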
## Image upgrade
@@ -73,14 +73,22 @@ bun scripts/cli.ts platform-infra sub2api validate --target D601
```bash
bun scripts/cli.ts platform-infra sub2api codex-pool plan
bun scripts/cli.ts platform-infra sub2api codex-pool plan --target D601
bun scripts/cli.ts platform-infra sub2api codex-pool sync --confirm
bun scripts/cli.ts platform-infra sub2api codex-pool sync --target D601 --confirm
bun scripts/cli.ts platform-infra sub2api codex-pool validate
bun scripts/cli.ts platform-infra sub2api codex-pool validate --target D601
bun scripts/cli.ts platform-infra sub2api codex-pool trace --request-id <requestId>
bun scripts/cli.ts platform-infra sub2api codex-pool sentinel-image status
bun scripts/cli.ts platform-infra sub2api codex-pool sentinel-image status --target D601
bun scripts/cli.ts platform-infra sub2api codex-pool sentinel-image build --confirm
bun scripts/cli.ts platform-infra sub2api codex-pool sentinel-image build --target D601 --confirm
bun scripts/cli.ts platform-infra sub2api codex-pool sentinel-probe --account unidesk-codex-hy --confirm
bun scripts/cli.ts platform-infra sub2api codex-pool sentinel-probe --target D601 --account unidesk-codex-hy --confirm
bun scripts/cli.ts platform-infra sub2api codex-pool sentinel-report
bun scripts/cli.ts platform-infra sub2api codex-pool sentinel-report --target D601
bun scripts/cli.ts platform-infra sub2api codex-pool cleanup-probes --confirm
bun scripts/cli.ts platform-infra sub2api codex-pool cleanup-probes --target D601 --confirm
```
`config/platform-infra/sub2api-codex-pool.yaml` controls:
@@ -113,7 +121,7 @@ bun scripts/cli.ts platform-infra sub2api codex-pool cleanup-probes --confirm
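`sync --confirm` logs in to the Sub2API admin, creates/updates the group, creates/updates the `unidesk-codex-*` accounts in YAML, creates/reuses the unified API key Secret, and restores `schedulable=true` as the process-control baseline for managed accounts that are not in sentinel active quarantine; by default it does not delete managed accounts that are absent from YAML. Only when an upstream is explicitly retired, use `sync --confirm --prune-removed` to delete `unidesk-codex-*` accounts that are absent and have `extra.unidesk_managed=true`.
`sentinel-image status|build` manages the sentinel Python runtime image. The image is derived from the YAML `sentinel.image` base image and `sentinel.sdk.openaiPythonVersion`, and is published to the G14 local registry `127.0.0.1:5000/platform-infra/sub2api-account-sentinel:<derived-tag>`; `build --confirm` first checks the registry tag, reuses it quickly if it exists, and builds and pushes on the G14 host only if it does not. At startup the CronJob only verifies the SDK version and does not `pip install` at runtime.
`sentinel-image status|build` manages the sentinel Python runtime image. The image is derived from the YAML `sentinel.image` base image and `sentinel.sdk.openaiPythonVersion`, and is published to the local registry of the target runtime; `build --confirm` first checks the registry tag, reuses it quickly if it exists, and builds and pushes on the target host only if it does not. At startup the CronJob only verifies the SDK version and does not `pip install` at runtime.
`sync --confirm` also renders account-level sentinel resources from YAML and, when the monitor is enabled, first ensures that a reusable sentinel image exists. The current goal is marker-only automatic freeze/recovery with `sentinel.monitor.enabled=true` + `sentinel.actions.enabled=true`; do not hand-patch the CronJob, Secret, or Sub2API account. If YAML adds an account or changes the profile/base URL/API key fingerprint/upstream User-Agent/Responses WebSocket mode, sync writes a pending probe record from the pre-change runtime state and immediately schedules a sentinel probe, but by default the account stays schedulable; an account is frozen only when an actual marker probe misses or an active quarantine already exists. Sentinel freeze/recovery changes only `schedulable=false|true` and must not also call Sub2API `recover-state` to clear request-path temporary unschedulability or other runtime backoff. Existing success/failure backoff of unrelated accounts must not be reset. If YAML lowers the maximum failure-freeze window, sync migrates any still-active old freeze state into the current maximum window and immediately schedules a recovery probe, but does not unfreeze directly. If you suspect an account was misjudged, first use `codex-pool sentinel-probe --account <accountName> --confirm` to trigger a measurement for that account immediately; the command derives a one-off Job from the existing CronJob template and reuses the same Secret, ConfigMap, OpenAI SDK probe, token/cost ledger, and freeze/recovery state machine.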
@@ -160,13 +168,14 @@ bun scripts/cli.ts platform-infra sub2api codex-pool expose
bun scripts/cli.ts platform-infra sub2api codex-pool expose --confirm
```
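- Controlled by the `publicExposure` YAML. The default public endpoint is `publicBaseUrl`; the master-local consumer endpoint is `masterBaseUrl`.
- Controlled by YAML `publicExposure`. The Codex pool default public endpoint is `publicBaseUrl`; the master-local consumer endpoint is `masterBaseUrl`; the target-level public exposure for D601 external-active is declared in `config/platform-infra/sub2api.yaml`.
- `expose --confirm` only adds the master `frps` allow port for the YAML-specified `remotePort` and creates/updates `sub2api-frpc` on G14.
- The master Caddy site is also rendered from `publicExposure.masterCaddy`; `responseHeaderTimeoutSeconds` must be long enough to cover long Codex `/responses/compact` requests, so that Caddy does not return a 504 first while Sub2API actually succeeds slightly later in the background. Change the concrete value only in `config/platform-infra/sub2api-codex-pool.yaml`, run `codex-pool expose --confirm` afterwards, and then check the `response_header_timeout` rendered in the Caddyfile.
- The master Caddy short-window edge retry is rendered from `publicExposure.masterCaddy.edgeRetry`; it absorbs 502s where the request has not yet reliably reached Sub2API, such as a briefly closed FRP remotePort, `connect: connection refused`, EOF, or connection reset. Write the concrete retry duration, interval, and `retryMatch` scope only in YAML, run `codex-pool expose --confirm` afterwards, and then check the `lb_try_duration`, `lb_try_interval`, and `lb_retry_match` rendered in the Caddyfile. Do not hand-patch `/etc/caddy/Caddyfile`.
- Round-trip retry for non-idempotent POST must be narrowed to the safe paths declared by YAML `retryMatch`; upstream account errors on ordinary `/responses` remain the responsibility of Sub2API failover / temp-unschedulable / sentinel, and Caddy must not replay a whole inference request to mask account pool problems.
- The same FRP TCP entrypoint exposes both the OpenAI-compatible API and the Sub2API management UI `/login`. Do not open a second management port unless YAML explicitly declares a new exposure decision.
- The Sub2API Kubernetes Service stays ClusterIP.
- The public path for D601 external-active is `client -> PK01 Caddy -> PK01 frps remotePort -> D601 frpc -> Sub2API`; it passes through neither pikanode nor a master server reverse proxy. PK01 Caddy downloads must use the proxy specified by YAML `publicExposure.pk01.caddyDownloadProxyUrl`; if the Caddy download is slow, first confirm that the apply output shows `downloadProxy.mode=curl-proxy`. `api.pikapython.com` must resolve to the PK01 public address declared in YAML before HTTPS can serve as the final verification.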
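The DNS precondition in the last bullet can be expressed as a gate that runs before any HTTPS check. The sketch below is a hedged illustration, not the CLI's implementation: `ResolverAnswer` and `dnsReady` are hypothetical names, and the A records are assumed to have been collected beforehand from the resolvers listed under `publicExposure.dns.resolvers`.

```typescript
// Hypothetical shape: one entry per resolver with the A records it returned.
type ResolverAnswer = { resolver: string; records: string[] };

// Reports whether every resolver already returns only the expected PK01 address.
function dnsReady(
  expectedA: string,
  answers: ResolverAnswer[],
): { ready: boolean; stale: string[] } {
  const stale: string[] = [];
  for (const answer of answers) {
    // Assumption: an empty answer or any unexpected record counts as stale.
    const mismatch =
      answer.records.length === 0 ||
      answer.records.some((record) => record !== expectedA);
    if (mismatch) stale.push(answer.resolver);
  }
  // With no answers at all there is no evidence, so the gate stays closed.
  return { ready: answers.length > 0 && stale.length === 0, stale };
}
```

While `ready` is false, FRP port probes can still prove the data path, but the final `https://api.pikapython.com` validation should wait.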
## Configure the master Codex consumer
@@ -193,6 +202,7 @@ bun scripts/cli.ts platform-infra sub2api codex-pool configure-local --confirm
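- `sub2api validate`: the app, PostgreSQL, Redis, service proxy, `NetworkPolicy/allow-all`, and the temporary cross-Pod PostgreSQL/Redis connectivity checks pass.
- `codex-pool validate`: `GET /v1/models` with the unified key succeeds, and a small `POST /v1/responses` smoke runs with `localCodex.responsesSmokeModel`; owner balance / owner concurrency meet the YAML minimums, and capacity, WebSocket v2, the Sub2API built-in temporary-unschedulable switch/rules, and the sentinel runtime state are aligned with YAML; `validation.gatewayResponsesRecent` summarizes evidence from the last 6 hours of ordinary `/responses` and `/v1/responses` traffic (failover, forward failure, final 4xx/5xx, slow final errors, and `context canceled`), and `validation.gatewayCompactRecent` summarizes `/responses/compact` evidence separately. If the current Responses smoke is `ok=true` but a recent field shows `degraded=true`, first distinguish between leftovers from the historical window and new request ids that are currently failing; for long-term judgment see `docs/reference/platform-infra.md`.
- If `publicExposure.enabled=true`, confirm that the FRP path works; `expose --confirm` uses a 401 from the public `/v1/models` without a key as the gateway reachability probe.
- If the target declares `egressProxy.enabled=true`, confirm that the proxy Deployment/Service is ready, that the Sub2API and sentinel env are aligned with YAML, and that the proxied outbound probe completes through the health URL declared in YAML.
To prove that real model requests work, use a minimal `/v1/responses` or an equivalent Codex smoke. Do not interpret a group-level `/v1/models` success as every upstream account being healthy.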
@@ -205,6 +215,8 @@ bun scripts/cli.ts platform-infra sub2api codex-pool configure-local --confirm
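- pool key 401: run `codex-pool sync --confirm` to rebuild the binding between the Sub2API key and the k3s Secret, then run `codex-pool validate`.
- Leftover validation probes from earlier runs: clean up `unidesk-probe-*` temporary resources only with `codex-pool cleanup-probes --confirm`; do not treat deleting a real managed account as probe cleanup or availability recovery.
- FRP unreachable: first look at `masterFrps`, `masterCaddy`, `sub2api-frpc`, and the public 401 probe in the `codex-pool expose --confirm` output; when lower-level evidence is needed, use only bounded queries via `trans G14:k3s`.
- `api.pikapython.com` unreachable for D601 external-active: first distinguish DNS/TLS/Caddy/FRP/Sub2API. When DNS does not resolve to the PK01 address declared in YAML, Caddy ACME fails and `https://api.pikapython.com` cannot count as done; the PK01 loopback FRP port and the PK01 public remotePort can prove the D601 FRP data path, but in the end you still have to wait for DNS to take effect and rerun the HTTPS health, `/v1/models`, and `/v1/responses` checks.
- Caddy download slow or failing: first confirm that `config/platform-infra/sub2api.yaml` sets `publicExposure.pk01.caddyDownloadProxyUrl`, then rerun `sub2api apply --target D601 --confirm` and look for `downloadProxy.mode=curl-proxy` in the PK01 apply summary. Do not keep retrying a direct connection to the GitHub release.
- When `/responses/compact` returns 504 after a fixed duration close to the master Caddy `response_header_timeout`, or the Sub2API log records `codex.remote_compact.succeeded` slightly later, first check whether the master Caddy `response_header_timeout` is rendered from YAML `publicExposure.masterCaddy.responseHeaderTimeoutSeconds`, and after fixing it run `codex-pool expose --confirm`; this kind of edge proxy timeout does not trigger account-level temporary removal in Sub2API. Compact requests already in flight before the reload may still end with the old timeout, so when judging whether the fix took effect, look only at requests started after the reload.
- When `/responses/compact` or an ordinary public URL shows 502s within a window of a few seconds, the Caddy log shows `dial tcp 127.0.0.1:<remotePort>: connect: connection refused`, `EOF`, or `connection reset by peer`, and the frps log shows `platform-infra-sub2api proxy closing` / `listener is closed` / `new proxy ... success`, the failure is at the edge layer between master Caddy and the FRP remotePort, and Sub2API and the sentinel may not see it at all. First confirm that `publicExposure.masterCaddy.edgeRetry` has been rendered from YAML and has taken effect via `codex-pool expose --confirm`; if it still happens frequently, go on to investigate the stability of the control connection from G14 `sub2api-frpc` to master `frps`. Do not misjudge this kind of edge 502 as an account pool upstream error, and do not try to recover by disabling accounts.
- default profile recursion: check whether the YAML default entry uses the `*.pre-sub2api` backup file; if necessary, restore the backup and rerun `configure-local --confirm`.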
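The rule in the `response_header_timeout` item (judge a timeout fix only by requests started after the Caddy reload) can be sketched as a filter over request records. The record shape, the function name, and the verdict labels below are hypothetical; how such records would be collected from logs is outside this sketch.

```typescript
// Hypothetical record shape for illustration only.
type CompactRequest = { requestId: string; startedAtMs: number; httpStatus: number };

type FixVerdict = "no-evidence" | "still-failing" | "ok";

// Ignores requests that were already in flight before the reload,
// because they may still finish with the previous timeout.
function timeoutFixVerdict(requests: CompactRequest[], reloadAtMs: number): FixVerdict {
  const postReload = requests.filter((request) => request.startedAtMs > reloadAtMs);
  if (postReload.length === 0) return "no-evidence";
  const has504 = postReload.some((request) => request.httpStatus === 504);
  return has504 ? "still-failing" : "ok";
}
```

A 504 on a request that started before the reload therefore does not count against the fix.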
+16 -1
@@ -110,20 +110,35 @@ postgres:
user: sub2api
address: 10.0.8.0/22
method: scram-sha-256
- type: hostssl
database: postgres
user: sub2api
address: 10.0.8.0/22
method: scram-sha-256
- type: hostssl
database: sub2api
user: sub2api
address: 74.48.78.17/32
method: scram-sha-256
- type: hostssl
database: postgres
user: sub2api
address: 74.48.78.17/32
method: scram-sha-256
- type: hostssl
database: sub2api
user: sub2api
address: 36.49.29.73/32
method: scram-sha-256
- type: hostssl
database: postgres
user: sub2api
address: 36.49.29.73/32
method: scram-sha-256
secrets:
source: master-local
root: .state/secrets
root: /root/unidesk/.state/secrets
entries:
- name: sub2api-db-credentials
sourceRef: platform-db/sub2api-db.env
+84 -7
@@ -15,24 +15,101 @@ targets:
- id: D601
route: D601:k3s
namespace: platform-infra
role: standby
role: active-standby
enabled: true
databaseMode: external-pending
databaseMode: external-active
redisMode: local-ephemeral
appReplicas: 0
redisReplicas: 0
appReplicas: 1
redisReplicas: 1
image:
repository: 127.0.0.1:5000/platform-infra/sub2api
tag: 0.1.136
pullPolicy: IfNotPresent
dependencyImages:
postgres: docker.m.daocloud.io/library/postgres:18-alpine
redis: docker.m.daocloud.io/library/redis:8-alpine
publicExposure:
enabled: true
publicBaseUrl: https://api.pikapython.com
dns:
hostname: api.pikapython.com
expectedA: 82.156.23.220
resolvers: [1.1.1.1, 8.8.8.8, 223.5.5.5, 114.114.114.114]
frpc:
deploymentName: sub2api-frpc
secretName: sub2api-frpc-secrets
secretKey: frpc.toml
image: 127.0.0.1:5000/hwlab/frpc:v0.68.1
serverAddr: 82.156.23.220
serverPort: 22000
proxyName: platform-infra-sub2api-d601-api
remotePort: 22098
localIP: sub2api.platform-infra.svc.cluster.local
localPort: 8080
tokenSourceRef: platform-infra/pk01-frp.env
tokenSourceKey: FRP_TOKEN
pk01:
route: PK01
caddyBinaryPath: /usr/local/bin/caddy
caddyDownloadUrl: https://caddyserver.com/api/download?os=linux&arch=amd64
caddyDownloadProxyUrl: http://127.0.0.1:18789
caddyConfigPath: /etc/caddy/Caddyfile
caddyServiceName: caddy
caddyStorageDir: /var/lib/caddy
caddyEmail: ops@pikapython.com
pikanodeRoot: /home/ubuntu/pikanode
pikanodeContainerName: pikanode
pikanodeImage: pikanode
pikanodeHttpHostPort: 18888
responseHeaderTimeoutSeconds: 600
egressProxy:
enabled: true
deploymentName: sub2api-egress-proxy
serviceName: sub2api-egress-proxy
secretName: sub2api-egress-proxy-config
secretKey: config.json
image: 127.0.0.1:5000/platform-infra/sing-box:latest
imagePullPolicy: IfNotPresent
listenPort: 10808
sourceRef: platform-infra/master-vpn-subscription.env
sourceKey: MASTER_VPN_SUBSCRIPTION_URL
sourceType: subscription-url
preferredOutbound: vless-reality
applyToSub2Api: true
applyToSentinel: true
healthProbeUrl: https://www.gstatic.com/generate_204
noProxy:
- localhost
- 127.0.0.1
- ::1
- .svc
- .cluster.local
- 10.0.0.0/8
- 172.16.0.0/12
- 192.168.0.0/16
- 82.156.23.220
- 74.48.78.17
- hyueapi.com
- .hyueapi.com
runtime:
database:
mode: external
sourceRef: platform-db/postgres-active.env
sourceRef: platform-db/sub2api-db.env
sourceKeys:
user: SUB2API_DB_USER
password: SUB2API_DB_PASSWORD
dbName: SUB2API_DB_NAME
secretName: sub2api-secrets
passwordKey: POSTGRES_PASSWORD
host: pika01-postgres.pending.local
host: 82.156.23.220
port: 5432
user: sub2api
dbName: sub2api
sslMode: prefer
sslMode: require
pendingAllowed: true
secrets:
root: /root/unidesk/.state/secrets
appSourceRef: platform-infra/sub2api.env
redis:
serviceName: sub2api-redis
persistence: false
+10
@@ -58,6 +58,16 @@ PK01 currently hosts existing Docker workloads:
`pikanode` mounts `/home/ubuntu/pikanode` read-write into the container. Static/generated download artifacts under `html/download/` and repository data under `files/` may be user-visible or needed by the service. They are not generic GC candidates.
## Sub2API Caddy Edge
PK01 may act as the public Caddy edge for a YAML-declared D601 Sub2API target. The durable source of truth is `config/platform-infra/sub2api.yaml`; do not hand-edit PK01 Caddy or FRP state as a separate routing truth.
In this mode, public ports `80` and `443` belong to Caddy. The existing `pikanode` container must be bound to a loopback HTTP port and used only as the apex PikaPython/PikaNode upstream. The `api.pikapython.com` site must reverse proxy directly to the YAML-declared FRP remote port, so API traffic follows `client -> PK01 Caddy -> PK01 frps remote port -> D601 frpc -> D601 Sub2API`. It must not pass through pikanode or a master-server reverse proxy.
Caddy binary installation is also YAML-controlled. If `publicExposure.pk01.caddyDownloadProxyUrl` is set, PK01 Caddy downloads must use that proxy URL; the PK01 loopback provider egress proxy is the preferred source. A slow or failing Caddy download should first be treated as missing proxy use, not as a reason to keep retrying a naked GitHub release download.
The public certificate depends on DNS. The `api.pikapython.com` record must resolve to the YAML-declared PK01 public address before Caddy can complete ACME issuance. If the DNS record is absent or stale, local probes such as PK01 `127.0.0.1:<frp-remote-port>` and public-IP remote-port checks can prove the FRP data path, but final `https://api.pikapython.com` validation remains blocked until DNS is corrected.
## Host PostgreSQL
PK01 host-native PostgreSQL is declared by `config/platform-db/postgres-pk01.yaml` and managed through `bun scripts/cli.ts platform-db postgres plan|status|apply`; daily operation commands live in `$unidesk-ops` at `.agents/skills/unidesk-ops/SKILL.md`. It is a host systemd service, not a Docker container or k3s workload. The YAML is the source of truth for PostgreSQL version, TLS mode, listening addresses, `pg_hba` source CIDRs, generated Secret source files, exported `DATABASE_URL`, and backup timer settings.
+16 -10
@@ -1,6 +1,6 @@
# Platform Infra
`platform-infra` is the k3s namespace for UniDesk-operated shared platform services. G14 is the active default runtime for this namespace; D601 may host explicitly declared standby platform targets when the service needs node-local preparation or cutover capacity. It is separate from HWLAB runtime lanes, AgentRun lanes, D601 user services, and legacy `devops-infra` control-plane helpers. New shared infra should land here first; old `devops-infra` resources migrate gradually only when a concrete owner and validation path exist.
`platform-infra` is the k3s namespace for UniDesk-operated shared platform services. G14 is the default active runtime for this namespace; D601 may host explicitly declared standby or externally backed active targets when the service needs node-local preparation, cutover capacity, or a direct public edge. It is separate from HWLAB runtime lanes, AgentRun lanes, D601 user services, and legacy `devops-infra` control-plane helpers. New shared infra should land here first; old `devops-infra` resources migrate gradually only when a concrete owner and validation path exists.
## Source Of Truth
@@ -12,15 +12,15 @@
## Sub2API Deployment Boundary
- Sub2API is a platform service operated by UniDesk in namespace `platform-infra`. It is not a HWLAB lane workload, AgentRun workload, D601 user service, or master server daemon.
- The canonical deployment entrypoint is `bun scripts/cli.ts platform-infra sub2api plan|apply|status|validate|codex-pool`. Runtime targets are selected with `--target`; `G14` is the active default target and `D601` is a standby target controlled by the same YAML. Daily operation procedures live in `$unidesk-sub2api` at `.agents/skills/unidesk-sub2api/SKILL.md`. This reference keeps only development boundaries and project-specific source-of-truth rules.
- The canonical deployment entrypoint is `bun scripts/cli.ts platform-infra sub2api plan|apply|status|validate|codex-pool`. Runtime targets are selected with `--target`; `G14` is the default active target and `D601` is controlled by the same YAML as either standby predeploy or externally backed active runtime. Daily operation procedures live in `$unidesk-sub2api` at `.agents/skills/unidesk-sub2api/SKILL.md`. This reference keeps only development boundaries and project-specific source-of-truth rules.
- Raw `kubectl` through `trans <target>:k3s` is only for bounded diagnosis and evidence, not a formal mutate path.
- The image version is controlled by `config/platform-infra/sub2api.yaml`. Image update procedures are daily operations owned by `$unidesk-sub2api`; the development boundary is that image choices remain YAML-controlled.
- Sub2API should stay ClusterIP-only by default. Do not add Ingress, NodePort, LoadBalancer, or broad FRP exposure unless a YAML-controlled public exposure decision exists.
- Sub2API currently has no resource limits by design. Do not add CPU or memory limits unless a later explicit decision changes that policy and stores the new policy in YAML.
- Master server is a consumer/control host, not the runtime location. Do not deploy Sub2API, PostgreSQL, Redis, or heavy validation loops on master server.
- D601 Sub2API is a predeployment target, not a second active singleton. While the platform database handoff is pending, it must render without a local PostgreSQL StatefulSet, keep the Sub2API app and local Redis cache scaled to zero, and use only ephemeral Redis storage when Redis is later activated. After the external platform DB endpoint, Secret, and runtime images are ready, activation must be expressed by YAML and applied through the same `platform-infra sub2api --target D601` CLI path.
- D601 Sub2API is selected by YAML, not by ad hoc runtime patches. In standby mode it must render without a local PostgreSQL StatefulSet, keep the Sub2API app and local Redis cache scaled to zero, and use only ephemeral Redis storage when Redis is later activated. In externally backed active mode it connects directly to the YAML-declared external PostgreSQL endpoint with `sslmode=require`, keeps durable app state outside the D601 k3s node, and uses local Redis only as ephemeral cache. Activation must be applied through the same `platform-infra sub2api --target D601` CLI path.
- External platform PostgreSQL endpoints for Sub2API are produced by the platform DB YAML and its `platform-db postgres` CLI. Cross-node Sub2API consumers connect directly to that endpoint; the master server is not a PostgreSQL data-plane relay. DNS aliases are optional when the exported `DATABASE_URL` uses a reachable IP with `sslmode=require`; current PK01-specific rules live in `docs/reference/pk01.md`.
- Sub2API account sentinel and public FRP exposure remain singleton concerns. Do not create a second sentinel or public management surface for D601 unless a later YAML-controlled decision explicitly moves or splits that responsibility.
- Sub2API account sentinel and public exposure are target-scoped YAML decisions. Do not create a second sentinel, FRP client, public management surface, or edge proxy by hand; enable or move those resources only through the target YAML and the `platform-infra sub2api` / `codex-pool --target` CLI paths.
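The standby and externally backed active rules for D601 in the list above amount to a few invariants on the target declaration. The sketch below is an assumption-laden illustration: `databaseMode`, `redisMode`, `appReplicas`, `redisReplicas`, and `sslMode` mirror keys that appear in `config/platform-infra/sub2api.yaml`, while `TargetSpec`, `rendersLocalPostgres`, and `targetViolations` are hypothetical names and not the CLI's actual validation code.

```typescript
// Flattened, hypothetical view of one target entry.
type TargetSpec = {
  databaseMode: "external-pending" | "external-active";
  redisMode: string;
  appReplicas: number;
  redisReplicas: number;
  sslMode: string;
  rendersLocalPostgres: boolean; // hypothetical flag, not a real YAML key
};

// Collects every rule from the boundary list that the declaration breaks.
function targetViolations(target: TargetSpec): string[] {
  const violations: string[] = [];
  if (target.rendersLocalPostgres) {
    violations.push("D601 must not render a local PostgreSQL StatefulSet");
  }
  if (target.redisMode !== "local-ephemeral") {
    violations.push("Redis must be local and ephemeral");
  }
  if (target.databaseMode === "external-pending") {
    // Standby: app and Redis stay scaled to zero until the handoff is done.
    if (target.appReplicas !== 0) violations.push("appReplicas must be 0 in standby");
    if (target.redisReplicas !== 0) violations.push("redisReplicas must be 0 in standby");
  } else if (target.sslMode !== "require") {
    violations.push("external-active must connect with sslmode=require");
  }
  return violations;
}
```

An empty result means the declaration is consistent with the boundary rules; it says nothing about whether the external database is actually reachable.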
## Codex Pool Routing
@@ -45,7 +45,7 @@
- Codex account-state, quota prompts, model-routing failures, encrypted-content affinity failures, gateway wrappers, and timeout-like upstream errors must be handled by the generic temporary-unschedulable/failover path plus the external marker sentinel. Do not change membership, priority, capacity, load factor, WebSocket mode, `pool_mode`, or a specific provider's status merely to work around those errors. If a matching upstream failure still logs `openai.forward_failed` without `openai.upstream_failover_switching`, the missing fix is in Sub2API's HTTP `/responses` failover classification/error propagation, not in account pinning.
- `profiles.entries[].openaiResponsesWebSocketsV2Mode` is the account-level Responses WebSocket v2 switch for OpenAI-compatible upstreams that require WebSocket transport. Allowed values are `off`, `ctx_pool`, and `passthrough`; omit the field unless that upstream needs it.
- `profiles.entries[].upstreamUserAgent` is an optional account-level upstream request User-Agent override. Use it only for upstreams that require a Codex CLI compatible User-Agent; keep the value YAML-controlled and newline-free.
- `publicExposure` controls the optional FRP bridge from master server to the G14 ClusterIP service.
- `publicExposure` in `config/platform-infra/sub2api-codex-pool.yaml` controls the default Codex-pool public bridge from master server to the G14 ClusterIP service. Target-level `publicExposure` in `config/platform-infra/sub2api.yaml` controls non-master exposure such as a D601-to-PK01 edge.
- `publicExposure.masterCaddy.responseHeaderTimeoutSeconds` controls the master Caddy `response_header_timeout` for the public Sub2API site. It must be long enough for Codex `/responses/compact` requests; otherwise Caddy can return a client-visible 504 before Sub2API finishes the upstream compact request, and that edge timeout is not an account-level upstream failure that Sub2API can use for temporary-unschedulable failover. The numeric value belongs only in `config/platform-infra/sub2api-codex-pool.yaml`; after changing it, use `codex-pool expose --confirm` to reload Caddy and verify the rendered `response_header_timeout`. Requests that were already in flight before the reload may still finish with the previous timeout, so post-change evidence should check only requests that started after the reload.
- `publicExposure.masterCaddy.edgeRetry` controls the master Caddy reverse-proxy retry window for the public Sub2API site. This belongs at the edge because FRP remotePort listener loss, `connection refused`, EOF, or connection reset can happen before a request reaches Sub2API, so Sub2API account failover and sentinel logic cannot observe or recover that request. Keep retry scope narrow, especially for non-idempotent POST traffic: connection-attempt failures may be retried by the reverse proxy, while round-trip retry after an upstream connection was established should be limited by YAML `retryMatch` to paths that are safe to repeat, such as compact. Retry durations and intervals belong only in YAML; after changing them, run `codex-pool expose --confirm` and verify the rendered Caddyfile contains the expected `lb_try_duration`, `lb_try_interval`, and `lb_retry_match`.
- `localCodex` controls how the master server's current `~/.codex` consumer files are backed up and rewritten. Keep `supportsWebSockets` and `responsesWebSocketsV2` in the same state, and enable them only when at least one YAML-managed account has a current direct Codex WSv2 smoke that passes. If no upstream profile can sustain Responses WSv2, the honest long-term state is `false/false` so Codex uses HTTP Responses directly instead of repeatedly reconnecting before `response.completed`. `localCodex.responsesSmokeModel` is the YAML-declared model used by `codex-pool validate` for the lightweight `POST /v1/responses` smoke.
@@ -92,7 +92,7 @@ If the YAML success cadence maximum is lowered or an account changes trust class
Operational observation for this sentinel should use the read-only `codex-pool sentinel-report` table or its `--raw` form. It is the canonical low-noise view for per-account probe count, trust class, marker result, HTTP/error diagnostics, freeze TTL, success cadence, success cadence maximum, next probe time, and recent CronJob runs; raw ConfigMap dumps and ad hoc log scraping are fallback diagnostics, not the primary state surface.
The request path is:
The default G14 Codex-pool request path is:
1. A client sends an OpenAI-compatible request to the configured consumer base URL, normally `https://sub2api.74-48-78-17.nip.io/v1/...`, with the unified API key.
2. master `frps` forwards the TCP connection to `platform-infra/sub2api-frpc` when `publicExposure.enabled` is true.
@@ -100,6 +100,10 @@ The request path is:
4. Sub2API validates the unified key and resolves its `group_id`.
5. Accounts listed in `profiles.entries` are bound to the same group via `group_ids`, so Sub2API dispatches through that group using its own account selection semantics.
The D601 externally backed request path is different when target-level `publicExposure.enabled=true` in `config/platform-infra/sub2api.yaml`: client traffic reaches PK01 Caddy, PK01 forwards to the YAML-declared FRP remote port, D601 `sub2api-frpc` connects directly to PK01 `frps`, and FRP forwards to `sub2api.platform-infra.svc.cluster.local:8080` on D601. This path does not pass through the master server or the pikanode reverse proxy. `api.pikapython.com` must resolve to the YAML-declared PK01 public address before Caddy can obtain or renew the public certificate; when DNS is missing, PK01 local FRP probes and public-IP remote-port probes may prove the edge path, but they are not a substitute for final `https://api.pikapython.com` validation.
When target-level `egressProxy.enabled=true`, the D601 target renders an in-cluster HTTP(S) proxy client from the master VPN subscription source declared in YAML. The CLI injects the resulting proxy URL and `NO_PROXY` into Sub2API and, when requested by YAML, the Codex account sentinel. `platform-infra sub2api validate --target D601 --full` must prove the proxy Deployment/Service is ready and that an app pod can complete the YAML-declared health probe through the proxy. Subscription contents and generated proxy configs are Secret material and must not be printed.
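To make the effect of the injected `NO_PROXY` list concrete, the following sketch decides whether a host would bypass the egress proxy. It is a simplified model under stated assumptions: it handles exact names, leading-dot suffixes, and IPv4 CIDR blocks only, and real HTTP clients differ in how they interpret `NO_PROXY` (CIDR support in particular is not universal). All function names are hypothetical.

```typescript
// Parses a dotted IPv4 address into a number, or returns null if it is not IPv4.
function ipv4ToNumber(host: string): number | null {
  const parts = host.split(".");
  if (parts.length !== 4) return null;
  let value = 0;
  for (const part of parts) {
    if (!/^\d{1,3}$/.test(part)) return null;
    const octet = Number(part);
    if (octet > 255) return null;
    value = value * 256 + octet;
  }
  return value;
}

// True when host is an IPv4 address inside the given CIDR block.
function inIpv4Cidr(host: string, cidr: string): boolean {
  const [baseText = "", prefixText = ""] = cidr.split("/");
  if (!/^\d{1,2}$/.test(prefixText)) return false;
  const prefix = Number(prefixText);
  const base = ipv4ToNumber(baseText);
  const address = ipv4ToNumber(host);
  if (base === null || address === null || prefix > 32) return false;
  // Plain arithmetic avoids signed 32-bit surprises from bit shifts.
  const blockSize = Math.pow(2, 32 - prefix);
  return Math.floor(base / blockSize) === Math.floor(address / blockSize);
}

// Simplified NO_PROXY matching: exact host, ".suffix", or IPv4 CIDR.
function bypassesProxy(host: string, noProxy: string[]): boolean {
  const target = host.toLowerCase();
  for (const rawEntry of noProxy) {
    const entry = rawEntry.toLowerCase();
    if (entry.indexOf("/") !== -1) {
      if (inIpv4Cidr(target, entry)) return true;
    } else if (entry.charAt(0) === ".") {
      if (target.length > entry.length && target.slice(-entry.length) === entry) return true;
    } else if (target === entry) {
      return true;
    }
  }
  return false;
}
```

Under this model, cluster-internal names and private ranges stay direct, while a host such as the `healthProbeUrl` target goes through the proxy, which is what the health probe is meant to exercise.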
Adding, removing, exposing, validating, and configuring local Codex consumers are daily operations covered by `$unidesk-sub2api`. The development rule is that ordinary pool membership changes stay YAML-only and do not add code or CI/CD. Code changes are only appropriate when UniDesk needs to render or validate a Sub2API capability that already exists upstream, such as account-level WebSocket mode or per-account upstream User-Agent. If Sub2API itself does not support a desired behavior, do not magic-patch it through UniDesk scripts, Kubernetes hotfixes, local forks, or hidden compatibility paths; either leave the behavior unsupported or pursue it upstream as an explicit Sub2API feature.
`codex-pool sync --confirm` and `codex-pool validate` are runtime operations that may need more than one SSH short-connection window because they log in to Sub2API, reconcile accounts, inspect recent logs, and run gateway smoke requests. The formal entry remains the UniDesk CLI, which must use a submit-and-short-poll control shape or an equivalent remote job wrapper instead of one long `trans G14:k3s script` call. If these commands fail with `UNIDESK_SSH_RUNTIME_TIMEOUT` while the remote operation may still be running, treat it as a control-plane visibility gap first: improve or use the CLI's job/poll path, then rerun `sync` or `validate`. Do not replace it with raw `kubectl`, manual Sub2API admin API patches, repeated blind full loops, or Sub2API source modifications.
@@ -112,18 +116,18 @@ When `publicExposure.enabled` is true, the same FRP TCP bridge exposes both Open
The public management UI is an operations endpoint. Keep Sub2API itself in `platform-infra`, keep the Kubernetes Service as ClusterIP, and treat FRP as the only public bridge unless a later decision explicitly changes the exposure model.
The public bridge has two separate failure classes. Sub2API upstream/account failures are visible in Sub2API logs and currently belong to sentinel quarantine plus normal Sub2API routing among schedulable accounts. Edge failures between master Caddy and the FRP remotePort are not visible to Sub2API; symptoms include Caddy `connect: connection refused`, EOF, connection reset, or short 502 bursts while frps closes and reopens the configured remotePort. Those failures must be diagnosed from Caddy and frps/frpc evidence and mitigated through YAML-controlled Caddy edge retry or FRP stability fixes, not by disabling accounts or changing pool membership.
The public bridge has two separate failure classes. Sub2API upstream/account failures are visible in Sub2API logs and currently belong to sentinel quarantine plus normal Sub2API routing among schedulable accounts. Edge failures between Caddy and the FRP remote port are not visible to Sub2API; symptoms include Caddy `connect: connection refused`, EOF, connection reset, TLS/certificate failures, DNS NXDOMAIN, or short 502 bursts while frps closes and reopens the configured remote port. Those failures must be diagnosed from DNS, Caddy, and frps/frpc evidence and mitigated through YAML-controlled Caddy edge retry, DNS correction, or FRP stability fixes, not by disabling accounts or changing pool membership.
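The two failure classes can be told apart by where the evidence comes from. The sketch below classifies a single log line using marker strings quoted in this document; it is a rough triage aid under the assumption that simple substring matching is enough, and `classifyEvidence` is a hypothetical helper, not part of the CLI.

```typescript
type FailureClass = "edge" | "account" | "unknown";

// Markers that indicate the request failed before reaching Sub2API.
const EDGE_MARKERS = [
  "connect: connection refused",
  "connection reset by peer",
  "listener is closed",
  "NXDOMAIN",
];

// Sub2API log events that indicate upstream or account-level handling.
const ACCOUNT_MARKERS = [
  "openai.forward_failed",
  "openai.upstream_failover_switching",
  "openai.account_select_failed",
];

// Classifies one log line; anything unrecognized stays "unknown".
function classifyEvidence(logLine: string): FailureClass {
  if (ACCOUNT_MARKERS.some((marker) => logLine.indexOf(marker) !== -1)) return "account";
  if (EDGE_MARKERS.some((marker) => logLine.indexOf(marker) !== -1)) return "edge";
  return "unknown";
}
```

An `edge` result points at DNS, Caddy, or frps/frpc evidence, and an `account` result points at sentinel and Sub2API routing. Neither result justifies disabling accounts to fix an edge problem.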
## Availability And Probes
Kubernetes readiness is not the same as pool availability:
- The Sub2API app, PostgreSQL, and Redis manifests include container-level health probes. These only prove the pods and local dependencies are healthy enough for Kubernetes scheduling.
- The FRP client deployment is currently a simple connector deployment and does not itself prove that master-local traffic reaches Sub2API.
- The FRP client deployment is a connector deployment and does not itself prove that edge traffic reaches Sub2API.
- No scheduled `CronJob`, `ServiceMonitor`, or `PodMonitor` currently proves the full unified Codex API path.
- `platform-infra sub2api validate` and `platform-infra sub2api codex-pool validate` are on-demand checks. Operational usage is documented in `$unidesk-sub2api`; they are acceptable for deployment closeout, but they are not continuous monitoring. `codex-pool validate` must test both `GET /v1/models` and a small `POST /v1/responses` request, and the Responses smoke should report request id, selected/final account evidence, upstream failover count, and whether the validation succeeded only after failover. It should also summarize recent `/responses` and `/responses/compact` gateway failures separately so ordinary long streaming failures are not hidden behind compact-only evidence.
- `codex-pool validate` must not create mock upstreams or temporary failover-probe accounts as its default proof of Sub2API behavior. When a suspected failover path is in question, validate should surface the relevant source-path expectation and real runtime evidence: request ids, selected/final account ids, `openai.upstream_failover_switching`, `openai.forward_failed`, `openai.account_select_failed`, and final status. If runtime evidence contradicts the source-path expectation, fix Sub2API or the UniDesk integration path rather than converting the mismatch into a mock-only success.
- Public exposure closeout must include the edge layer when the user-facing URL is involved. A Sub2API-side compact success summary does not rule out Caddy/FRP 502s that happened before Sub2API received the request; inspect the edge Caddy/frps/frpc evidence or use a CLI report that summarizes it before declaring public compact stable.
- Public exposure closeout must include the edge layer when the user-facing URL is involved. A Sub2API-side compact success summary does not rule out DNS, Caddy, TLS, or FRP failures that happened before Sub2API received the request; inspect the edge evidence or use a CLI report that summarizes it before declaring the public URL stable.
- Because `codex-pool validate` includes account alignment, recent-log inspection, and gateway smoke, timeout of the CLI transport is not valid negative evidence about Sub2API scheduling by itself. Closeout evidence must come from the final structured validation result or from an explicitly reported remote job failure with stdout/stderr tail, not from a single low-level `trans` timeout.
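The closeout rule in the last bullet can be expressed as a small decision function. This is a minimal sketch under assumed names (`ValidateOutcome`, `closeoutVerdict`); the real CLI result shape may differ.

```typescript
// Hypothetical outcome shapes for one `codex-pool validate` run.
type ValidateOutcome =
  | { kind: "structured-result"; ok: boolean }
  | { kind: "remote-job-failure"; stdoutTail: string; stderrTail: string }
  | { kind: "transport-timeout" };

type Verdict = "pass" | "fail" | "inconclusive";

function closeoutVerdict(outcome: ValidateOutcome): Verdict {
  // Only the final structured result can close out as a pass.
  if (outcome.kind === "structured-result") return outcome.ok ? "pass" : "fail";
  // An explicitly reported remote job failure is valid negative evidence.
  if (outcome.kind === "remote-job-failure") return "fail";
  // A low-level transport timeout says nothing about Sub2API scheduling.
  return "inconclusive";
}
```

An `inconclusive` verdict is a prompt to poll or rerun through the CLI job path, not a finding about the pool.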
When an automatic availability probe is added, it should be YAML-controlled and cover these layers without printing secrets:
@@ -133,6 +137,8 @@ When an automatic availability probe is added, it should be YAML-controlled and
3. A tiny `POST /v1/responses` call through the same consumer URL for true OpenAI-compatible request validation.
4. Optional per-upstream account probes if Sub2API exposes a safe account selection or admin-health mechanism; otherwise document that group-level success does not prove every upstream account is healthy.
For D601 public exposure, the equivalent probe set must use the target URL from `config/platform-infra/sub2api.yaml`, include the PK01 Caddy/FRP edge, and require `api.pikapython.com` DNS to resolve to the YAML-declared address before treating HTTPS as validated.
Until continuous probing exists, closeout comments must state that validation was on-demand and include the exact CLI/API entrypoints used.
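A possible shape for the two HTTP probe layers is sketched below. It only builds request descriptors and a secret-free report view; nothing is sent. `buildProbeRequests`, `redactForReport`, the `model` parameter, and the `"ping"` input are assumptions for illustration, and the base URL is assumed not to include the `/v1` suffix.

```typescript
interface ProbeRequest {
  layer: "models" | "responses";
  method: "GET" | "POST";
  url: string;
  headers: Record<string, string>;
  body: string | null;
}

// Builds the GET /v1/models and POST /v1/responses probes for one consumer URL.
function buildProbeRequests(baseUrl: string, apiKey: string, model: string): ProbeRequest[] {
  const root = baseUrl.replace(/\/+$/, ""); // drop trailing slashes
  const auth = { Authorization: `Bearer ${apiKey}` };
  return [
    { layer: "models", method: "GET", url: `${root}/v1/models`, headers: auth, body: null },
    {
      layer: "responses",
      method: "POST",
      url: `${root}/v1/responses`,
      headers: { ...auth, "Content-Type": "application/json" },
      body: JSON.stringify({ model, input: "ping" }),
    },
  ];
}

// Report view that omits headers and body so no secret is printed.
function redactForReport(request: ProbeRequest): { layer: string; method: string; url: string } {
  return { layer: request.layer, method: request.method, url: request.url };
}
```

Separating request construction from reporting keeps the API key out of closeout comments by construction.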
## k3s Network Policy Requirements
@@ -169,4 +175,4 @@ spec:
This policy must be included in the `sub2api plan` / `apply` manifest rendering so that it is created as part of the normal deployment flow, not maintained as a manual one-off.
`platform-infra sub2api status` must report whether `NetworkPolicy/allow-all` exists and still has `podSelector: {}`, `policyTypes: [Ingress, Egress]`, `ingress: [{}]`, and `egress: [{}]`. For active bundled targets, `platform-infra sub2api validate` must also run temporary in-namespace probe pods that connect to `sub2api-postgres:5432` and `sub2api-redis:6379`; local `pg_isready` inside the PostgreSQL pod alone is insufficient because it does not exercise kube-router cross-pod policy evaluation. For external-DB pending standby targets, `validate --target` checks the predeployment shape instead: no local PostgreSQL, app replicas zero, ClusterIP services, allow-all NetworkPolicy, and local Redis declared as ephemeral cache with readiness required only when Redis replicas are above zero.
`platform-infra sub2api status` must report whether `NetworkPolicy/allow-all` exists and still has `podSelector: {}`, `policyTypes: [Ingress, Egress]`, `ingress: [{}]`, and `egress: [{}]`. For active bundled targets, `platform-infra sub2api validate` must also run temporary in-namespace probe pods that connect to `sub2api-postgres:5432` and `sub2api-redis:6379`; local `pg_isready` inside the PostgreSQL pod alone is insufficient because it does not exercise kube-router cross-pod policy evaluation. For external-DB standby targets, `validate --target` checks the predeployment shape: no local PostgreSQL, app replicas zero, ClusterIP services, allow-all NetworkPolicy, and local Redis declared as ephemeral cache with readiness required only when Redis replicas are above zero. For external-DB active targets, `validate --target` checks that the app uses the external database endpoint, local Redis is ephemeral, no local PostgreSQL StatefulSet exists, and any YAML-declared egress proxy and public exposure resources are present and probed through their configured paths.
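The `allow-all` shape check that `status` is expected to report can be sketched as below. This is a minimal illustration over an already-parsed `spec` object; `isAllowAllSpec` and its helpers are hypothetical names, not functions from the UniDesk sources.

```typescript
// Loosely typed view of a parsed NetworkPolicy `spec`.
interface NetworkPolicySpec {
  podSelector?: unknown;
  policyTypes?: unknown;
  ingress?: unknown;
  egress?: unknown;
}

function isEmptyObject(value: unknown): boolean {
  return typeof value === "object" && value !== null && !Array.isArray(value) && Object.keys(value).length === 0;
}

// Matches `[{}]`: exactly one rule with no restrictions.
function isSingleEmptyRule(value: unknown): boolean {
  return Array.isArray(value) && value.length === 1 && isEmptyObject(value[0]);
}

function isAllowAllSpec(spec: NetworkPolicySpec): boolean {
  const types = Array.isArray(spec.policyTypes) ? spec.policyTypes.map(String).sort() : [];
  return (
    isEmptyObject(spec.podSelector) &&
    types.join(",") === "Egress,Ingress" && // order-insensitive comparison
    isSingleEmptyRule(spec.ingress) &&
    isSingleEmptyRule(spec.egress)
  );
}
```

A check like this only proves the declared shape; the in-namespace probe pods are still needed to exercise actual cross-pod connectivity.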
@@ -80,6 +80,10 @@ export interface CodexPoolSentinelManifestOptions {
serviceName: string;
serviceDns: string;
appSecretName: string;
proxy?: {
httpProxy: string;
noProxy: string;
} | null;
}
export function defaultCodexPoolSentinelConfig(): CodexPoolSentinelConfig {
@@ -317,6 +321,25 @@ export function renderCodexPoolSentinelManifest(
const activeDeadlineSeconds = Math.max(300, Math.min(3600, config.probe.timeoutSeconds + 240));
const command = sentinelContainerShellCommand(config);
const runtimeImage = codexPoolSentinelRuntimeImage(config).runtimeImage;
const proxyEnv = options.proxy?.httpProxy
? ` - name: HTTP_PROXY
value: ${JSON.stringify(options.proxy.httpProxy)}
- name: HTTPS_PROXY
value: ${JSON.stringify(options.proxy.httpProxy)}
- name: ALL_PROXY
value: ${JSON.stringify(options.proxy.httpProxy)}
- name: http_proxy
value: ${JSON.stringify(options.proxy.httpProxy)}
- name: https_proxy
value: ${JSON.stringify(options.proxy.httpProxy)}
- name: all_proxy
value: ${JSON.stringify(options.proxy.httpProxy)}
- name: NO_PROXY
value: ${JSON.stringify(options.proxy.noProxy)}
- name: no_proxy
value: ${JSON.stringify(options.proxy.noProxy)}
`
: "";
return `apiVersion: v1
kind: Secret
metadata:
@@ -446,6 +469,7 @@ spec:
valueFrom:
fieldRef:
fieldPath: metadata.namespace
${proxyEnv}
volumeMounts:
- name: sentinel-config
mountPath: /opt/sentinel
+222 -80
@@ -17,11 +17,13 @@ import {
import { runSshCommandCapture, type SshCaptureResult } from "./ssh";
const g14K3sRoute = "G14:k3s";
const defaultTargetId = "G14";
const namespace = "platform-infra";
const serviceName = "sub2api";
const serviceDns = `${serviceName}.${namespace}.svc.cluster.local:8080`;
const fieldManager = "unidesk-platform-infra";
const appSecretName = "sub2api-secrets";
const sub2apiConfigPath = rootPath("config", "platform-infra", "sub2api.yaml");
const codexPoolConfigPath = rootPath("config", "platform-infra", "sub2api-codex-pool.yaml");
const sentinelImageDockerfilePath = rootPath("src", "components", "platform-infra", "sub2api", "sentinel.Dockerfile");
const defaultPoolGroupName = "unidesk-codex-pool";
@@ -37,6 +39,7 @@ const remoteJobPollMs = 5_000;
interface DisclosureOptions {
full: boolean;
raw: boolean;
targetId: string;
}
interface SyncOptions extends DisclosureOptions {
@@ -70,6 +73,21 @@ interface SentinelImageOptions extends DisclosureOptions {
dryRun: boolean;
}
interface CodexPoolRuntimeTarget {
id: string;
route: string;
namespace: string;
serviceName: string;
serviceDns: string;
appSecretName: string;
egressProxy: {
enabled: boolean;
applyToSentinel: boolean;
httpProxy: string;
noProxy: string;
} | null;
}
interface CodexProfile {
profile: string;
accountName: string;
@@ -204,14 +222,15 @@ export function codexPoolHelp(): unknown {
output: "json, except trace and sentinel-report default to low-noise text tables",
usage: [
"bun scripts/cli.ts platform-infra sub2api codex-pool plan",
"bun scripts/cli.ts platform-infra sub2api codex-pool sync --confirm [--prune-removed]",
"bun scripts/cli.ts platform-infra sub2api codex-pool validate [--full|--raw]",
"bun scripts/cli.ts platform-infra sub2api codex-pool trace --request-id <requestId> [--since 24h|--tail 20000|--context-seconds 300|--show-lines|--raw]",
"bun scripts/cli.ts platform-infra sub2api codex-pool sentinel-image status",
"bun scripts/cli.ts platform-infra sub2api codex-pool sentinel-image build --confirm",
"bun scripts/cli.ts platform-infra sub2api codex-pool sentinel-probe --account unidesk-codex-hy --confirm",
"bun scripts/cli.ts platform-infra sub2api codex-pool sentinel-report [--events 20|--full|--raw]",
"bun scripts/cli.ts platform-infra sub2api codex-pool cleanup-probes --confirm",
"bun scripts/cli.ts platform-infra sub2api codex-pool plan --target D601",
"bun scripts/cli.ts platform-infra sub2api codex-pool sync [--target D601] --confirm [--prune-removed]",
"bun scripts/cli.ts platform-infra sub2api codex-pool validate [--target D601] [--full|--raw]",
"bun scripts/cli.ts platform-infra sub2api codex-pool trace [--target D601] --request-id <requestId> [--since 24h|--tail 20000|--context-seconds 300|--show-lines|--raw]",
"bun scripts/cli.ts platform-infra sub2api codex-pool sentinel-image status [--target D601]",
"bun scripts/cli.ts platform-infra sub2api codex-pool sentinel-image build [--target D601] --confirm",
"bun scripts/cli.ts platform-infra sub2api codex-pool sentinel-probe [--target D601] --account unidesk-codex-hy --confirm",
"bun scripts/cli.ts platform-infra sub2api codex-pool sentinel-report [--target D601] [--events 20|--full|--raw]",
"bun scripts/cli.ts platform-infra sub2api codex-pool cleanup-probes [--target D601] --confirm",
"bun scripts/cli.ts platform-infra sub2api codex-pool expose --confirm",
"bun scripts/cli.ts platform-infra sub2api codex-pool configure-local --confirm",
],
@@ -254,14 +273,14 @@ export async function runCodexPoolCommand(config: UniDeskConfig, args: string[])
}
function parseSyncOptions(args: string[]): SyncOptions {
validateOptions(args, new Set(["--confirm", "--prune-removed", "--full", "--raw"]));
const disclosure = parseDisclosureOptions(args.filter((arg) => arg !== "--confirm" && arg !== "--prune-removed"));
validateOptions(args, new Set(["--confirm", "--prune-removed", "--full", "--raw", "--target"]));
const disclosure = parseDisclosureOptions(stripBooleanOptions(args, new Set(["--confirm", "--prune-removed"])));
return { ...disclosure, confirm: args.includes("--confirm"), pruneRemoved: args.includes("--prune-removed") };
}
function parseConfirmOptions(args: string[]): ConfirmOptions {
validateOptions(args, new Set(["--confirm", "--full", "--raw"]));
const disclosure = parseDisclosureOptions(args.filter((arg) => arg !== "--confirm"));
validateOptions(args, new Set(["--confirm", "--full", "--raw", "--target"]));
const disclosure = parseDisclosureOptions(stripBooleanOptions(args, new Set(["--confirm"])));
return { ...disclosure, confirm: args.includes("--confirm") };
}
@@ -271,7 +290,8 @@ function parseSentinelImageOptions(args: string[]): SentinelImageOptions {
let confirm = false;
let explicitDryRun = false;
const disclosureArgs: string[] = [];
for (const arg of rest) {
for (let index = 0; index < rest.length; index += 1) {
const arg = rest[index]!;
if (arg === "--confirm") {
confirm = true;
continue;
@@ -280,8 +300,14 @@ function parseSentinelImageOptions(args: string[]): SentinelImageOptions {
explicitDryRun = true;
continue;
}
if (arg === "--full" || arg === "--raw") {
if (arg === "--full" || arg === "--raw" || arg === "--target" || arg.startsWith("--target=")) {
disclosureArgs.push(arg);
if (arg === "--target") {
const value = rest[index + 1];
if (value === undefined || value.startsWith("--")) throw new Error("--target requires a value");
disclosureArgs.push(value);
index += 1;
}
continue;
}
throw new Error(`unsupported option: ${arg}`);
@@ -306,8 +332,14 @@ function parseSentinelProbeOptions(args: string[]): SentinelProbeOptions {
confirm = true;
continue;
}
if (arg === "--full" || arg === "--raw") {
if (arg === "--full" || arg === "--raw" || arg === "--target" || arg.startsWith("--target=")) {
disclosureArgs.push(arg);
if (arg === "--target") {
const value = args[index + 1];
if (value === undefined || value.startsWith("--")) throw new Error("--target requires a value");
disclosureArgs.push(value);
index += 1;
}
continue;
}
if (arg === "--account") {
@@ -336,8 +368,14 @@ function parseSentinelReportOptions(args: string[]): SentinelReportOptions {
const disclosureArgs: string[] = [];
for (let index = 0; index < args.length; index += 1) {
const arg = args[index]!;
if (arg === "--full" || arg === "--raw") {
if (arg === "--full" || arg === "--raw" || arg === "--target" || arg.startsWith("--target=")) {
disclosureArgs.push(arg);
if (arg === "--target") {
const value = args[index + 1];
if (value === undefined || value.startsWith("--")) throw new Error("--target requires a value");
disclosureArgs.push(value);
index += 1;
}
continue;
}
if (arg === "--events") {
@@ -369,8 +407,14 @@ function parseTraceOptions(args: string[]): TraceOptions {
const disclosureArgs: string[] = [];
for (let index = 0; index < args.length; index += 1) {
const arg = args[index]!;
if (arg === "--full" || arg === "--raw") {
if (arg === "--full" || arg === "--raw" || arg === "--target" || arg.startsWith("--target=")) {
disclosureArgs.push(arg);
if (arg === "--target") {
const value = args[index + 1];
if (value === undefined || value.startsWith("--")) throw new Error("--target requires a value");
disclosureArgs.push(value);
index += 1;
}
continue;
}
if (arg === "--show-lines") {
@@ -452,9 +496,26 @@ function readReportEventLimit(raw: string, option: string): number {
}
function parseDisclosureOptions(args: string[]): DisclosureOptions {
validateOptions(args, new Set(["--full", "--raw"]));
validateOptions(args, new Set(["--full", "--raw", "--target"]));
const raw = args.includes("--raw");
return { full: raw || args.includes("--full"), raw };
return { full: raw || args.includes("--full"), raw, targetId: parseTargetId(args) };
}
function parseTargetId(args: string[]): string {
let targetId = defaultTargetId;
for (let index = 0; index < args.length; index += 1) {
const arg = args[index]!;
if (arg === "--target") {
const value = args[index + 1];
if (value === undefined || value.startsWith("--")) throw new Error("--target requires a value");
targetId = value;
index += 1;
continue;
}
if (arg.startsWith("--target=")) targetId = arg.slice("--target=".length);
}
if (!/^[A-Za-z0-9._-]+$/u.test(targetId)) throw new Error("--target must be a simple target id");
return targetId;
}
function splitAccountNames(value: string): string[] {
@@ -462,14 +523,68 @@ function splitAccountNames(value: string): string[] {
}
function validateOptions(args: string[], booleanOptions: Set<string>): void {
for (const arg of args) {
for (let index = 0; index < args.length; index += 1) {
const arg = args[index]!;
if (arg === "--target" && booleanOptions.has("--target")) {
index += 1;
continue;
}
if (arg.startsWith("--target=") && booleanOptions.has("--target")) continue;
if (booleanOptions.has(arg)) continue;
throw new Error(`unsupported option: ${arg}`);
}
}
function codexPoolPlan(options: DisclosureOptions = { full: false, raw: false }): Record<string, unknown> {
function stripBooleanOptions(args: string[], stripped: Set<string>): string[] {
return args.filter((arg) => !stripped.has(arg));
}
function codexPoolRuntimeTarget(targetId: string = defaultTargetId): CodexPoolRuntimeTarget {
const parsed = Bun.YAML.parse(readFileSync(sub2apiConfigPath, "utf8")) as unknown;
if (!isRecord(parsed) || !Array.isArray(parsed.targets)) throw new Error(`${sub2apiConfigPath}.targets must be a list`);
const raw = parsed.targets.find((item) => isRecord(item) && String(item.id ?? "").toLowerCase() === targetId.toLowerCase());
if (!isRecord(raw)) throw new Error(`${sub2apiConfigPath}.targets does not contain target ${targetId}`);
const id = stringValue(raw.id) ?? targetId;
const route = stringValue(raw.route) ?? (id === defaultTargetId ? g14K3sRoute : "");
const targetNamespace = stringValue(raw.namespace) ?? namespace;
if (route.length === 0) throw new Error(`${sub2apiConfigPath}.targets[${id}].route is required`);
validateKubernetesName(targetNamespace, `${sub2apiConfigPath}.targets[${id}].namespace`, true);
let egressProxy: CodexPoolRuntimeTarget["egressProxy"] = null;
if (isRecord(raw.egressProxy) && raw.egressProxy.enabled === true) {
const proxyServiceName = stringValue(raw.egressProxy.serviceName);
const listenPort = numberValue(raw.egressProxy.listenPort);
if (proxyServiceName === null || listenPort === null) throw new Error(`${sub2apiConfigPath}.targets[${id}].egressProxy.serviceName/listenPort are required`);
validateKubernetesName(proxyServiceName, `${sub2apiConfigPath}.targets[${id}].egressProxy.serviceName`, true);
if (!Number.isInteger(listenPort) || listenPort < 1 || listenPort > 65535) throw new Error(`${sub2apiConfigPath}.targets[${id}].egressProxy.listenPort must be a TCP port`);
const noProxyRaw = Array.isArray(raw.egressProxy.noProxy) ? raw.egressProxy.noProxy : [];
const noProxy = noProxyRaw.map((entry) => stringValue(entry)).filter((entry): entry is string => entry !== null && entry.length > 0).join(",");
egressProxy = {
enabled: true,
applyToSentinel: raw.egressProxy.applyToSentinel === undefined ? true : raw.egressProxy.applyToSentinel === true,
httpProxy: `http://${proxyServiceName}.${targetNamespace}.svc.cluster.local:${listenPort}`,
noProxy,
};
}
return {
id,
route,
namespace: targetNamespace,
serviceName,
serviceDns: `${serviceName}.${targetNamespace}.svc.cluster.local:8080`,
appSecretName,
egressProxy,
};
}
function targetFlag(target: CodexPoolRuntimeTarget): string {
return target.id === defaultTargetId ? "" : ` --target ${target.id}`;
}
function codexPoolPlan(options: DisclosureOptions = { full: false, raw: false, targetId: defaultTargetId }): Record<string, unknown> {
const pool = readCodexPoolConfig();
const runtimeTarget = codexPoolRuntimeTarget(options.targetId);
const profiles = collectCodexProfiles();
const ok = profiles.length > 0 && profiles.every((profile) => profile.ok);
return {
@@ -481,7 +596,7 @@ function codexPoolPlan(options: DisclosureOptions = { full: false, raw: false })
authPattern: "YAML-selected auth files under ~/.codex",
valuesPrinted: false,
},
target: poolTarget(),
target: poolTarget(pool, runtimeTarget),
config: {
path: codexPoolConfigPath,
pool: options.full ? pool : codexPoolConfigSummary(pool),
@@ -490,9 +605,9 @@ function codexPoolPlan(options: DisclosureOptions = { full: false, raw: false })
decision: {
accountType: "openai/apikey",
grouping: `All discovered Codex profiles are bound to one Sub2API group named ${pool.groupName}.`,
unifiedApiKey: `The client-facing API_KEY is controlled by k3s Secret ${namespace}/${pool.apiKeySecretName}.${pool.apiKeySecretKey}.`,
unifiedApiKey: `The client-facing API_KEY is controlled by k3s Secret ${runtimeTarget.namespace}/${pool.apiKeySecretName}.${pool.apiKeySecretKey}.`,
sentinel: pool.sentinel.monitor.enabled
? `Account sentinel is enabled as k8s CronJob ${namespace}/${pool.sentinel.cronJobName}; actions.enabled=${pool.sentinel.actions.enabled}.`
? `Account sentinel is enabled as k8s CronJob ${runtimeTarget.namespace}/${pool.sentinel.cronJobName}; actions.enabled=${pool.sentinel.actions.enabled}.`
: "Account sentinel monitoring is disabled by YAML.",
publicExposure: pool.publicExposure.enabled
? `Default Codex consumers use ${codexConsumerBaseUrl(pool)}; bounded master-local probes may use ${pool.publicExposure.masterBaseUrl}. FRP proxy ${pool.publicExposure.proxyName} maps public ${pool.publicExposure.publicBaseUrl} to ${pool.publicExposure.localIP}:${pool.publicExposure.localPort}.`
@@ -501,13 +616,14 @@ function codexPoolPlan(options: DisclosureOptions = { full: false, raw: false })
configPolicy: "UniDesk-owned durable configuration remains YAML-first; local ~/.codex files and runtime Secrets are not committed.",
},
next: ok
? { sync: "bun scripts/cli.ts platform-infra sub2api codex-pool sync --confirm" }
? { sync: `bun scripts/cli.ts platform-infra sub2api codex-pool sync${targetFlag(runtimeTarget)} --confirm` }
: { fix: "Ensure every discovered config.toml profile has a base_url and either auth.json OPENAI_API_KEY or the configured env_key present in this shell." },
};
}
async function codexPoolSync(config: UniDeskConfig, options: SyncOptions): Promise<Record<string, unknown>> {
const pool = readCodexPoolConfig();
const runtimeTarget = codexPoolRuntimeTarget(options.targetId);
const profiles = collectCodexProfiles();
const planOk = profiles.length > 0 && profiles.every((profile) => profile.ok);
if (!options.confirm || !planOk) {
@@ -522,7 +638,7 @@ async function codexPoolSync(config: UniDeskConfig, options: SyncOptions): Promi
}
const sentinelImage = pool.sentinel.monitor.enabled
? await runCodexPoolSentinelImage(config, pool, { action: "build", confirm: true, dryRun: false, full: options.full, raw: false })
? await runCodexPoolSentinelImage(config, pool, { action: "build", confirm: true, dryRun: false, full: options.full, raw: false, targetId: options.targetId })
: { ok: true, mode: "skipped-monitor-disabled" };
if (sentinelImage.ok !== true) {
return {
@@ -531,7 +647,7 @@ async function codexPoolSync(config: UniDeskConfig, options: SyncOptions): Promi
mode: "blocked-sentinel-image",
sentinelImage,
next: {
image: "bun scripts/cli.ts platform-infra sub2api codex-pool sentinel-image build --confirm",
image: `bun scripts/cli.ts platform-infra sub2api codex-pool sentinel-image build${targetFlag(runtimeTarget)} --confirm`,
},
};
}
@@ -540,10 +656,14 @@ async function codexPoolSync(config: UniDeskConfig, options: SyncOptions): Promi
pruneRemoved: options.pruneRemoved,
sentinel: {
manifest: renderCodexPoolSentinelManifest(pool.sentinel, sentinelProfileSecrets(profiles), {
namespace,
serviceName,
serviceDns,
appSecretName,
namespace: runtimeTarget.namespace,
serviceName: runtimeTarget.serviceName,
serviceDns: runtimeTarget.serviceDns,
appSecretName: runtimeTarget.appSecretName,
proxy: runtimeTarget.egressProxy?.applyToSentinel ? {
httpProxy: runtimeTarget.egressProxy.httpProxy,
noProxy: runtimeTarget.egressProxy.noProxy,
} : null,
}),
summary: codexPoolSentinelSummary(pool.sentinel),
},
@@ -589,7 +709,7 @@ async function codexPoolSync(config: UniDeskConfig, options: SyncOptions): Promi
tempUnschedulableCredentials: renderSub2ApiTempUnschedulableCredentials(profile.tempUnschedulable),
})),
};
const result = await runRemoteCodexPoolScript(config, "sync", syncScript(payload, pool));
const result = await runRemoteCodexPoolScript(config, "sync", syncScript(payload, pool, runtimeTarget), runtimeTarget);
const parsed = parseJsonOutput(result.stdout);
if (options.raw) {
return {
@@ -614,7 +734,7 @@ async function codexPoolSync(config: UniDeskConfig, options: SyncOptions): Promi
? compactCapture(result, { full: options.full || result.exitCode !== 0 })
: options.full ? parsed : codexPoolSyncSummary(parsed),
next: {
validate: "bun scripts/cli.ts platform-infra sub2api codex-pool validate",
validate: `bun scripts/cli.ts platform-infra sub2api codex-pool validate${targetFlag(runtimeTarget)}`,
},
};
}
@@ -625,6 +745,7 @@ async function codexPoolSentinelImage(config: UniDeskConfig, options: SentinelIm
}
async function runCodexPoolSentinelImage(config: UniDeskConfig, pool: CodexPoolConfig, options: SentinelImageOptions): Promise<Record<string, unknown>> {
const runtimeTarget = codexPoolRuntimeTarget(options.targetId);
const target = codexPoolSentinelRuntimeImage(pool.sentinel);
if (options.action === "build" && options.dryRun) {
return {
@@ -635,13 +756,13 @@ async function runCodexPoolSentinelImage(config: UniDeskConfig, pool: CodexPoolC
dockerfile: sentinelImageDockerfilePath,
mutation: false,
next: {
confirm: "bun scripts/cli.ts platform-infra sub2api codex-pool sentinel-image build --confirm",
confirm: `bun scripts/cli.ts platform-infra sub2api codex-pool sentinel-image build${targetFlag(runtimeTarget)} --confirm`,
},
};
}
const mode: RemoteCodexPoolMode = options.action === "status" ? "sentinel-image-status" : "sentinel-image-build";
const script = options.action === "status" ? sentinelImageStatusScript(pool) : sentinelImageBuildScript(pool);
const result = await runRemoteCodexPoolScript(config, mode, script);
const script = options.action === "status" ? sentinelImageStatusScript(pool, runtimeTarget) : sentinelImageBuildScript(pool, runtimeTarget);
const result = await runRemoteCodexPoolScript(config, mode, script, runtimeTarget);
const parsed = parseJsonOutput(result.stdout);
if (options.raw) {
return {
@@ -665,7 +786,8 @@ async function runCodexPoolSentinelImage(config: UniDeskConfig, pool: CodexPoolC
async function codexPoolValidate(config: UniDeskConfig, options: DisclosureOptions): Promise<Record<string, unknown>> {
const pool = readCodexPoolConfig();
const result = await runRemoteCodexPoolScript(config, "validate", validateScript(pool));
const runtimeTarget = codexPoolRuntimeTarget(options.targetId);
const result = await runRemoteCodexPoolScript(config, "validate", validateScript(pool, runtimeTarget), runtimeTarget);
const parsed = parseJsonOutput(result.stdout);
if (options.raw) {
return {
@@ -685,7 +807,8 @@ async function codexPoolValidate(config: UniDeskConfig, options: DisclosureOptio
async function codexPoolTrace(config: UniDeskConfig, options: TraceOptions): Promise<Record<string, unknown> | RenderedCliResult> {
const pool = readCodexPoolConfig();
const result = await capture(config, g14K3sRoute, ["script"], traceScript(pool, options));
const runtimeTarget = codexPoolRuntimeTarget(options.targetId);
const result = await capture(config, runtimeTarget.route, ["script"], traceScript(pool, options, runtimeTarget));
const parsed = parseJsonOutput(result.stdout);
const ok = result.exitCode === 0 && boolField(parsed, "ok", false);
if (options.raw) {
@@ -707,7 +830,8 @@ async function codexPoolTrace(config: UniDeskConfig, options: TraceOptions): Pro
async function codexPoolSentinelReport(config: UniDeskConfig, options: SentinelReportOptions): Promise<Record<string, unknown> | RenderedCliResult> {
const pool = readCodexPoolConfig();
const result = await capture(config, g14K3sRoute, ["script"], sentinelReportScript(pool, options.events));
const runtimeTarget = codexPoolRuntimeTarget(options.targetId);
const result = await capture(config, runtimeTarget.route, ["script"], sentinelReportScript(pool, options.events, runtimeTarget));
const parsed = parseJsonOutput(result.stdout);
const ok = result.exitCode === 0 && boolField(parsed, "ok", false);
if (options.raw) {
@@ -729,6 +853,7 @@ async function codexPoolSentinelReport(config: UniDeskConfig, options: SentinelR
async function codexPoolSentinelProbe(config: UniDeskConfig, options: SentinelProbeOptions): Promise<Record<string, unknown>> {
const pool = readCodexPoolConfig();
const runtimeTarget = codexPoolRuntimeTarget(options.targetId);
const configuredAccounts = desiredAccountNames(pool);
const missing = options.accounts.filter((account) => !configuredAccounts.includes(account));
if (missing.length > 0) {
@@ -746,11 +871,11 @@ async function codexPoolSentinelProbe(config: UniDeskConfig, options: SentinelPr
ok: true,
action: "platform-infra-sub2api-codex-pool-sentinel-probe",
mode: "dry-run",
target: poolTarget(pool),
target: poolTarget(pool, runtimeTarget),
accounts: options.accounts,
effect: "Would create one Kubernetes Job from the managed sentinel CronJob and force an immediate marker probe for the requested account(s).",
next: {
confirm: `bun scripts/cli.ts platform-infra sub2api codex-pool sentinel-probe --account ${options.accounts.join(",")} --confirm`,
confirm: `bun scripts/cli.ts platform-infra sub2api codex-pool sentinel-probe${targetFlag(runtimeTarget)} --account ${options.accounts.join(",")} --confirm`,
},
valuesPrinted: false,
};
@@ -758,7 +883,7 @@ async function codexPoolSentinelProbe(config: UniDeskConfig, options: SentinelPr
const payload = {
accounts: options.accounts,
};
const result = await runRemoteCodexPoolScript(config, "sentinel-probe", sentinelProbeScript(payload, pool));
const result = await runRemoteCodexPoolScript(config, "sentinel-probe", sentinelProbeScript(payload, pool, runtimeTarget), runtimeTarget);
const parsed = parseJsonOutput(result.stdout);
if (options.raw) {
return {
@@ -777,19 +902,20 @@ async function codexPoolSentinelProbe(config: UniDeskConfig, options: SentinelPr
}
async function codexPoolCleanupProbes(config: UniDeskConfig, options: ConfirmOptions): Promise<Record<string, unknown>> {
const runtimeTarget = codexPoolRuntimeTarget(options.targetId);
if (!options.confirm) {
return {
ok: true,
action: "platform-infra-sub2api-codex-pool-cleanup-probes",
mode: "dry-run",
target: poolTarget(),
target: poolTarget(readCodexPoolConfig(), runtimeTarget),
scope: "Only deletes temporary resources whose names start with unidesk-probe-.",
next: { confirm: "bun scripts/cli.ts platform-infra sub2api codex-pool cleanup-probes --confirm" },
next: { confirm: `bun scripts/cli.ts platform-infra sub2api codex-pool cleanup-probes${targetFlag(runtimeTarget)} --confirm` },
valuesPrinted: false,
};
}
const pool = readCodexPoolConfig();
const result = await capture(config, g14K3sRoute, ["script"], cleanupProbesScript(pool));
const result = await capture(config, runtimeTarget.route, ["script"], cleanupProbesScript(pool, runtimeTarget));
const parsed = parseJsonOutput(result.stdout);
if (options.raw) {
return {
@@ -2485,16 +2611,17 @@ function codexPoolSyncSummary(parsed: Record<string, unknown> | null): Record<st
};
}
function poolTarget(pool = readCodexPoolConfig()): Record<string, unknown> {
function poolTarget(pool = readCodexPoolConfig(), target = codexPoolRuntimeTarget(defaultTargetId)): Record<string, unknown> {
return {
route: g14K3sRoute,
namespace,
id: target.id,
route: target.route,
namespace: target.namespace,
service: serviceName,
serviceDns,
serviceDns: target.serviceDns,
configPath: codexPoolConfigPath,
groupName: pool.groupName,
apiKeyName: pool.apiKeyName,
apiKeySecret: `${namespace}/${pool.apiKeySecretName}.${pool.apiKeySecretKey}`,
apiKeySecret: `${target.namespace}/${pool.apiKeySecretName}.${pool.apiKeySecretKey}`,
minOwnerConcurrency: pool.minOwnerConcurrency,
minOwnerConcurrencySource: pool.minOwnerConcurrencySource,
accountCapacityTotal: desiredAccountCapacityTotal(pool),
@@ -2507,6 +2634,12 @@ function poolTarget(pool = readCodexPoolConfig()): Record<string, unknown> {
cronJobName: pool.sentinel.cronJobName,
stateConfigMapName: pool.sentinel.stateConfigMapName,
},
egressProxy: target.egressProxy === null ? null : {
enabled: target.egressProxy.enabled,
applyToSentinel: target.egressProxy.applyToSentinel,
httpProxy: target.egressProxy.httpProxy,
noProxy: target.egressProxy.noProxy,
},
valuesPrinted: false,
};
}
@@ -2902,10 +3035,10 @@ spec:
`;
}
async function fetchPoolApiKey(config: UniDeskConfig, pool: CodexPoolConfig): Promise<{ apiKey: string | null; error: string | null }> {
const result = await capture(config, g14K3sRoute, ["script"], `
async function fetchPoolApiKey(config: UniDeskConfig, pool: CodexPoolConfig, target = codexPoolRuntimeTarget(defaultTargetId)): Promise<{ apiKey: string | null; error: string | null }> {
const result = await capture(config, target.route, ["script"], `
set -u
kubectl -n ${namespace} get secret ${pool.apiKeySecretName} -o json
kubectl -n ${target.namespace} get secret ${pool.apiKeySecretName} -o json
`);
if (result.exitCode !== 0) {
return { apiKey: null, error: `read pool API key secret failed: ${result.stderr.slice(-1000)}` };
@@ -2913,7 +3046,7 @@ kubectl -n ${namespace} get secret ${pool.apiKeySecretName} -o json
const parsed = parseJsonOutput(result.stdout);
const data = isRecord(parsed?.data) ? parsed.data : null;
const encoded = typeof data?.[pool.apiKeySecretKey] === "string" ? data[pool.apiKeySecretKey] : null;
if (encoded === null) return { apiKey: null, error: `${namespace}/${pool.apiKeySecretName}.${pool.apiKeySecretKey} missing` };
if (encoded === null) return { apiKey: null, error: `${target.namespace}/${pool.apiKeySecretName}.${pool.apiKeySecretKey} missing` };
try {
const apiKey = Buffer.from(encoded, "base64").toString("utf8");
return apiKey.length > 0 ? { apiKey, error: null } : { apiKey: null, error: "decoded API key is empty" };
@@ -3208,16 +3341,16 @@ export function codexPoolSentinelProbeConfigFingerprint(input: {
}));
}
function syncScript(payload: unknown, pool: CodexPoolConfig): string {
function syncScript(payload: unknown, pool: CodexPoolConfig, target: CodexPoolRuntimeTarget): string {
const encoded = Buffer.from(JSON.stringify(payload), "utf8").toString("base64");
return remotePythonScript("sync", encoded, pool);
return remotePythonScript("sync", encoded, pool, target);
}
function validateScript(pool: CodexPoolConfig): string {
return remotePythonScript("validate", "", pool);
function validateScript(pool: CodexPoolConfig, target: CodexPoolRuntimeTarget): string {
return remotePythonScript("validate", "", pool, target);
}
function traceScript(pool: CodexPoolConfig, options: TraceOptions): string {
function traceScript(pool: CodexPoolConfig, options: TraceOptions, target: CodexPoolRuntimeTarget): string {
const encoded = Buffer.from(JSON.stringify({
requestId: options.requestId,
since: options.since,
@@ -3225,15 +3358,15 @@ function traceScript(pool: CodexPoolConfig, options: TraceOptions): string {
contextSeconds: options.contextSeconds,
showLines: options.showLines,
}), "utf8").toString("base64");
return remotePythonScript("trace", encoded, pool);
return remotePythonScript("trace", encoded, pool, target);
}
function sentinelProbeScript(payload: unknown, pool: CodexPoolConfig): string {
function sentinelProbeScript(payload: unknown, pool: CodexPoolConfig, target: CodexPoolRuntimeTarget): string {
const encoded = Buffer.from(JSON.stringify(payload), "utf8").toString("base64");
return remotePythonScript("sentinel-probe", encoded, pool);
return remotePythonScript("sentinel-probe", encoded, pool, target);
}
function sentinelReportScript(pool: CodexPoolConfig, events: number): string {
function sentinelReportScript(pool: CodexPoolConfig, events: number, target: CodexPoolRuntimeTarget): string {
const stateName = pool.sentinel.stateConfigMapName;
const cronJobName = pool.sentinel.cronJobName;
return `
@@ -3243,7 +3376,7 @@ import json
import subprocess
from datetime import datetime, timezone, timedelta
NAMESPACE = ${JSON.stringify(namespace)}
NAMESPACE = ${JSON.stringify(target.namespace)}
STATE_NAME = ${JSON.stringify(stateName)}
CRONJOB_NAME = ${JSON.stringify(cronJobName)}
EVENT_LIMIT = ${JSON.stringify(events)}
@@ -3445,22 +3578,22 @@ PY
`;
}
function cleanupProbesScript(pool: CodexPoolConfig): string {
return remotePythonScript("cleanup-probes", "", pool);
function cleanupProbesScript(pool: CodexPoolConfig, target: CodexPoolRuntimeTarget): string {
return remotePythonScript("cleanup-probes", "", pool, target);
}
function sentinelImageStatusScript(pool: CodexPoolConfig): string {
function sentinelImageStatusScript(pool: CodexPoolConfig, targetRuntime: CodexPoolRuntimeTarget): string {
const target = codexPoolSentinelRuntimeImage(pool.sentinel);
return remoteSentinelImageScript("status", target, pool.sentinel, null);
return remoteSentinelImageScript("status", target, pool.sentinel, null, targetRuntime);
}
function sentinelImageBuildScript(pool: CodexPoolConfig): string {
function sentinelImageBuildScript(pool: CodexPoolConfig, targetRuntime: CodexPoolRuntimeTarget): string {
const target = codexPoolSentinelRuntimeImage(pool.sentinel);
const dockerfile = readFileSync(sentinelImageDockerfilePath, "utf8");
return remoteSentinelImageScript("build", target, pool.sentinel, dockerfile);
return remoteSentinelImageScript("build", target, pool.sentinel, dockerfile, targetRuntime);
}
function remoteSentinelImageScript(mode: "status" | "build", target: ReturnType<typeof codexPoolSentinelRuntimeImage>, sentinel: CodexPoolSentinelConfig, dockerfile: string | null): string {
function remoteSentinelImageScript(mode: "status" | "build", target: ReturnType<typeof codexPoolSentinelRuntimeImage>, sentinel: CodexPoolSentinelConfig, dockerfile: string | null, targetRuntime: CodexPoolRuntimeTarget): string {
const dockerfileB64 = dockerfile === null ? "" : Buffer.from(dockerfile, "utf8").toString("base64");
return `
set -eu
@@ -3470,6 +3603,7 @@ repo=${shQuote("platform-infra/sub2api-account-sentinel")}
tag=${shQuote(target.tag)}
base_image=${shQuote(target.baseImage)}
openai_version=${shQuote(sentinel.sdk.openaiPythonVersion)}
runtime_target=${shQuote(targetRuntime.id)}
work=/tmp/unidesk-sub2api-sentinel-image
mkdir -p "$work"
dockerfile_path="$work/sentinel.Dockerfile"
@@ -3527,7 +3661,13 @@ ${dockerfileB64}
UNIDESK_SENTINEL_DOCKERFILE_B64
export NO_PROXY=localhost,127.0.0.1,::1,host.docker.internal,74.48.78.17,192.168.0.0/16,10.0.0.0/8,172.16.0.0/12,10.42.0.0/16,10.43.0.0/16,.svc,.svc.cluster.local,.cluster.local,kubernetes,kubernetes.default,kubernetes.default.svc,127.0.0.1:5000,localhost:5000
export no_proxy=$NO_PROXY
docker build --pull \\
set -- --pull
base_image_source="registry"
if [ "$runtime_target" != "G14" ] && docker image inspect "$base_image" >/dev/null 2>&1; then
set --
base_image_source="local-cache"
fi
docker build "$@" \\
--build-arg BASE_IMAGE="$base_image" \\
--build-arg OPENAI_PYTHON_VERSION="$openai_version" \\
--build-arg HTTP_PROXY= --build-arg HTTPS_PROXY= --build-arg http_proxy= --build-arg https_proxy= \\
@@ -3546,6 +3686,7 @@ print(json.dumps({
"image": "${target.runtimeImage}",
"baseImage": "${target.baseImage}",
"tag": "${target.tag}",
"baseImageSource": "${"${base_image_source}"}",
"digest": "${"${digest}"}" or None,
}, ensure_ascii=False, indent=2))
PY
@@ -3620,7 +3761,7 @@ function desiredAccountTempUnschedulableMap(pool: CodexPoolConfig): Record<strin
return policies;
}
function remotePythonScript(mode: "sync" | "validate" | "trace" | "cleanup-probes" | "sentinel-probe", encodedPayload: string, pool: CodexPoolConfig): string {
function remotePythonScript(mode: "sync" | "validate" | "trace" | "cleanup-probes" | "sentinel-probe", encodedPayload: string, pool: CodexPoolConfig, target: CodexPoolRuntimeTarget): string {
return `
set -u
python3 - <<'PY'
@@ -3635,11 +3776,12 @@ import time
from datetime import datetime, timezone, timedelta
from urllib.parse import quote
NAMESPACE = "${namespace}"
SERVICE_NAME = "${serviceName}"
SERVICE_DNS = "${serviceDns}"
TARGET_ID = ${JSON.stringify(target.id)}
NAMESPACE = ${JSON.stringify(target.namespace)}
SERVICE_NAME = ${JSON.stringify(target.serviceName)}
SERVICE_DNS = ${JSON.stringify(target.serviceDns)}
FIELD_MANAGER = "${fieldManager}"
APP_SECRET_NAME = "${appSecretName}"
APP_SECRET_NAME = ${JSON.stringify(target.appSecretName)}
POOL_GROUP_NAME = "${pool.groupName}"
POOL_API_KEY_NAME = "${pool.apiKeyName}"
POOL_API_KEY_SECRET_NAME = "${pool.apiKeySecretName}"
@@ -6099,17 +6241,17 @@ async function capture(config: UniDeskConfig, target: string, args: string[], in
type RemoteCodexPoolMode = "sync" | "validate" | "sentinel-probe" | "sentinel-image-status" | "sentinel-image-build";
async function runRemoteCodexPoolScript(config: UniDeskConfig, mode: RemoteCodexPoolMode, script: string): Promise<SshCaptureResult> {
async function runRemoteCodexPoolScript(config: UniDeskConfig, mode: RemoteCodexPoolMode, script: string, target = codexPoolRuntimeTarget(defaultTargetId)): Promise<SshCaptureResult> {
const jobName = `codex-pool-${mode}-${Date.now().toString(36)}`.slice(0, 63);
const startedAtMs = Date.now();
const start = await capture(config, g14K3sRoute, ["script"], remoteJobStartScript(jobName, script));
const start = await capture(config, target.route, ["script"], remoteJobStartScript(jobName, script));
const started = parseJsonOutput(start.stdout);
if (start.exitCode !== 0 || boolField(started, "ok", false) !== true) return start;
let latest: RemoteCodexPoolJobStatus | null = null;
while (Date.now() - startedAtMs <= remoteJobTimeoutMs) {
await sleep(remoteJobPollMs);
const probe = await capture(config, g14K3sRoute, ["script"], remoteJobStatusScript(jobName));
const probe = await capture(config, target.route, ["script"], remoteJobStatusScript(jobName));
const parsed = parseJsonOutput(probe.stdout);
latest = normalizeRemoteJobStatus(parsed);
process.stderr.write(`${JSON.stringify({
File diff suppressed because it is too large
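The sentinel image build hunk above changes when `docker build` passes `--pull`: it always pulls on `G14`, and on any other runtime target it skips the pull when `docker image inspect` finds the base image locally. The sketch below restates that decision in TypeScript so the two outcomes (`registry` and `local-cache`) are easy to check. The names `planSentinelImageBuild` and `BuildPlan` are hypothetical and do not appear in the commit; the real logic lives in the shell template inside `remoteSentinelImageScript`.

```typescript
// Illustrative sketch only: planSentinelImageBuild and BuildPlan are
// hypothetical names, not part of the commit.
type BaseImageSource = "registry" | "local-cache";

interface BuildPlan {
  // Leading arguments handed to `docker`, before the --build-arg flags.
  args: string[];
  // Mirrors the `baseImageSource` field reported by the build script.
  baseImageSource: BaseImageSource;
}

function planSentinelImageBuild(
  runtimeTargetId: string,
  baseImageCachedLocally: boolean,
): BuildPlan {
  // Same condition as the shell test:
  //   [ "$runtime_target" != "G14" ] && docker image inspect "$base_image"
  const useLocalCache = runtimeTargetId !== "G14" && baseImageCachedLocally;
  return {
    args: useLocalCache ? ["build"] : ["build", "--pull"],
    baseImageSource: useLocalCache ? "local-cache" : "registry",
  };
}

// G14 always pulls, even when the base image is already cached.
console.log(planSentinelImageBuild("G14", true));
// A non-G14 target with a cached base image skips --pull.
console.log(planSentinelImageBuild("D601", true));
// A non-G14 target without a cached base image still pulls.
console.log(planSentinelImageBuild("D601", false));
```

Keeping the decision in one boolean makes it clear that the local cache is only a fallback path for targets other than `G14`; the default remains a registry pull.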