feat(v3s): add D518 code queue standby pod

Codex
2026-05-15 14:56:02 +00:00
parent 00add260e3
commit 9d6be83c52
10 changed files with 500 additions and 43 deletions
+1
@@ -0,0 +1 @@
@AGENTS.md
+2 -2
@@ -50,11 +50,11 @@ frontend 的 Docker 上线顺序为:先运行必要的本地校验,例如 `b
## Health Criteria
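The minimum bar for a working service is: backend-core internal `/health` returns ok; frontend public `/health` returns ok; provider ingress public `/health` returns ok; the database passes `pg_isready` inside its container; the Todo Note backend `/api/health` returns `storage=postgres`; `v3sctl-adapter` `/api/control-plane` shows the Kubernetes API service proxy status, D601 active serving healthy, the D518 expected/missing node state and the no-fallback path; Code Queue `/health`, reached through the v3s active Service, returns lightweight readiness, the default model and `queue.storage`; `/api/tasks/overview` returns the PostgreSQL queue overview; Project Manager `/health` returns `storage.primary=postgres` and the project count; backend-core `/api/performance` returns performance metrics; `/api/nodes` lists the `main-server` and `D601` providers with status `online`; `/api/nodes/system-status` contains the matching CPU/memory/disk samples; and `/api/nodes/docker-status` shows Docker snapshots for the main server and D601. Until D518 has joined, do not bypass the control plane with a D601->D518 direct connection or NodePort, and do not treat D518 missing as the D601 active Service being unavailable; join acceptance should prove that D518 enters the control plane through a k3s/k8s native agent/proxy/tunnel or an explicit provider maintenance proxy. Before delivery you must also run `bun scripts/cli.ts e2e run` and use the gates in `docs/reference/e2e.md` as the final verdict.
The minimum bar for a working service is: backend-core internal `/health` returns ok; frontend public `/health` returns ok; provider ingress public `/health` returns ok; the database passes `pg_isready` inside its container; the Todo Note backend `/api/health` returns `storage=postgres`; `v3sctl-adapter` `/api/control-plane` shows the `unidesk-v8s` Kubernetes API service proxy status, D601 active serving healthy, D518 standby pod ready, `presentNodeIds=[D601,D518]`, `missingNodeIds=[]` and the no-fallback path; Code Queue `/health`, reached through the v3s active Service, returns lightweight readiness, the default model, `queue.storage` and `egressProxy.connected=true`; `/api/tasks/overview` returns the PostgreSQL queue overview; Project Manager `/health` returns `storage.primary=postgres` and the project count; backend-core `/api/performance` returns performance metrics; `/api/nodes` lists the `main-server`, `D601` and `D518` providers with status `online`; `/api/nodes/system-status` contains the matching CPU/memory/disk samples; and `/api/nodes/docker-status` shows Docker snapshots for the main server, D601 and D518. D518 must join the V8S control plane through a K3S agent and run the `code-queue-d518` standby Pod; a D601->D518 direct connection, NodePort or provider-gateway business HTTP must not be used to bypass the Kubernetes service route. Before delivery you must also run `bun scripts/cli.ts e2e run` and use the gates in `docs/reference/e2e.md` as the final verdict.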
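The topology part of these criteria can be expressed as a small gate function. The following is a minimal sketch, not the project's actual E2E code: the interface names and the `checkCodeQueueTopology` helper are hypothetical, and only the field names (`noFallback`, `presentNodeIds`, `missingNodeIds`, `servingHealthy`) are taken from the criteria above.

```typescript
// Hypothetical shapes: only the field names come from the health criteria;
// the real control-plane response may contain more or differently typed fields.
interface ManagedServiceStatus {
  id: string;
  status: string;
  presentNodeIds: string[];
  missingNodeIds: string[];
  servingHealthy: boolean;
}

interface ControlPlaneSnapshot {
  noFallback: boolean;
  services: ManagedServiceStatus[];
}

// Returns a list of readable failures; an empty list means the gate passes.
function checkCodeQueueTopology(
  snapshot: ControlPlaneSnapshot,
  expectedNodeIds: string[],
): string[] {
  const failures: string[] = [];
  if (!snapshot.noFallback) failures.push("noFallback must be true");

  const service = snapshot.services.filter((s) => s.id === "code-queue")[0];
  if (!service) {
    failures.push("code-queue service not found");
    return failures;
  }
  if (!service.servingHealthy) failures.push("active serving is not healthy");
  for (const nodeId of expectedNodeIds) {
    if (service.presentNodeIds.indexOf(nodeId) === -1) {
      failures.push("node " + nodeId + " is not present");
    }
  }
  if (service.missingNodeIds.length > 0) {
    failures.push("missing nodes: " + service.missingNodeIds.join(","));
  }
  return failures;
}
```

Returning a list of failures rather than a boolean keeps every violated criterion visible in one run, which matches the requirement that a degraded state is shown explicitly instead of being hidden.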
## Code Queue D601 Resource Budget
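Code Queue has moved from the main server to D601 v3s/k8s, but it must still keep explicit hard memory/swap limits. The default is `CODE_QUEUE_MAX_ACTIVE_QUEUES=0`, which restores parallelism across queues, while small hot windows such as `CODE_QUEUE_IN_MEMORY_OUTPUT_RECORDS=10` and `CODE_QUEUE_IN_MEMORY_EVENT_RECORDS=10` stay in place. Task history, queue statistics and Trace/output reads must prefer direct PostgreSQL reads or aggregation; `/health` performs lightweight readiness only, and the full history must not be cached in the Bun process for performance convenience. Any change that raises the Code Queue hot window, log buffers, the resident footprint of Playwright/Codex subprocesses or container limits, or that explicitly sets `CODE_QUEUE_MAX_ACTIVE_QUEUES` to a positive number, must state the source of the D601 resource budget in the same task and must prove through D601 `kubectl -n unidesk get deploy,svc,pod`, `kubectl -n unidesk top pod` or equivalent Docker stats, `microservice health code-queue` and the corresponding E2E that the memory-blowup risk has not been reintroduced.
Code Queue has moved from the main server to D601 v3s/k8s, but it must still keep explicit hard memory/swap limits. The default is `CODE_QUEUE_MAX_ACTIVE_QUEUES=0`, which restores parallelism across queues, while small hot windows such as `CODE_QUEUE_IN_MEMORY_OUTPUT_RECORDS=10` and `CODE_QUEUE_IN_MEMORY_EVENT_RECORDS=10` stay in place. Task history, queue statistics and Trace/output reads must prefer direct PostgreSQL reads or aggregation; `/health` performs lightweight readiness only, and the full history must not be cached in the Bun process for performance convenience. Any change that raises the Code Queue hot window, log buffers, the resident footprint of Playwright/Codex subprocesses or container limits, or that explicitly sets `CODE_QUEUE_MAX_ACTIVE_QUEUES` to a positive number, must state the source of the D601 resource budget in the same task and must prove through D601 `KUBECONFIG=/home/ubuntu/unidesk-code-queue-deploy/.state/v8s/kubeconfig kubectl -n unidesk get deploy,svc,pod`, `kubectl -n unidesk top pod` or equivalent Docker stats, `microservice health code-queue` and the corresponding E2E that the memory-blowup risk has not been reintroduced.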
## Database Connection Budget
+1 -1
@@ -35,7 +35,7 @@ Typical targeted commands:
- Core API: `docker exec unidesk-backend-core` calls internal `GET /api/overview`, which must report `dbReady: true`, `pgdata.volumeName=unidesk_pgdata_10gb`, a positive PostgreSQL database byte count, and at least one online node; internal `GET /api/performance` must report component request statistics, internal operation statistics, PGDATA usage and Code Queue PostgreSQL storage metadata.
- Provider self-connection: internal `GET /api/nodes` must contain `main-server` with `status: online`, `labels.providerGatewayVersion` equal to `src/components/provider-gateway/package.json`, `labels.providerGatewayUpgradePolicy: "always-enabled"`, `labels.providerGatewayRestartPolicyOk: true`, `labels.providerGatewayPidModeOk: true`, and `labels.providerGatewayRuntimeGuardOk: true`; internal `GET /api/nodes/system-status` must contain CPU/memory/disk samples plus a non-empty process resource list sorted by memory by default; internal `GET /api/nodes/docker-status` must contain a Docker snapshot for `main-server`; every running `provider-gateway` container visible in Docker snapshots must report `restartPolicy: "always"` and `pidMode: "host"`; public provider ingress `/health` must return ok.
- Provider remote control: internal `/api/dispatch` must successfully complete a real `provider.upgrade` task in `mode: "plan"` so the upgrade path is validated without recreating the running gateway during E2E.
- User services: internal `/api/microservices` must include `todo-note` and `oa-event-flow` on `main-server`, canonical `filebrowser` on `D518`, plus `v3sctl-adapter`, `code-queue`, `findjob`, `pipeline`, `met-nonlinear`, `claudeqq` and `filebrowser-d601` on `D601` with `public=false`; `/api/microservices/todo-note/health` must report `storage=postgres`, `/api/microservices/todo-note/proxy/api/instances` must expose the migrated Todo Note lists, and a temporary Todo Note list create/add/toggle/undo/delete cycle must succeed through the real provider-gateway proxy; `/api/microservices/oa-event-flow/health`, `/api/microservices/oa-event-flow/proxy/api/diagnostics`, `/api/microservices/oa-event-flow/proxy/api/events`, `/api/microservices/oa-event-flow/proxy/api/events?tags=service:pipeline` and `/api/microservices/oa-event-flow/proxy/api/stats/trace` must prove the independent OA event table、Pipeline bridge 和 stats center are reachable through UniDesk proxy; `/api/microservices/v3sctl-adapter/health` and `/api/microservices/v3sctl-adapter/proxy/api/control-plane` must expose the D601 v3s/k8s control plane, `kubeApiProxy.mode=kubernetes-api-service-proxy`, D601 active instance `servingHealthy=true`, D518 expected/missing state when D518 has not joined, `status=degraded` for incomplete topology, and `noFallback=true`; `/api/microservices/code-queue/health` must return the active Code Queue backend summary with default model `gpt-5.5`, and `/api/microservices/code-queue/proxy/api/tasks/overview` must return queue state through backend-core -> v3sctl-adapter -> Kubernetes API service proxy -> v3s/k8s Service, not through a `serviceId=code-queue` provider-gateway direct task or `/api/code-queue-direct`; `/api/microservices/filebrowser/health`, `/api/microservices/filebrowser-d601/health` and `/api/microservices/filebrowser/proxy/` must prove File Browser health and WebUI access through UniDesk proxy; `/api/microservices/findjob/health` and 
`/api/microservices/findjob/proxy/api/summary` must succeed through the real provider-gateway proxy; `/api/microservices/findjob/proxy/api/jobs?__unideskArrayLimit=jobs:5` must return a bounded preview with `_unidesk.arrayLimits` metadata; `/api/microservices/pipeline/health`, `/api/microservices/pipeline/proxy/api/snapshot?__unideskArrayLimit=registry.components:8,runs:3` and `/api/microservices/pipeline/proxy/api/oa-event-flow/diagnostics` must return Pipeline health, registry/run previews and OA event-flow evidence; `/api/microservices/met-nonlinear/health`, `/api/microservices/met-nonlinear/proxy/api/queue`, `/api/microservices/met-nonlinear/proxy/api/projects?root=projects&limit=500`, `/api/microservices/met-nonlinear/proxy/api/projects?root=ex_projects&limit=500`, `/api/microservices/met-nonlinear/proxy/api/projects/config?path=<projectPath>` and `/api/microservices/met-nonlinear/proxy/api/images` must return the D601 TS backend health, queue/GPU policy, full project tree inputs, structured project detail and ready `met-nonlinear-ml:tf26` image status.
- User services: internal `/api/microservices` must include `todo-note` and `oa-event-flow` on `main-server`, canonical `filebrowser` on `D518`, plus `v3sctl-adapter`, `code-queue`, `findjob`, `pipeline`, `met-nonlinear`, `claudeqq` and `filebrowser-d601` on `D601` with `public=false`; `/api/microservices/todo-note/health` must report `storage=postgres`, `/api/microservices/todo-note/proxy/api/instances` must expose the migrated Todo Note lists, and a temporary Todo Note list create/add/toggle/undo/delete cycle must succeed through the real provider-gateway proxy; `/api/microservices/oa-event-flow/health`, `/api/microservices/oa-event-flow/proxy/api/diagnostics`, `/api/microservices/oa-event-flow/proxy/api/events`, `/api/microservices/oa-event-flow/proxy/api/events?tags=service:pipeline` and `/api/microservices/oa-event-flow/proxy/api/stats/trace` must prove the independent OA event table、Pipeline bridge 和 stats center are reachable through UniDesk proxy; `/api/microservices/v3sctl-adapter/health` and `/api/microservices/v3sctl-adapter/proxy/api/control-plane` must expose the D601 `unidesk-v8s` control plane, `kubeApiProxy.mode=kubernetes-api-service-proxy`, D601 active instance `servingHealthy=true`, D518 standby instance `healthy=true`, `presentNodeIds=[D601,D518]`, `missingNodeIds=[]`, `status=healthy`, and `noFallback=true`; `/api/microservices/code-queue/health` must return the active Code Queue backend summary with default model `gpt-5.5`, `egressProxy.connected=true`, and `/api/microservices/code-queue/proxy/api/tasks/overview` must return queue state through backend-core -> v3sctl-adapter -> Kubernetes API service proxy -> v3s/k8s Service, not through a `serviceId=code-queue` provider-gateway direct task or `/api/code-queue-direct`; `/api/microservices/filebrowser/health`, `/api/microservices/filebrowser-d601/health` and `/api/microservices/filebrowser/proxy/` must prove File Browser health and WebUI access through UniDesk proxy; 
`/api/microservices/findjob/health` and `/api/microservices/findjob/proxy/api/summary` must succeed through the real provider-gateway proxy; `/api/microservices/findjob/proxy/api/jobs?__unideskArrayLimit=jobs:5` must return a bounded preview with `_unidesk.arrayLimits` metadata; `/api/microservices/pipeline/health`, `/api/microservices/pipeline/proxy/api/snapshot?__unideskArrayLimit=registry.components:8,runs:3` and `/api/microservices/pipeline/proxy/api/oa-event-flow/diagnostics` must return Pipeline health, registry/run previews and OA event-flow evidence; `/api/microservices/met-nonlinear/health`, `/api/microservices/met-nonlinear/proxy/api/queue`, `/api/microservices/met-nonlinear/proxy/api/projects?root=projects&limit=500`, `/api/microservices/met-nonlinear/proxy/api/projects?root=ex_projects&limit=500`, `/api/microservices/met-nonlinear/proxy/api/projects/config?path=<projectPath>` and `/api/microservices/met-nonlinear/proxy/api/images` must return the D601 TS backend health, queue/GPU policy, full project tree inputs, structured project detail and ready `met-nonlinear-ml:tf26` image status.
- ClaudeQQ availability: `/api/microservices/claudeqq/health` must only pass when `ready=true`, NapCat HTTP and WebSocket are connected, and `napcat.loginState=logged_in`; `/api/microservices/claudeqq/proxy/api/napcat/login` must show the same logged-in account state and `/api/microservices/claudeqq/proxy/api/events/recent` must prove the backend can read the persistent event cache. A QR-code-only or not-logged-in NapCat state must be treated as unhealthy.
- Database: the command writes an `unidesk_e2e_markers` row through `docker exec unidesk-database psql`, confirms provider state is stored in PostgreSQL, and checks Todo Note rows exist in `todo_note_instances` using the same named volume.
- Pipeline OA event flow: `microservice:pipeline-oa-event-flow` must prove both no-audit and monitor-audit runs are driven by OA events end to end. The event stream must show `node-finished` as a neutral fact with `pipeline:{pipelineId}` and `epoch:{runId}` tags, OA policy as the source of downstream/audit decisions, monitor decisions as OA control events, and runner control-result evidence. E2E must fail if delivery still depends on a legacy detail audit policy flag as policy authority, independent legacy audit-request points, a legacy batch completion gate, direct monitor-to-runner calls, or frontend/CLI writes to Pipeline `.state`.
+7 -6
@@ -130,11 +130,12 @@ Baidu Netdisk 在 UniDesk 语境中按纯后端服务管理:不得暴露百度
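- Provider: `D601`. The D601 provider-gateway only maintains and accesses the local private port `127.0.0.1:4266` of `v3sctl-adapter`; provider-gateway no longer acts as a direct proxy for `code-queue` business requests.
- Code reference: `https://github.com/pikasTech/unidesk` together with `repository.commitId` from the configuration; the service source lives in `src/components/microservices/v3sctl-adapter` and is a UniDesk-owned control-plane component.
- Deployment reference: `src/components/microservices/v3sctl-adapter/docker-compose.d601.yml` in the UniDesk repository; the Dockerfile is `src/components/microservices/v3sctl-adapter/Dockerfile` and the container name is `v3sctl-adapter`.
- v3s implementation: `v3sctl-managed` may currently land on k3s, k8s or an equivalent standard Kubernetes control plane, but it must use canonical Kubernetes objects such as native namespaces, Deployment, Service, readiness/liveness probes and the Kubernetes API service proxy; bare container ports, NodePort, SSH curl, provider-gateway `microservice.http` or direct host addresses must not be disguised as v3s service routes.
- v3s implementation: the running control plane is currently the `unidesk-v8s` K3S server on D601 plus the K3S agent on D518; `v3sctl-managed` may land on k3s, k8s or an equivalent standard Kubernetes control plane, but it must use canonical Kubernetes objects such as native namespaces, Deployment, Service, readiness/liveness probes and the Kubernetes API service proxy; bare container ports, NodePort, SSH curl, provider-gateway `microservice.http` or direct host addresses must not be disguised as v3s service routes.
- V8S system components: the `unidesk-v8s` server must disable the non-essential `traefik`, `servicelb` and `metrics-server` and keep only the API server, CoreDNS and the local-path provisioner that the workloads need; CoreDNS and the local-path provisioner are pinned to the D601 control-plane node so that limits of the D518 maintenance tunnel cannot cause system DNS/readiness jitter.
- manifest: managed-service declarations live in `src/components/microservices/v3sctl-adapter/v3s/*.v3s.json`, and the adapter reads them at startup through `V3SCTL_MANIFEST_PATHS`; the manifest is the authoritative source for D601/D518 instances, the active instance, the single writer, expected nodes and health policy. `V3SCTL_SERVICES_JSON` must not carry static HTTP services, must not override a service of the same name, and must not serve as a hidden fallback; adding a service also requires a complete `ManagedKubernetesService` manifest.
- API: `GET /health` only indicates that the adapter control plane itself is available, and exposes managed-service serving health as the `managedServicesHealthy` field; `GET /api/control-plane` returns the control plane, manifest, kubectl/v3s snapshot and managed-service status; `GET /api/services` returns the managed-service list; `GET|HEAD /api/services/<id>/health` returns the active serving health of that v3s service; `/api/services/<id>/proxy/*` is the only proxy entry through which business requests reach the active service.
- Proxy path: the only official path by which the adapter reaches a managed business service is the Kubernetes API service proxy, `/api/v1/namespaces/<namespace>/services/<service>:<port>/proxy/...`. D601 and D518 are not required to reach each other directly; when D518 joins, it should preferably use native k3s/k8s agent/proxy/tunnel capabilities to make the node reachable from the control plane, and if necessary a provider maintenance channel may carry only the setup and diagnosis of control-plane connections, but business requests must not degrade into provider-gateway connecting directly to the Code Queue HTTP port.
- Topology health: `expectedNodeIds` shows the planned nodes; while D518 has not joined, `missingNodeIds=["D518"]` and `status=degraded` must remain visible; as long as the active D601 Service reports healthy through the Kubernetes API service proxy, `servingHealthy=true`, `healthy=true` and `managedServicesHealthy=true` should still hold. Only services that explicitly set `requireAllInstancesHealthy=true` may escalate a missing standby/worker node to overall unhealthy.
- Proxy path: the only official path by which the adapter reaches the active business service is the Kubernetes API service proxy, `/api/v1/namespaces/<namespace>/services/<service>:<port>/proxy/...`. D601 and D518 are not required to reach each other directly; D518 joins the control plane through a K3S agent, and the control-plane connection may be established over a node maintenance tunnel, but business requests must not degrade into provider-gateway connecting directly to the Code Queue HTTP port. If a standby/worker node is constrained by kubelet/service-proxy reachability, the manifest may explicitly use `healthMode=pod-ready` as the topology health probe; this only reads Kubernetes Pod readiness, is not a business proxy path, and cannot replace the active Service proxy.
- Topology health: `expectedNodeIds` shows the planned nodes; the current Code Queue target topology must contain both D601 and D518, so `presentNodeIds` should be `["D601","D518"]` with `missingNodeIds=[]`, `topologyComplete=true` and `status=healthy`. D518 not having joined is only allowed as an explicit degraded state during migration and must not be hidden as a fallback; only services that explicitly set `requireAllInstancesHealthy=true` may escalate a missing standby/worker node to overall unhealthy.
- Frontend: the `用户服务 / V3S Control` (User Services / V3S Control) React page must communicate only through `/api/microservices/v3sctl-adapter/proxy/api/control-plane`, and shows the control-plane status, manifest, D601/D518 instances, active instance, Kubernetes API service proxy/no-fallback path and an explicit raw JSON button; the page must not directly access provider-gateway, D601/D518 business container ports, NodePort or the raw v3s/kubectl API.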
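The proxy-path rule above reduces to composing one URL path per request. The following is a minimal sketch of that composition; `buildServiceProxyPath` is a hypothetical helper, and the Service name and port used in the usage note are illustrative assumptions, not values taken from the manifest.

```typescript
// Hypothetical helper: composes the Kubernetes API service proxy path
// /api/v1/namespaces/<namespace>/services/<service>:<port>/proxy/<subPath>.
function buildServiceProxyPath(
  namespace: string,
  service: string,
  port: number,
  subPath: string,
): string {
  if (!(port > 0 && port < 65536) || Math.floor(port) !== port) {
    throw new Error("port must be an integer between 1 and 65535");
  }
  // Strip leading slashes so the result never contains "proxy//".
  const cleanSubPath = subPath.replace(/^\/+/, "");
  return (
    "/api/v1/namespaces/" + encodeURIComponent(namespace) +
    "/services/" + encodeURIComponent(service) + ":" + port +
    "/proxy/" + cleanSubPath
  );
}
```

For example, `buildServiceProxyPath("unidesk", "code-queue", 8080, "/health")` yields `/api/v1/namespaces/unidesk/services/code-queue:8080/proxy/health`, where port `8080` is a placeholder. Keeping the composition in one function makes it easier to audit that no other route to the business service exists.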
### Code Queue V3S-Managed
@@ -148,7 +149,7 @@ Baidu Netdisk 在 UniDesk 语境中按纯后端服务管理:不得暴露百度
- 主服务依赖映射:Code Queue 仍以主 PostgreSQL 为权威数据库,`DATABASE_URL` 必须指向主 server 受限端口映射;`OA_EVENT_FLOW_BASE_URL` 必须指向主 server OA Event Flow 受限端口映射;D601 active 实例的 `CODE_QUEUE_NOTIFY_CLAUDEQQ_BASE_URL` 直接使用本机 ClaudeQQ 映射 `http://host.docker.internal:3290`。这些端口映射只服务受控节点运行时,必须用防火墙或等价策略限制来源,不得成为浏览器或任意公网客户端入口。
- K8s 探针与启动维护:Kubernetes liveness/startup probe 必须使用轻量 `/live`,readiness 和用户服务健康使用 `/health``/health` 不得执行全量任务聚合、历史回填或长事务索引维护,历史任务总览应由 `/api/tasks/overview` 读取 PostgreSQL。启动时允许后台执行队列元数据 flush、通知 outbox 读取、任务表索引维护和 overview warmup,但这些维护不得阻塞 Bun server、readiness endpoint 或 frontend overview;通知表索引和大批量 OA backfill 不得作为默认启动副作用。
- MiniMax/OpenCode 并发:`minimax-m2.7` 通过 OpenCode JSON 事件端口运行;每个 Code Queue task 必须使用独立的 OpenCode XDG data/config/cache/state 目录,禁止多队列并发任务共享同一个 OpenCode SQLite/WAL 状态目录,否则并发 smoke 会触发 `PRAGMA journal_mode = WAL` 之类的数据库锁或初始化错误。用于验证 v3s/k8s 链路的 MiniMax smoke 以“至少 4 个任务、分布到 2 个 queue、至少 2 个终态成功”为链路验收线;剩余失败如果是 OpenCode 最终回复捕获、业务任务判定或模型限流,应作为 Code Queue 执行可靠性问题单独排查,不能反推 v3s 代理链路失败。
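- Default egress proxy: the D601 active Code Queue Pod must by default inject `HTTP_PROXY`, `HTTPS_PROXY` and `ALL_PROXY` into task subprocesses such as Codex/OpenCode, `git`, `curl` and `npm`; the only upstream today is the egress HTTP CONNECT port `http://host.docker.internal:18789` that the D601 provider-gateway publishes on the host loopback, and that port may only bind `127.0.0.1` and must not be exposed publicly. Here provider-gateway serves only as the egress proxy, not as the Code Queue business HTTP proxy; business access still goes only through the Kubernetes API service proxy. A native k3s/k8s egress gateway, service mesh or CNI egress policy is only a future network-layer enhancement, and the current delivery state does not introduce a second egress control plane. Remote development/execution containers must not rely on these environment variables alone; at the container network layer they must use a TUN default route and an OUTPUT firewall to force external traffic out only through the master TUN exit.
- Default egress proxy: the D601 active Code Queue Pod must by default inject `HTTP_PROXY`, `HTTPS_PROXY` and `ALL_PROXY` into task subprocesses such as Codex/OpenCode, `git`, `curl` and `npm`; the only upstream today is the D601 provider-gateway egress HTTP CONNECT proxy, exposed to Pods in the `unidesk` namespace through the Kubernetes `Service d601-provider-egress-proxy`. The EndpointSlice of that Service points at the D601 provider-gateway private Docker network endpoint; the proxy URL inside the Pod is `http://d601-provider-egress-proxy.unidesk.svc.cluster.local:18789`. The provider-gateway host port may only bind `127.0.0.1` and must not be exposed publicly; if the provider-gateway container IP changes, the EndpointSlice must be refreshed in step and verified with Code Queue `/health.egressProxy.connected=true`. Here provider-gateway serves only as the egress proxy, not as the Code Queue business HTTP proxy; business access still goes only through the Kubernetes API service proxy. A native k3s/k8s egress gateway, service mesh or CNI egress policy is only a future network-layer enhancement, and the current delivery state does not introduce a second egress control plane. Remote development/execution containers must not rely on these environment variables alone; at the container network layer they must use a TUN default route and an OUTPUT firewall to force external traffic out only through the master TUN exit.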
- 出网代理无 fallback 纪律:Code Queue 的运行时配置只允许一个默认出网路径,即 provider-gateway egress proxy;不得在代码中同时保留 Code Queue 自建 WebSocket proxy、临时 shell proxy、D601 本地直连公网、主 server direct HTTP proxy 等隐式分支。任何新增网络 fallback 都必须先进入本参考文档并配套 `/health` 可见状态,否则视为残留旧路径。
- 上线纪律:Code Queue 相关的前端或后端改进必须在同一任务内正式上线并验证公网 frontend 或 live API,不能只停留在源码、构建产物或“后续再上线”。修改 Code Queue 自身时不得等待当前 Code Queue task 结束、等待 queue idle 或等待 `0 running` 后才重启;应通过 v3s 控制面或 D601/D518 维护入口做 build-first 替换,并用 v3s adapter、Code Queue live API 或公网 frontend 证明任务和队列仍可读可继续。
- 更名与灾备恢复:旧版 Codex 队列服务名只允许作为兼容诊断和一次性迁移来源;`code-queue-backend` 容器自身 `/health` 正常但 `microservice health code-queue` 返回 provider 直连错误时,优先判定为 backend-core 仍加载旧 `MICROSERVICES_JSON` 或 adapter manifest 未刷新,必须刷新 `.state/docker-compose.env`、重建/替换 `backend-core``v3sctl-adapter`,随后用 `microservice list` 验证 `code-queue``runtime.orchestrator=v3sctl``backend.proxyMode=v3sctl-adapter-http` 和无业务容器直连摘要。
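The default egress proxy rule in the list above amounts to building one environment block for every task subprocess. The following is a minimal sketch under stated assumptions: `buildEgressEnv` is a hypothetical helper, and the internal host names passed as `noProxyHosts` in the usage note are placeholders rather than the real configuration.

```typescript
// Hypothetical helper: returns the proxy variables to merge into a task
// subprocess environment. All three proxy variables point at the same single
// upstream, so there is no second egress path to fall back to.
function buildEgressEnv(
  proxyUrl: string,
  noProxyHosts: string[],
): { [name: string]: string } {
  // The source describes an HTTP CONNECT proxy, so only http:// URLs are accepted here.
  if (!/^http:\/\/[^\s/]+:\d+\/?$/.test(proxyUrl)) {
    throw new Error("proxyUrl must look like http://host:port");
  }
  return {
    HTTP_PROXY: proxyUrl,
    HTTPS_PROXY: proxyUrl,
    ALL_PROXY: proxyUrl,
    NO_PROXY: noProxyHosts.join(","),
  };
}
```

A caller might use `buildEgressEnv("http://d601-provider-egress-proxy.unidesk.svc.cluster.local:18789", ["postgres.internal"])`, where `postgres.internal` stands in for whichever internal hosts the deployment actually lists. Rejecting a malformed URL up front avoids silently spawning a subprocess with no working proxy.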
@@ -281,7 +282,7 @@ ClaudeQQ 在 UniDesk 语境中按消息网关后端服务管理:不得直接
- `bun scripts/cli.ts microservice health claudeqq``bun scripts/cli.ts microservice proxy claudeqq /api/napcat/login``bun scripts/cli.ts microservice proxy claudeqq /api/events/recent``bun scripts/cli.ts microservice proxy claudeqq /api/events/subscriptions`:验证 ClaudeQQ 后端、NapCat 容器登录、事件订阅和私有代理链路;消息推送使用 `POST /api/push/text`,不得开放 D601 `3290/3000/3001/6099` 公网端口。
- `bun scripts/cli.ts microservice health todo-note``bun scripts/cli.ts microservice proxy todo-note /api/instances`:验证主 server Todo Note 后端、PostgreSQL 存储和本机 provider-gateway 私有代理链路。
- `bun scripts/cli.ts microservice health oa-event-flow``bun scripts/cli.ts microservice proxy oa-event-flow /api/diagnostics --raw``bun scripts/cli.ts microservice proxy oa-event-flow '/api/events?tags=service:code-queue&limit=20' --raw`:验证统一 OA 事件流、事件表、tag 查询和统计中心。
- `bun scripts/cli.ts microservice health v3sctl-adapter` and `bun scripts/cli.ts microservice proxy v3sctl-adapter /api/control-plane --raw`: verify the D601 v3s control-plane adapter, the manifest, D601/D518 instance status and the no-fallback runtime path.
- `bun scripts/cli.ts microservice health v3sctl-adapter` and `bun scripts/cli.ts microservice proxy v3sctl-adapter /api/control-plane --raw`: verify the D601 `unidesk-v8s` control-plane adapter, the manifest, D601 active/D518 standby instance status, `presentNodeIds=[D601,D518]`, `missingNodeIds=[]` and the no-fallback runtime path.
- `bun scripts/cli.ts microservice health code-queue``bun scripts/cli.ts microservice proxy code-queue /api/tasks/overview`:验证 Code Queue 经过 backend-core -> v3sctl-adapter -> v3s active service 的单一路径;输出不得出现 `serviceId=code-queue` 的 provider-gateway `microservice.http` 业务代理任务,写入、追加 prompt、打断和 readAt/未读状态都必须由 backend 写入 PostgreSQLfrontend 不得用本地存储伪造成功状态。
- `bun scripts/cli.ts microservice health filebrowser``bun scripts/cli.ts microservice health filebrowser-d601``bun scripts/cli.ts microservice proxy filebrowser / --max-body-bytes 2000`:验证 D518 主 File Browser 和 D601 备用 File Browser 私有代理链路;浏览器 WebUI 必须通过 `/api/microservices/filebrowser/proxy/``/api/microservices/filebrowser-d601/proxy/` 访问,不得直接开放 `4251` 公网端口。
- `bun scripts/cli.ts --main-server-ip 74.48.78.17 microservice health findjob`:在计算节点或其他非主 server 主机上通过公网 frontend remote CLI 进行同一验证,不需要主 server SSH key。
@@ -309,7 +310,7 @@ ClaudeQQ 在 UniDesk 语境中按消息网关后端服务管理:不得直接
- 运行 `bun scripts/cli.ts microservice health met-nonlinear``bun scripts/cli.ts microservice proxy met-nonlinear /api/queue``bun scripts/cli.ts microservice proxy met-nonlinear '/api/projects?root=projects&limit=20'``bun scripts/cli.ts microservice proxy met-nonlinear /api/images`,确认真实链路经过 backend-core、WebSocket、D601 provider-gateway 和 D601 本机 MET Nonlinear TS 后端。
- 运行 `bun scripts/cli.ts microservice health claudeqq``bun scripts/cli.ts microservice proxy claudeqq /api/napcat/login``bun scripts/cli.ts microservice proxy claudeqq /api/events/recent``bun scripts/cli.ts microservice proxy claudeqq /api/events/subscriptions`,确认真实链路经过 backend-core、WebSocket、D601 provider-gateway 和 D601 本机 ClaudeQQ 后端;在 D601 上 `curl http://127.0.0.1:3290/health` 应显示 `service=claudeqq``pureBackend=true``napcat.containerized=true`、NapCat HTTP/WS 状态、二维码状态和订阅计数。
- 运行 `bun scripts/cli.ts microservice health todo-note``bun scripts/cli.ts microservice proxy todo-note /api/instances`,确认真实链路经过 backend-core、WebSocket、main-server provider-gateway 和主 server `todo-note-backend` 后端;输出中必须包含五个迁移清单和 PostgreSQL 存储健康状态。
- 运行 `bun scripts/cli.ts microservice health v3sctl-adapter``bun scripts/cli.ts microservice proxy v3sctl-adapter /api/control-plane --raw``bun scripts/cli.ts microservice health code-queue``bun scripts/cli.ts microservice proxy code-queue /api/tasks/overview`,确认真实链路经过 backend-core -> v3sctl-adapter -> v3s active serviceCode Queue `/health` 必须仍返回业务后端自己的 `queue.storage.primary=postgres``queue.storage.postgresReady=true``queue.notifications.claudeqq.outbox.storage=postgres`,不得被 adapter 聚合健康 JSON 替代。还必须在 active Code Queue Pod 内验证主 PostgreSQL 端口映射、主 OA Event Flow 端口映射本机 ClaudeQQ `http://host.docker.internal:3290` 均可访问,并在 adapter 控制页确认 D601 active serving healthy、D518 expected/missing 可见且整体不退化为 hidden fallback。再通过公网 frontend 提交一个 `gpt-5.5` 小任务,确认队列串行推进、输出实时更新、结束后有 judge 判定,且运行中可追加 prompt 或打断。Code Queue 的重启恢复必须作为验收项:运行中任务存在时重启或重建 active 实例后,任务必须从 PostgreSQL 恢复到可继续执行状态,不能丢失 active task、`promptHistory`、后续 queued 任务、readAt/未读状态或已入 outbox 的 ClaudeQQ 通知。Code Queue 服务名、表名前缀或持久化目录发生迁移后,还必须运行 `bun scripts/cli.ts e2e run --only microservice:catalog-code-queue,microservice:code-queue-status,microservice:code-queue-health,microservice:code-queue-tasks`,证明 backend-core catalog、v3s adapter 私有代理、PostgreSQL 队列和任务列表都指向 `code-queue`。批量验收必须通过公网 frontend 设置 `入队份数=5` 或使用多段 prompt 分隔,一次性入队 5 条任务,并确认 5 条任务按顺序进入 running/judging/succeeded,而不是只运行第一条。
- 运行 `bun scripts/cli.ts microservice health v3sctl-adapter``bun scripts/cli.ts microservice proxy v3sctl-adapter /api/control-plane --raw``bun scripts/cli.ts microservice health code-queue``bun scripts/cli.ts microservice proxy code-queue /api/tasks/overview`,确认真实链路经过 backend-core -> v3sctl-adapter -> v3s active serviceCode Queue `/health` 必须仍返回业务后端自己的 `queue.storage.primary=postgres``queue.storage.postgresReady=true``queue.notifications.claudeqq.outbox.storage=postgres``egressProxy.connected=true`,不得被 adapter 聚合健康 JSON 替代。还必须在 active Code Queue Pod 内验证主 PostgreSQL 端口映射、主 OA Event Flow 端口映射本机 ClaudeQQ `http://host.docker.internal:3290``d601-provider-egress-proxy` 均可访问,并在 adapter 控制页确认 D601 active serving healthy、D518 standby pod ready、`missingNodeIds=[]` 且整体不退化为 hidden fallback。再通过公网 frontend 提交一个 `gpt-5.5` 小任务,确认队列串行推进、输出实时更新、结束后有 judge 判定,且运行中可追加 prompt 或打断。Code Queue 的重启恢复必须作为验收项:运行中任务存在时重启或重建 active 实例后,任务必须从 PostgreSQL 恢复到可继续执行状态,不能丢失 active task、`promptHistory`、后续 queued 任务、readAt/未读状态或已入 outbox 的 ClaudeQQ 通知。Code Queue 服务名、表名前缀或持久化目录发生迁移后,还必须运行 `bun scripts/cli.ts e2e run --only microservice:catalog-code-queue,microservice:code-queue-status,microservice:code-queue-health,microservice:code-queue-tasks`,证明 backend-core catalog、v3s adapter 私有代理、PostgreSQL 队列和任务列表都指向 `code-queue`。批量验收必须通过公网 frontend 设置 `入队份数=5` 或使用多段 prompt 分隔,一次性入队 5 条任务,并确认 5 条任务按顺序进入 running/judging/succeeded,而不是只运行第一条。
- Code Queue 内存防回归验收:凡是改动 Code Queue 的持久化、scheduler、输出/Trace、health、列表/详情查询、日志导出或容器运行参数,交付前必须在 D601 用 `kubectl -n unidesk get deploy,pod,svc,endpoints -o wide``kubectl -n unidesk describe deploy/code-queue` 或等价 Docker inspect 确认 memory/swap 硬上限符合预算,运行 `kubectl -n unidesk top pod` 或 Docker stats 确认常驻内存、`OOMKilled=false``RestartCount` 未异常增长,再运行 `bun scripts/cli.ts microservice health code-queue` 确认 `/health` 是轻量 readiness 且暴露 PostgreSQL/notification/outbox 状态。验收还必须覆盖有历史任务存在时的 `/api/tasks/overview`、单任务详情和 output/transcript 查询,证明热状态裁剪不会丢历史输出、也不会重新把全部历史 `task_json` 缓存在进程内;涉及 TypeScript/frontend 验证的任务应能在 D601 Code Queue memory/swap 预算中完成 `bun run --cwd src/components/frontend check` 这类短时高内存命令,而不是被 memory watchdog 反复 SIGTERM。
- Code Queue 延迟防回归验收:凡是改动 Code Queue 列表、overview、readAt、Trace/summary 懒加载、实时 output/SSE 事件发布、frontend 请求策略、backend-core 用户服务代理或 frontend Code Queue 请求路径,交付前必须在有历史任务数据且有 active output 流动的 live 环境验证 `GET /api/tasks/overview``POST /api/tasks/<id>/read`、选定 task 的 `trace-step` 和前端 `/app/code-queue/` 首屏均低于 1s 目标;可运行 `bun scripts/src/code-queue-perf.ts --json --target-ms 1000` 采集公网 frontend 下的首屏耗时、最慢 API 和 DOM 完成指标,并用 `bun scripts/cli.ts microservice proxy code-queue /api/tasks/overview --raw`、D601 Pod `/health``/api/tasks/overview` curl、性能面板 `/api/performance``/api/frontend-performance` 失败/慢操作记录、`kubectl -n unidesk top pod` 或 Docker stats 补充后端耗时、代理 502 和内存/CPU 证据。验收结论必须同时说明是否使用了短 TTL cache、cache 如何被 mutation 或 archive append 失效、数据库索引/聚合是否命中、输出热路径是否只读增量指标,以及分页加载是否跳过 selected/active/stats;不能只展示 cache 命中后的单次快照。
- 运行 `bun scripts/cli.ts microservice health filebrowser``bun scripts/cli.ts microservice health filebrowser-d601``bun scripts/cli.ts microservice proxy filebrowser / --max-body-bytes 2000`,确认 File Browser health 返回 `status=OK`WebUI HTML 包含 `File Browser`D518/D601 通过 provider-gateway 访问节点本机 `4251`;随后在公网 frontend 的 `用户服务 / File Browser` 中确认 D518 为默认目标、可导出截图、iframe 紧凑布局不再有巨大 `folder` 标记遮挡文件名,并可浏览 `/mnt/c`
+2 -2
@@ -92,11 +92,11 @@ provider ingress 是唯一允许公网暴露的 provider 连接接口,当前
## Egress Proxy
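provider-gateway can provide an egress HTTP CONNECT proxy so that node-side execution environments such as Code Queue and the Pipeline runner reach the internet through the existing provider WebSocket channel. The proxy listens on `0.0.0.0:18789` inside the container by default; node deployments must publish it only on the host loopback as `127.0.0.1:18789->18789/tcp` and must not open a public port. Ordinary Docker execution containers can reach the provider-gateway container name over the same private Docker network, and v3s/k8s Pods uniformly reach the loopback mapping through `host.docker.internal:18789`. The proxy only converts local CONNECT/absolute HTTP requests into `egress_tcp_open`, `egress_tcp_data` and `egress_tcp_close` messages; backend-core opens the real TCP connection on the main server side and relays the data back, which keeps Codex/Git/NPM from hanging when the local network of a compute node such as D601 is unreachable.
provider-gateway can provide an egress HTTP CONNECT proxy so that node-side execution environments such as Code Queue and the Pipeline runner reach the internet through the existing provider WebSocket channel. The proxy listens on `0.0.0.0:18789` inside the container by default; node deployments must publish it only on the host loopback as `127.0.0.1:18789->18789/tcp` and must not open a public port. Ordinary Docker execution containers can reach the provider-gateway container name over the same private Docker network, while v3s/k8s Pods must reach the same-node provider-gateway private endpoint through an explicit Kubernetes Service/EndpointSlice; for example, D601 Code Queue uses `d601-provider-egress-proxy.unidesk.svc.cluster.local:18789`. That egress Service must not be treated as a business HTTP entry. The proxy only converts local CONNECT/absolute HTTP requests into `egress_tcp_open`, `egress_tcp_data` and `egress_tcp_close` messages; backend-core opens the real TCP connection on the main server side and relays the data back, which keeps Codex/Git/NPM from hanging when the local network of a compute node such as D601 is unreachable.
This capability is a provider-gateway channel capability: the `unideskCapabilities` sent in register/heartbeat must include `network.egress-proxy`, and labels must report the `providerGatewayEgressProxy*` state. Do not register a separate pseudo provider for an individual user service to implement the egress proxy; otherwise fake providers appear in the node list, and the proxy, statistics and upgrade paths split into multiple channels. The proxy health check uses `GET /__unidesk/egress-proxy/health`, which returns `connected`, `providerId`, `activeTunnels` and the listening port; each business service's own `/health` should surface that result as troubleshooting evidence.
The long-term boundary of the egress proxy is "one unified provider channel, no second control plane". backend-core only accepts `egress_tcp_*` messages on an online provider socket and destroys all corresponding TCP relays when that socket closes; provider-gateway only maintains the mapping between the local HTTP proxy and WebSocket messages, stores no business state, and takes no part in task scheduling, statistics or any control plane other than node registration. Execution containers, user services and the Pipeline runner may not connect directly to the backend-core provider ingress, nor register on their own with a provider token; when they need egress they may only connect to the private proxy endpoint of the provider-gateway on the same node. The current v3s/k8s Code Queue uses `host.docker.internal:18789`, which is the node loopback egress entry, not a business HTTP proxy entry, and it cannot replace the Kubernetes API service proxy.
The long-term boundary of the egress proxy is "one unified provider channel, no second control plane". backend-core only accepts `egress_tcp_*` messages on an online provider socket and destroys all corresponding TCP relays when that socket closes; provider-gateway only maintains the mapping between the local HTTP proxy and WebSocket messages, stores no business state, and takes no part in task scheduling, statistics or any control plane other than node registration. Execution containers, user services and the Pipeline runner may not connect directly to the backend-core provider ingress, nor register on their own with a provider token; when they need egress they may only connect to the private proxy endpoint of the provider-gateway on the same node. The current v3s/k8s Code Queue connects to the D601 provider-gateway egress endpoint through the `d601-provider-egress-proxy` Kubernetes Service, which is the egress entry inside the Pod, not a business HTTP proxy entry, and it cannot replace the Kubernetes API service proxy.
Failure semantics must be explicit, and silent fallback is not allowed. When the WebSocket from provider-gateway to backend-core is not connected, the local proxy must return 503; execution containers must not automatically bypass it via D601 local direct internet access, an external public proxy or the main server's public HTTP port. `NO_PROXY` is only for clearly internal links such as PostgreSQL, OA Event Flow, ClaudeQQ, the frontend/backend-core internal proxy and provider-gateway health; external targets such as GitHub, model APIs or the npm registry must not be added to the bypass list. Acceptance must prove that provider-gateway labels, the business service `/health` and `curl -I https://...` inside the execution container all use the same proxy path.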
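The two failure rules above (503 while the provider WebSocket is down, and no external hosts in `NO_PROXY`) can be sketched as two small pure functions. Both helpers are hypothetical illustrations, not the provider-gateway implementation, and the host names in the usage note are placeholders.

```typescript
// Hypothetical helper: the status the local proxy should answer with for a new
// CONNECT request. There is deliberately no branch that tries another route.
function connectStatusFor(webSocketConnected: boolean): number {
  return webSocketConnected ? 200 : 503;
}

// Hypothetical helper: lists NO_PROXY entries that are not on the explicit
// allow-list of internal hosts, so an audit can fail on any external bypass.
function findDisallowedNoProxyEntries(
  noProxy: string,
  allowedInternalHosts: string[],
): string[] {
  return noProxy
    .split(",")
    .map((entry) => entry.trim())
    .filter((entry) => entry.length > 0 && allowedInternalHosts.indexOf(entry) === -1);
}
```

For instance, `findDisallowedNoProxyEntries("postgres.internal, github.com", ["postgres.internal"])` returns `["github.com"]`, which an acceptance check could treat as a violation. An allow-list is used instead of a deny-list because the rule names the permitted internal links and forbids everything else.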
+54 -2
@@ -61,6 +61,8 @@ const SERVICE_CHECK_NAMES = [
  "microservice:catalog-todo-note",
  "microservice:catalog-oa-event-flow",
  "microservice:catalog-code-queue",
  "microservice:v3sctl-adapter-status",
  "microservice:v3sctl-control-plane",
  "microservice:catalog-filebrowser",
  "microservice:filebrowser-health",
  "microservice:filebrowser-webui",
@@ -1026,6 +1028,8 @@ async function serviceChecks(config: UniDeskConfig, urls: PublicUrls, checks: E2
const oaEventFlowEvents = dockerCoreJson("/api/microservices/oa-event-flow/proxy/api/events?limit=10");
const oaEventFlowPipelineEvents = dockerCoreJson("/api/microservices/oa-event-flow/proxy/api/events?tags=service:pipeline&limit=10");
const oaEventFlowStats = dockerCoreJson("/api/microservices/oa-event-flow/proxy/api/stats/trace?limit=10");
const v3sctlStatus = dockerCoreJson("/api/microservices/v3sctl-adapter/status");
const v3sctlControlPlane = dockerCoreJson("/api/microservices/v3sctl-adapter/proxy/api/control-plane");
const codeQueueStatus = dockerCoreJson("/api/microservices/code-queue/status");
const codeQueueHealth = dockerCoreJson("/api/microservices/code-queue/health");
const codeQueueTasks = dockerCoreJson("/api/microservices/code-queue/proxy/api/tasks/overview?limit=5&transcriptLimit=1&compact=1&afterSeq=0&preferId=");
@@ -1100,8 +1104,27 @@ async function serviceChecks(config: UniDeskConfig, urls: PublicUrls, checks: E2
const oaEventFlowEventsBody = (oaEventFlowEvents as { body?: { ok?: boolean; events?: unknown[]; returned?: number } }).body;
const oaEventFlowPipelineEventsBody = (oaEventFlowPipelineEvents as { body?: { ok?: boolean; events?: Array<{ tags?: unknown[]; sourceId?: string; type?: string; payload?: { runId?: string; pipelineId?: string } }>; returned?: number } }).body;
const oaEventFlowStatsBody = (oaEventFlowStats as { body?: { ok?: boolean; stats?: unknown[]; returned?: number } }).body;
const codeQueueHealthBody = (codeQueueHealth as { body?: { ok?: boolean; queue?: { defaultModel?: string; judgeConfigured?: boolean; modelReasoningEfforts?: Record<string, string> } } }).body;
const codeQueueHealthBody = (codeQueueHealth as { body?: { ok?: boolean; egressProxy?: { connected?: boolean }; queue?: { defaultModel?: string; judgeConfigured?: boolean; modelReasoningEfforts?: Record<string, string> } } }).body;
const codeQueueTasksBody = (codeQueueTasks as { body?: { ok?: boolean; queue?: { defaultModel?: string; modelReasoningEfforts?: Record<string, string> }; tasks?: unknown[] } }).body;
const v3sctlControlPlaneBody = (v3sctlControlPlane as { body?: {
ok?: boolean;
clusterId?: string;
noFallback?: boolean;
managedServicesHealthy?: boolean;
kubeApiProxy?: { mode?: string };
services?: Array<{
id?: string;
status?: string;
presentNodeIds?: string[];
missingNodeIds?: string[];
topologyComplete?: boolean;
servingHealthy?: boolean;
active?: { id?: string; healthy?: boolean };
instances?: Array<{ id?: string; healthy?: boolean; proxyMode?: string }>;
}>;
} }).body;
const v3sctlCodeQueueService = v3sctlControlPlaneBody?.services?.find((service) => service.id === "code-queue");
const v3sctlD518Instance = v3sctlCodeQueueService?.instances?.find((instance) => instance.id === "D518");
const filebrowserHealthBody = (filebrowserHealth as { body?: { status?: string } }).body;
const filebrowserD601HealthBody = (filebrowserD601Health as { body?: { status?: string } }).body;
const filebrowserWebuiText = String((filebrowserWebui as { body?: { text?: string } }).body?.text || "");
@@ -1141,6 +1164,35 @@ async function serviceChecks(config: UniDeskConfig, urls: PublicUrls, checks: E2
&& codeQueue.runtime?.orchestrator === "v3sctl"
&& codeQueue.runtime?.container === null,
{ microservices });
addSelectedCheck(checks, options, "microservice:v3sctl-adapter-status",
(v3sctlStatus as { ok?: boolean; body?: { microservice?: { id?: string; providerId?: string } } }).ok === true
&& (v3sctlStatus as { body?: { microservice?: { id?: string; providerId?: string } } }).body?.microservice?.id === "v3sctl-adapter"
&& (v3sctlStatus as { body?: { microservice?: { id?: string; providerId?: string } } }).body?.microservice?.providerId === "D601",
v3sctlStatus);
addSelectedCheck(checks, options, "microservice:v3sctl-control-plane",
(v3sctlControlPlane as { ok?: boolean }).ok === true
&& v3sctlControlPlaneBody?.ok === true
&& v3sctlControlPlaneBody.clusterId === "unidesk-v8s"
&& v3sctlControlPlaneBody.noFallback === true
&& v3sctlControlPlaneBody.managedServicesHealthy === true
&& v3sctlControlPlaneBody.kubeApiProxy?.mode === "kubernetes-api-service-proxy"
&& v3sctlCodeQueueService?.status === "healthy"
&& v3sctlCodeQueueService?.topologyComplete === true
&& v3sctlCodeQueueService?.servingHealthy === true
&& v3sctlCodeQueueService?.active?.id === "D601"
&& v3sctlCodeQueueService?.active?.healthy === true
&& (v3sctlCodeQueueService?.presentNodeIds ?? []).includes("D601")
&& (v3sctlCodeQueueService?.presentNodeIds ?? []).includes("D518")
&& (v3sctlCodeQueueService?.missingNodeIds ?? []).length === 0
&& v3sctlD518Instance?.healthy === true
&& v3sctlD518Instance?.proxyMode === "kubernetes-api-pod-readiness",
{
ok: (v3sctlControlPlane as { ok?: boolean }).ok,
clusterId: v3sctlControlPlaneBody?.clusterId,
noFallback: v3sctlControlPlaneBody?.noFallback,
kubeApiProxy: v3sctlControlPlaneBody?.kubeApiProxy,
service: v3sctlCodeQueueService,
});
addSelectedCheck(checks, options, "microservice:catalog-filebrowser", (microservices as { ok?: boolean }).ok === true
&& filebrowser?.providerId === "D518"
&& filebrowser.backend?.public === false
@@ -1209,7 +1261,7 @@ async function serviceChecks(config: UniDeskConfig, urls: PublicUrls, checks: E2
});
addSelectedCheck(checks, options, "microservice:oa-event-flow-stats", (oaEventFlowStats as { ok?: boolean }).ok === true && oaEventFlowStatsBody?.ok === true && Array.isArray(oaEventFlowStatsBody.stats), oaEventFlowStats);
addSelectedCheck(checks, options, "microservice:code-queue-status", (codeQueueStatus as { ok?: boolean }).ok === true && (codeQueueStatus as { body?: { microservice?: { id?: string; providerId?: string } } }).body?.microservice?.providerId === "D601", codeQueueStatus);
addSelectedCheck(checks, options, "microservice:code-queue-health", (codeQueueHealth as { ok?: boolean }).ok === true && codeQueueHealthBody?.ok === true && codeQueueHealthBody.queue?.defaultModel === "gpt-5.5" && codeQueueHealthBody.queue?.modelReasoningEfforts?.["gpt-5.5"] === "xhigh", codeQueueHealth);
addSelectedCheck(checks, options, "microservice:code-queue-health", (codeQueueHealth as { ok?: boolean }).ok === true && codeQueueHealthBody?.ok === true && codeQueueHealthBody.egressProxy?.connected === true && codeQueueHealthBody.queue?.defaultModel === "gpt-5.5" && codeQueueHealthBody.queue?.modelReasoningEfforts?.["gpt-5.5"] === "xhigh", codeQueueHealth);
addSelectedCheck(checks, options, "microservice:code-queue-tasks", (codeQueueTasks as { ok?: boolean }).ok === true && codeQueueTasksBody?.ok === true && Array.isArray(codeQueueTasksBody.tasks) && codeQueueTasksBody.queue?.defaultModel === "gpt-5.5" && codeQueueTasksBody.queue?.modelReasoningEfforts?.["gpt-5.5"] === "xhigh", codeQueueTasks);
const upgradeDispatch = dockerCoreJson("/api/dispatch", {
method: "POST",
@@ -17,18 +17,18 @@ services:
HOST: "0.0.0.0"
PORT: "4266"
LOG_FILE: "/var/log/unidesk/v3sctl-adapter.jsonl"
V3SCTL_CLUSTER_ID: "${V3SCTL_CLUSTER_ID:-D601}"
V3SCTL_CLUSTER_ID: "${V3SCTL_CLUSTER_ID:-unidesk-v8s}"
V3SCTL_NODE_ID: "${V3SCTL_NODE_ID:-D601}"
V3SCTL_KUBECTL_ENABLED: "${V3SCTL_KUBECTL_ENABLED:-false}"
V3SCTL_KUBE_API_PROXY_ENABLED: "${V3SCTL_KUBE_API_PROXY_ENABLED:-true}"
V3SCTL_KUBECONFIG_PATH: "/var/lib/unidesk/v3s/kubeconfig"
V3SCTL_KUBECONFIG_PATH: "/var/lib/unidesk/v8s/kubeconfig"
V3SCTL_KUBE_API_CONNECT_HOST: "${V3SCTL_KUBE_API_CONNECT_HOST:-host.docker.internal}"
V3SCTL_MANIFEST_PATHS: "${V3SCTL_MANIFEST_PATHS:-v3s/code-queue.v3s.json}"
V3SCTL_SERVICES_JSON: "${V3SCTL_SERVICES_JSON:-[]}"
UNIDESK_LOG_RETENTION_BYTES: "${UNIDESK_LOG_RETENTION_BYTES:-512MiB}"
volumes:
- ${V3SCTL_ADAPTER_LOG_DIR:-../../../../.state/v3sctl-adapter/logs}:/var/log/unidesk
- ${V3SCTL_KUBECONFIG_HOST_PATH:-../../../../.state/v3s/kubeconfig}:/var/lib/unidesk/v3s/kubeconfig:ro
- ${V3SCTL_KUBECONFIG_HOST_PATH:-../../../../.state/v8s/kubeconfig}:/var/lib/unidesk/v8s/kubeconfig:ro
extra_hosts:
- "host.docker.internal:host-gateway"
networks:
@@ -8,6 +8,7 @@ type JsonValue = string | number | boolean | null | JsonValue[] | { [key: string
type JsonRecord = Record<string, JsonValue>;
type InstanceRole = "primary" | "standby" | "worker";
type EndpointHealthMode = "service-proxy" | "pod-ready";
interface ManagedEndpoint {
id: string;
@@ -15,6 +16,7 @@ interface ManagedEndpoint {
role: InstanceRole;
baseUrl: string;
healthPath: string;
healthMode: EndpointHealthMode;
}
interface ManagedService {
@@ -143,6 +145,11 @@ function normalizeRole(value: string): InstanceRole {
return "worker";
}
function normalizeHealthMode(value: string): EndpointHealthMode {
if (value === "service-proxy" || value === "pod-ready") return value;
return "service-proxy";
}
function parseEndpoint(value: unknown, index: number, ownerPath = "endpoint"): ManagedEndpoint {
const path = `${ownerPath}[${index}]`;
const item = asRecord(value, path);
@@ -154,6 +161,7 @@ function parseEndpoint(value: unknown, index: number, ownerPath = "endpoint"): M
role: normalizeRole(optionalStringField(item, "role", id === "D601" ? "primary" : "standby")),
baseUrl: stringField(item, "baseUrl", path).replace(/\/+$/u, ""),
healthPath: optionalStringField(item, "healthPath", "/health"),
healthMode: normalizeHealthMode(optionalStringField(item, "healthMode", "service-proxy")),
};
}
@@ -244,12 +252,12 @@ function readConfig(): RuntimeConfig {
port: envNumber("PORT", 4266),
logFile: envString("LOG_FILE", "/var/log/unidesk/v3sctl-adapter.jsonl"),
manifestPaths: paths,
clusterId: envString("V3SCTL_CLUSTER_ID", "D601"),
clusterId: envString("V3SCTL_CLUSTER_ID", "unidesk-v8s"),
nodeId: envString("V3SCTL_NODE_ID", "D601"),
kubectlEnabled: envBool("V3SCTL_KUBECTL_ENABLED", false),
kubectlContext: envString("V3SCTL_KUBECTL_CONTEXT", ""),
kubeApiProxyEnabled: envBool("V3SCTL_KUBE_API_PROXY_ENABLED", true),
kubeconfigPath: envString("V3SCTL_KUBECONFIG_PATH", "/var/lib/unidesk/v3s/kubeconfig"),
kubeconfigPath: envString("V3SCTL_KUBECONFIG_PATH", "/var/lib/unidesk/v8s/kubeconfig"),
kubeApiConnectHost: envString("V3SCTL_KUBE_API_CONNECT_HOST", "host.docker.internal"),
requestTimeoutMs: Math.max(1000, Math.min(120_000, envNumber("V3SCTL_REQUEST_TIMEOUT_MS", 30_000))),
healthTimeoutMs: Math.max(500, Math.min(30_000, envNumber("V3SCTL_HEALTH_TIMEOUT_MS", 2500))),
@@ -385,6 +393,23 @@ function serviceProxyApiPath(service: ManagedService, targetPath: string): strin
return `/api/v1/namespaces/${encodeURIComponent(service.namespace)}/services/${encodeURIComponent(`${serviceName}:${servicePort}`)}/proxy${safeTargetPath}`;
}
function endpointProxyApiPath(service: ManagedService, endpoint: ManagedEndpoint, targetPath: string): string {
const { namespace, serviceRef } = kubernetesEndpointServiceRef(service, endpoint);
const safeTargetPath = targetPath.startsWith("/") ? targetPath : `/${targetPath}`;
return `/api/v1/namespaces/${encodeURIComponent(namespace)}/services/${encodeURIComponent(serviceRef)}/proxy${safeTargetPath}`;
}
function kubernetesEndpointServiceRef(service: ManagedService, endpoint: ManagedEndpoint): { namespace: string; serviceRef: string } {
const base = new URL(endpoint.baseUrl);
if (base.protocol !== "kubernetes:") throw new Error(`endpoint ${endpoint.id} must use kubernetes:// baseUrl`);
const namespace = base.hostname || service.namespace;
const parts = base.pathname.split("/").filter(Boolean);
if (parts.length !== 2 || parts[0] !== "services" || parts[1].length === 0) {
throw new Error(`endpoint ${endpoint.id} baseUrl must be kubernetes://<namespace>/services/<service>:<port>`);
}
return { namespace, serviceRef: parts[1] };
}
function kubeProxyCurlArgs(client: KubeApiClient, method: string, url: URL, headers: Headers, hasBody: boolean, timeoutMs: number): string[] {
const args = [
"-sS",
@@ -431,11 +456,32 @@ async function kubeApiServiceProxyResponse(
targetPath: string,
query: string,
timeoutMs: number,
): Promise<Response> {
return kubeApiProxyResponse(service, req, serviceProxyApiPath(service, targetPath), query, timeoutMs);
}
async function kubeApiEndpointProxyResponse(
service: ManagedService,
endpoint: ManagedEndpoint,
req: Request,
targetPath: string,
query: string,
timeoutMs: number,
): Promise<Response> {
return kubeApiProxyResponse(service, req, endpointProxyApiPath(service, endpoint, targetPath), query, timeoutMs);
}
async function kubeApiProxyResponse(
service: ManagedService,
req: Request,
apiPath: string,
query: string,
timeoutMs: number,
): Promise<Response> {
if (kubeClient === null) {
return jsonResponse({ ok: false, error: "kubernetes api proxy is not configured", serviceId: service.id, kubeconfigPath: config.kubeconfigPath, noFallback: true }, 502);
}
const upstreamUrl = new URL(serviceProxyApiPath(service, targetPath), kubeClient.serverUrl);
const upstreamUrl = new URL(apiPath, kubeClient.serverUrl);
upstreamUrl.search = query;
const headers = forwardHeaders(req);
const bodyText = req.method === "GET" || req.method === "HEAD" ? "" : await req.text();
@@ -455,7 +501,7 @@ async function kubeApiServiceProxyResponse(
proc.exited,
]);
if (exitCode !== 0) {
log("error", "kube_api_proxy_failed", { serviceId: service.id, targetPath, exitCode, stderr: stderr.slice(0, 2000), noFallback: true });
log("error", "kube_api_proxy_failed", { serviceId: service.id, apiPath, exitCode, stderr: stderr.slice(0, 2000), noFallback: true });
return jsonResponse({ ok: false, error: "kubernetes api service proxy failed", serviceId: service.id, detail: stderr.slice(0, 4000), noFallback: true }, 502);
}
const parsed = parseCurlHeaderBody(Buffer.from(stdout));
@@ -522,13 +568,27 @@ async function probeEndpoint(endpoint: ManagedEndpoint): Promise<JsonRecord> {
async function probeKubernetesServiceActive(service: ManagedService): Promise<JsonRecord> {
const endpoint = activeEndpoint(service);
return probeKubernetesEndpoint(service, endpoint, true);
}
async function probeKubernetesEndpoint(service: ManagedService, endpoint: ManagedEndpoint, active = false): Promise<JsonRecord> {
if (!active && endpoint.healthMode === "pod-ready") return await probeKubernetesPodReady(service, endpoint);
const checkedAt = new Date().toISOString();
const response = await kubeApiServiceProxyResponse(
const response = active
? await kubeApiServiceProxyResponse(
service,
new Request("http://v3sctl-adapter.local/health", { method: "GET", headers: { accept: "application/json" } }),
endpoint.healthPath,
"",
config.healthTimeoutMs,
)
: await kubeApiEndpointProxyResponse(
service,
endpoint,
new Request("http://v3sctl-adapter.local/health", { method: "GET", headers: { accept: "application/json" } }),
endpoint.healthPath,
"",
config.healthTimeoutMs,
);
const contentType = response.headers.get("content-type") ?? "application/octet-stream";
const bodyText = await response.text();
@@ -544,6 +604,7 @@ async function probeKubernetesServiceActive(service: ManagedService): Promise<Js
role: endpoint.role,
baseUrl: endpoint.baseUrl,
healthPath: endpoint.healthPath,
healthMode: endpoint.healthMode,
proxyMode: "kubernetes-api-service-proxy",
route: service.route,
healthy: response.ok,
@@ -555,9 +616,79 @@ async function probeKubernetesServiceActive(service: ManagedService): Promise<Js
};
}
function jsonAtPath(value: unknown, path: string): unknown {
return path.split(".").reduce((current, key) => {
if (typeof current !== "object" || current === null) return undefined;
return (current as Record<string, unknown>)[key];
}, value);
}
function podReady(item: unknown): boolean {
const conditions = jsonAtPath(item, "status.conditions");
return Array.isArray(conditions) && conditions.some((condition) => {
const record = typeof condition === "object" && condition !== null ? condition as Record<string, unknown> : {};
return record.type === "Ready" && record.status === "True";
});
}
function podSummary(item: unknown): JsonRecord {
const metadata = typeof jsonAtPath(item, "metadata") === "object" && jsonAtPath(item, "metadata") !== null ? jsonAtPath(item, "metadata") as Record<string, unknown> : {};
return {
name: typeof metadata.name === "string" ? metadata.name : "",
nodeName: typeof jsonAtPath(item, "spec.nodeName") === "string" ? jsonAtPath(item, "spec.nodeName") as string : "",
phase: typeof jsonAtPath(item, "status.phase") === "string" ? jsonAtPath(item, "status.phase") as string : "",
podIP: typeof jsonAtPath(item, "status.podIP") === "string" ? jsonAtPath(item, "status.podIP") as string : "",
ready: podReady(item),
};
}
async function probeKubernetesPodReady(service: ManagedService, endpoint: ManagedEndpoint): Promise<JsonRecord> {
const checkedAt = new Date().toISOString();
const { namespace } = kubernetesEndpointServiceRef(service, endpoint);
const labelSelector = new URLSearchParams({
labelSelector: `app.kubernetes.io/name=${service.id},unidesk.ai/instance-id=${endpoint.id}`,
}).toString();
const response = await kubeApiProxyResponse(
service,
new Request("http://v3sctl-adapter.local/api/pods", { method: "GET", headers: { accept: "application/json" } }),
`/api/v1/namespaces/${encodeURIComponent(namespace)}/pods`,
`?${labelSelector}`,
config.healthTimeoutMs,
);
const contentType = response.headers.get("content-type") ?? "application/octet-stream";
const bodyText = await response.text();
let body: JsonValue = bodyText.slice(0, 2000);
let pods: JsonRecord[] = [];
try {
const parsed = JSON.parse(bodyText) as JsonRecord;
const items = Array.isArray(parsed.items) ? parsed.items : [];
pods = items.map(podSummary);
body = { itemCount: items.length, pods };
} catch {
// Keep the raw text preview below.
}
const healthy = response.ok && pods.some((pod) => pod.ready === true);
return {
id: endpoint.id,
nodeId: endpoint.nodeId,
role: endpoint.role,
baseUrl: endpoint.baseUrl,
healthPath: endpoint.healthPath,
healthMode: endpoint.healthMode,
proxyMode: "kubernetes-api-pod-readiness",
route: service.route,
healthy,
status: healthy ? "healthy" : "unhealthy",
upstreamStatus: response.status,
contentType,
checkedAt,
body,
};
}
async function serviceStatus(service: ManagedService): Promise<JsonRecord> {
const instances = isKubernetesServiceRoute(service)
? [await probeKubernetesServiceActive(service)]
? await Promise.all(service.endpoints.map((endpoint) => endpoint.id === service.activeInstanceId ? probeKubernetesServiceActive(service) : probeKubernetesEndpoint(service, endpoint)))
: [{
id: service.activeInstanceId,
nodeId: activeEndpoint(service).nodeId,
@@ -576,7 +707,7 @@ async function serviceStatus(service: ManagedService): Promise<JsonRecord> {
const activeHealthy = active?.healthy === true;
const allInstancesHealthy = instances.every((item) => item.healthy === true);
const expectedNodeIds = service.expectedNodeIds;
const presentNodeIds = Array.from(new Set(instances.map((item) => String(item.nodeId))));
const presentNodeIds = Array.from(new Set(instances.filter((item) => item.healthy === true).map((item) => String(item.nodeId))));
const missingNodeIds = expectedNodeIds.filter((nodeId) => !presentNodeIds.includes(nodeId));
const topologyComplete = missingNodeIds.length === 0;
const requiredTopologyHealthy = !service.requireAllInstancesHealthy || (topologyComplete && allInstancesHealthy);
@@ -4,7 +4,43 @@ metadata:
name: unidesk
labels:
app.kubernetes.io/part-of: unidesk
unidesk.ai/v3s-cluster: unidesk-v3s
unidesk.ai/v3s-cluster: unidesk-v8s
---
apiVersion: v1
kind: Service
metadata:
name: d601-provider-egress-proxy
namespace: unidesk
labels:
app.kubernetes.io/name: provider-egress-proxy
app.kubernetes.io/part-of: unidesk
unidesk.ai/provider-id: D601
spec:
type: ClusterIP
ports:
- name: http
port: 18789
targetPort: 18789
protocol: TCP
---
apiVersion: discovery.k8s.io/v1
kind: EndpointSlice
metadata:
name: d601-provider-egress-proxy
namespace: unidesk
labels:
kubernetes.io/service-name: d601-provider-egress-proxy
app.kubernetes.io/name: provider-egress-proxy
app.kubernetes.io/part-of: unidesk
unidesk.ai/provider-id: D601
addressType: IPv4
ports:
- name: http
protocol: TCP
port: 18789
endpoints:
- addresses:
- "172.25.0.3"
---
apiVersion: apps/v1
kind: Deployment
@@ -31,6 +67,8 @@ spec:
unidesk.ai/instance-id: D601
unidesk.ai/node-id: D601
spec:
nodeSelector:
unidesk.ai/node-id: D601
terminationGracePeriodSeconds: 30
containers:
- name: code-queue
@@ -99,25 +137,25 @@ spec:
- name: CODE_QUEUE_EGRESS_PROXY_ENABLED
value: "true"
- name: CODE_QUEUE_EGRESS_PROXY_URL
value: "http://host.docker.internal:18789"
value: "http://d601-provider-egress-proxy.unidesk.svc.cluster.local:18789"
- name: CODE_QUEUE_EGRESS_PROXY_NO_PROXY
value: "localhost,127.0.0.1,::1,host.docker.internal,unidesk-provider-gateway-D601,74.48.78.17,backend-core,oa-event-flow,database"
value: "localhost,127.0.0.1,::1,host.docker.internal,d601-provider-egress-proxy,d601-provider-egress-proxy.unidesk,d601-provider-egress-proxy.unidesk.svc,d601-provider-egress-proxy.unidesk.svc.cluster.local,172.25.0.3,unidesk-provider-gateway-D601,74.48.78.17,backend-core,oa-event-flow,database"
- name: HTTP_PROXY
value: "http://host.docker.internal:18789"
value: "http://d601-provider-egress-proxy.unidesk.svc.cluster.local:18789"
- name: HTTPS_PROXY
value: "http://host.docker.internal:18789"
value: "http://d601-provider-egress-proxy.unidesk.svc.cluster.local:18789"
- name: ALL_PROXY
value: "http://host.docker.internal:18789"
value: "http://d601-provider-egress-proxy.unidesk.svc.cluster.local:18789"
- name: http_proxy
value: "http://host.docker.internal:18789"
value: "http://d601-provider-egress-proxy.unidesk.svc.cluster.local:18789"
- name: https_proxy
value: "http://host.docker.internal:18789"
value: "http://d601-provider-egress-proxy.unidesk.svc.cluster.local:18789"
- name: all_proxy
value: "http://host.docker.internal:18789"
value: "http://d601-provider-egress-proxy.unidesk.svc.cluster.local:18789"
- name: NO_PROXY
value: "localhost,127.0.0.1,::1,host.docker.internal,unidesk-provider-gateway-D601,74.48.78.17,backend-core,oa-event-flow,database"
value: "localhost,127.0.0.1,::1,host.docker.internal,d601-provider-egress-proxy,d601-provider-egress-proxy.unidesk,d601-provider-egress-proxy.unidesk.svc,d601-provider-egress-proxy.unidesk.svc.cluster.local,172.25.0.3,unidesk-provider-gateway-D601,74.48.78.17,backend-core,oa-event-flow,database"
- name: no_proxy
value: "localhost,127.0.0.1,::1,host.docker.internal,unidesk-provider-gateway-D601,74.48.78.17,backend-core,oa-event-flow,database"
value: "localhost,127.0.0.1,::1,host.docker.internal,d601-provider-egress-proxy,d601-provider-egress-proxy.unidesk,d601-provider-egress-proxy.unidesk.svc,d601-provider-egress-proxy.unidesk.svc.cluster.local,172.25.0.3,unidesk-provider-gateway-D601,74.48.78.17,backend-core,oa-event-flow,database"
- name: OA_EVENT_FLOW_BASE_URL
value: "http://74.48.78.17:4255"
- name: CODE_QUEUE_NOTIFY_CLAUDEQQ_ENABLED
@@ -226,3 +264,228 @@ spec:
- name: http
port: 4222
targetPort: http
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: code-queue-d518
namespace: unidesk
labels:
app.kubernetes.io/name: code-queue
app.kubernetes.io/part-of: unidesk
unidesk.ai/deployment-mode: v3sctl-managed
unidesk.ai/instance-id: D518
spec:
replicas: 1
selector:
matchLabels:
app.kubernetes.io/name: code-queue
unidesk.ai/instance-id: D518
template:
metadata:
labels:
app.kubernetes.io/name: code-queue
app.kubernetes.io/part-of: unidesk
unidesk.ai/deployment-mode: v3sctl-managed
unidesk.ai/instance-id: D518
unidesk.ai/node-id: D518
spec:
nodeSelector:
unidesk.ai/node-id: D518
terminationGracePeriodSeconds: 30
containers:
- name: code-queue
image: unidesk-code-queue:d601
imagePullPolicy: IfNotPresent
ports:
- name: http
containerPort: 4222
envFrom:
- secretRef:
name: code-queue-env
optional: true
env:
- name: HOST
value: "0.0.0.0"
- name: PORT
value: "4222"
- name: CODE_QUEUE_INSTANCE_ID
value: "D518"
- name: CODE_QUEUE_SCHEDULER_ENABLED
value: "false"
- name: CODE_QUEUE_STARTUP_OA_BACKFILL_ENABLED
value: "false"
- name: CODE_QUEUE_DATA_DIR
value: "/var/lib/unidesk/code-queue"
- name: CODE_QUEUE_WORKDIR
value: "/workspace"
- name: CODE_QUEUE_CODEX_HOME
value: "/var/lib/unidesk/code-queue/codex-home"
- name: CODE_QUEUE_OPENCODE_XDG_DIR
value: "/var/lib/unidesk/code-queue/opencode-xdg"
- name: CODE_QUEUE_SOURCE_CODEX_CONFIG
value: "/root/.codex/config.toml"
- name: CODE_QUEUE_DEFAULT_MODEL
value: "gpt-5.5"
- name: CODE_QUEUE_MODELS
value: "gpt-5.5,gpt-5.4-mini,gpt-5.4,minimax-m2.7"
- name: CODE_QUEUE_MODEL_REASONING_EFFORTS
value: "gpt-5.5=xhigh"
- name: CODE_QUEUE_SANDBOX
value: "danger-full-access"
- name: CODE_QUEUE_APPROVAL_POLICY
value: "never"
- name: CODE_QUEUE_MAX_ACTIVE_QUEUES
value: "0"
- name: CODE_QUEUE_DATABASE_POOL_MAX
value: "2"
- name: NODE_OPTIONS
value: "--max-old-space-size=1024"
- name: CODE_QUEUE_IN_MEMORY_OUTPUT_RECORDS
value: "10"
- name: CODE_QUEUE_IN_MEMORY_EVENT_RECORDS
value: "10"
- name: CODE_QUEUE_MAIN_PROVIDER_ID
value: "D518"
- name: CODE_QUEUE_REMOTE_WORKDIR
value: "/home/ubuntu"
- name: CODE_QUEUE_EXECUTION_PROVIDER_IDS
value: "D518"
- name: CODE_QUEUE_DEV_CONTAINER_MASTER_HOST
value: "74.48.78.17"
- name: CODE_QUEUE_DEV_CONTAINER_DEFAULT_PROVIDER_ID
value: "D518"
- name: CODE_QUEUE_DEV_CONTAINER_WORKDIR
value: "/home/ubuntu"
- name: CODE_QUEUE_EGRESS_PROXY_ENABLED
value: "false"
- name: CODE_QUEUE_EGRESS_PROXY_URL
value: ""
- name: CODE_QUEUE_EGRESS_PROXY_NO_PROXY
value: "localhost,127.0.0.1,::1,host.docker.internal,74.48.78.17,backend-core,oa-event-flow,database"
- name: HTTP_PROXY
value: ""
- name: HTTPS_PROXY
value: ""
- name: ALL_PROXY
value: ""
- name: http_proxy
value: ""
- name: https_proxy
value: ""
- name: all_proxy
value: ""
- name: NO_PROXY
value: "localhost,127.0.0.1,::1,host.docker.internal,74.48.78.17,backend-core,oa-event-flow,database"
- name: no_proxy
value: "localhost,127.0.0.1,::1,host.docker.internal,74.48.78.17,backend-core,oa-event-flow,database"
- name: OA_EVENT_FLOW_BASE_URL
value: "http://74.48.78.17:4255"
- name: CODE_QUEUE_NOTIFY_CLAUDEQQ_ENABLED
value: "false"
- name: CODE_QUEUE_NOTIFY_CLAUDEQQ_BASE_URL
value: ""
- name: CODE_QUEUE_NOTIFY_CLAUDEQQ_TARGET_TYPE
value: "private"
- name: CODE_QUEUE_NOTIFY_CLAUDEQQ_USER_ID
value: "645275593"
- name: CODE_QUEUE_NOTIFY_CLAUDEQQ_MAX_RESPONSE_CHARS
value: "12000"
- name: CODE_QUEUE_NOTIFY_CLAUDEQQ_TIMEOUT_MS
value: "15000"
- name: CODE_QUEUE_NOTIFY_CLAUDEQQ_SEND_ATTEMPTS
value: "3"
- name: LOG_FILE
value: "/var/log/unidesk/code-queue-d518.jsonl"
- name: UNIDESK_LOG_RETENTION_BYTES
value: "1GiB"
volumeMounts:
- name: docker-sock
mountPath: /var/run/docker.sock
- name: workspace
mountPath: /workspace
- name: workspace
mountPath: /root/unidesk
- name: codex-config
mountPath: /root/.codex/config.toml
readOnly: true
- name: ssh-dir
mountPath: /root/.ssh
readOnly: true
- name: logs
mountPath: /var/log/unidesk
- name: state
mountPath: /var/lib/unidesk/code-queue
readinessProbe:
httpGet:
path: /health
port: http
periodSeconds: 5
timeoutSeconds: 3
failureThreshold: 20
livenessProbe:
httpGet:
path: /health
port: http
periodSeconds: 10
timeoutSeconds: 3
failureThreshold: 6
startupProbe:
httpGet:
path: /health
port: http
periodSeconds: 5
timeoutSeconds: 3
failureThreshold: 60
resources:
requests:
cpu: 250m
memory: 512Mi
limits:
memory: 4Gi
volumes:
- name: docker-sock
hostPath:
path: /var/run/docker.sock
type: Socket
- name: workspace
hostPath:
path: /home/ubuntu/cq-deploy
type: Directory
- name: codex-config
hostPath:
path: /home/ubuntu/.codex/config.toml
type: File
- name: ssh-dir
hostPath:
path: /home/ubuntu/.ssh
type: Directory
- name: logs
hostPath:
path: /home/ubuntu/cq-deploy/.state/code-queue/logs
type: DirectoryOrCreate
- name: state
hostPath:
path: /home/ubuntu/cq-deploy/.state/code-queue
type: DirectoryOrCreate
---
apiVersion: v1
kind: Service
metadata:
name: code-queue-d518
namespace: unidesk
labels:
app.kubernetes.io/name: code-queue
app.kubernetes.io/part-of: unidesk
unidesk.ai/deployment-mode: v3sctl-managed
unidesk.ai/instance-id: D518
spec:
type: ClusterIP
selector:
app.kubernetes.io/name: code-queue
unidesk.ai/instance-id: D518
ports:
- name: http
port: 4222
targetPort: http
@@ -9,8 +9,8 @@
"adapterServiceId": "v3sctl-adapter",
"controlPlane": {
"type": "kubernetes",
"cluster": "unidesk-v3s",
"context": "kind-unidesk-v3s"
"cluster": "unidesk-v8s",
"context": "unidesk-v8s"
},
"route": {
"kind": "kubernetes-service",
@@ -29,7 +29,16 @@
"nodeId": "D601",
"role": "primary",
"baseUrl": "kubernetes://unidesk/services/code-queue:4222",
"healthPath": "/health"
"healthPath": "/health",
"healthMode": "service-proxy"
},
{
"id": "D518",
"nodeId": "D518",
"role": "standby",
"baseUrl": "kubernetes://unidesk/services/code-queue-d518:4222",
"healthPath": "/health",
"healthMode": "pod-ready"
}
],
"requireAllInstancesHealthy": false