feat(v3s): add D518 code queue standby pod
@@ -50,11 +50,11 @@ The Docker rollout order for frontend is: first run the necessary local checks, for example `b
## Health Criteria

The minimum bar for a working deployment is: backend-core's internal `/health` returns ok; frontend's public `/health` returns ok; provider ingress's public `/health` returns ok; database passes `pg_isready` inside the container; the Todo Note backend's `/api/health` returns `storage=postgres`; `v3sctl-adapter` `/api/control-plane` shows the Kubernetes API service proxy state, D601 active serving healthy, the D518 expected/missing node state and the no-fallback path; Code Queue `/health`, reached through the v3s active Service, returns a lightweight readiness result, the default model and `queue.storage`; `/api/tasks/overview` returns the PostgreSQL queue overview; Project Manager `/health` returns `storage.primary=postgres` and the project count; backend-core `/api/performance` returns performance metrics; `/api/nodes` lists the `main-server` and `D601` providers with status `online`; `/api/nodes/system-status` contains the corresponding CPU/memory/disk samples; and `/api/nodes/docker-status` shows Docker snapshots for the main server and D601. Until D518 has joined, do not bypass with a D601->D518 direct connection or a NodePort, and do not treat D518 missing as the D601 active Service being unavailable; join acceptance should prove that D518 enters the control plane through a native k3s/k8s agent/proxy/tunnel or an explicit provider maintenance proxy. Before delivery, `bun scripts/cli.ts e2e run` must also be run, and the gates in `docs/reference/e2e.md` are the final verdict.
The minimum bar for a working deployment is: backend-core's internal `/health` returns ok; frontend's public `/health` returns ok; provider ingress's public `/health` returns ok; database passes `pg_isready` inside the container; the Todo Note backend's `/api/health` returns `storage=postgres`; `v3sctl-adapter` `/api/control-plane` shows the `unidesk-v8s` Kubernetes API service proxy state, D601 active serving healthy, the D518 standby pod ready, `presentNodeIds=[D601,D518]`, `missingNodeIds=[]` and the no-fallback path; Code Queue `/health`, reached through the v3s active Service, returns a lightweight readiness result, the default model, `queue.storage` and `egressProxy.connected=true`; `/api/tasks/overview` returns the PostgreSQL queue overview; Project Manager `/health` returns `storage.primary=postgres` and the project count; backend-core `/api/performance` returns performance metrics; `/api/nodes` lists the `main-server`, `D601` and `D518` providers with status `online`; `/api/nodes/system-status` contains the corresponding CPU/memory/disk samples; and `/api/nodes/docker-status` shows Docker snapshots for the main server, D601 and D518. D518 must join the V8S control plane through the K3S agent and run the `code-queue-d518` standby Pod; the Kubernetes service route must not be bypassed with a D601->D518 direct connection, a NodePort or provider-gateway business HTTP. Before delivery, `bun scripts/cli.ts e2e run` must also be run, and the gates in `docs/reference/e2e.md` are the final verdict.
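The topology part of these criteria can be sketched as a small TypeScript check. The field names `presentNodeIds`, `missingNodeIds` and `status` come from this document; the `ControlPlaneService` shape and the `topologyProblems` helper are illustrative assumptions, not the adapter's actual types.

```typescript
// Hypothetical shape: only the fields named in the health criteria are modeled.
interface ControlPlaneService {
  id: string;
  status?: string;
  presentNodeIds?: string[];
  missingNodeIds?: string[];
}

// Returns human-readable problems; an empty list means the topology
// portion of the criteria holds for this service.
function topologyProblems(service: ControlPlaneService, expected: string[]): string[] {
  const problems: string[] = [];
  const present = new Set(service.presentNodeIds ?? []);
  for (const nodeId of expected) {
    if (!present.has(nodeId)) problems.push(`node ${nodeId} is not present`);
  }
  const missing = service.missingNodeIds ?? [];
  if (missing.length > 0) problems.push(`missingNodeIds is not empty: ${missing.join(",")}`);
  if (service.status !== "healthy") problems.push(`status is ${service.status ?? "unknown"}`);
  return problems;
}
```

Returning a list rather than a boolean keeps every failed condition visible, which suits a gate that has to explain why a degraded topology was rejected.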
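## Code Queue D601 Resource Budget

Code Queue has been migrated from the main server to D601 v3s/k8s, but it must still keep explicit hard memory/swap limits. `CODE_QUEUE_MAX_ACTIVE_QUEUES=0` is the default, which restores parallelism across queues, and small hot windows such as `CODE_QUEUE_IN_MEMORY_OUTPUT_RECORDS=10` and `CODE_QUEUE_IN_MEMORY_EVENT_RECORDS=10` stay in place. Task history, queue statistics and Trace/output reads must prefer reading or aggregating directly from PostgreSQL; `/health` performs only a lightweight readiness check, and the full history must not be cached inside the Bun process for performance convenience. Any change that raises the Code Queue hot windows, log buffers, the resident footprint of Playwright/Codex child processes or the container limits, or that explicitly sets `CODE_QUEUE_MAX_ACTIVE_QUEUES` to a positive number, must state the source of the D601 resource budget in the same task and prove, through D601 `kubectl -n unidesk get deploy,svc,pod`, `kubectl -n unidesk top pod` or equivalent Docker stats, `microservice health code-queue` and the corresponding E2E checks, that the memory-explosion risk has not been reintroduced.
Code Queue has been migrated from the main server to D601 v3s/k8s, but it must still keep explicit hard memory/swap limits. `CODE_QUEUE_MAX_ACTIVE_QUEUES=0` is the default, which restores parallelism across queues, and small hot windows such as `CODE_QUEUE_IN_MEMORY_OUTPUT_RECORDS=10` and `CODE_QUEUE_IN_MEMORY_EVENT_RECORDS=10` stay in place. Task history, queue statistics and Trace/output reads must prefer reading or aggregating directly from PostgreSQL; `/health` performs only a lightweight readiness check, and the full history must not be cached inside the Bun process for performance convenience. Any change that raises the Code Queue hot windows, log buffers, the resident footprint of Playwright/Codex child processes or the container limits, or that explicitly sets `CODE_QUEUE_MAX_ACTIVE_QUEUES` to a positive number, must state the source of the D601 resource budget in the same task and prove, through D601 `KUBECONFIG=/home/ubuntu/unidesk-code-queue-deploy/.state/v8s/kubeconfig kubectl -n unidesk get deploy,svc,pod`, `kubectl -n unidesk top pod` or equivalent Docker stats, `microservice health code-queue` and the corresponding E2E checks, that the memory-explosion risk has not been reintroduced.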
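As a rough illustration of the rule above, the following TypeScript sketch flags environment settings that would require a resource budget note. The three variable names and their defaults are quoted from this section; the `DEFAULTS` table and the `keysNeedingBudgetNote` helper are hypothetical and not part of the Code Queue source.

```typescript
// Defaults quoted in the resource budget paragraph.
const DEFAULTS = {
  CODE_QUEUE_MAX_ACTIVE_QUEUES: 0,
  CODE_QUEUE_IN_MEMORY_OUTPUT_RECORDS: 10,
  CODE_QUEUE_IN_MEMORY_EVENT_RECORDS: 10,
} as const;

type BudgetKey = keyof typeof DEFAULTS;

// Returns the keys whose configured value exceeds the default: a hot window
// raised above 10, or MAX_ACTIVE_QUEUES explicitly set to a positive number.
function keysNeedingBudgetNote(env: Record<string, string | undefined>): BudgetKey[] {
  const flagged: BudgetKey[] = [];
  for (const key of Object.keys(DEFAULTS) as BudgetKey[]) {
    const raw = env[key];
    if (raw === undefined || raw.trim() === "") continue; // unset: the default applies
    const value = Number(raw);
    if (!Number.isFinite(value)) continue; // malformed values are out of scope for this sketch
    if (value > DEFAULTS[key]) flagged.push(key);
  }
  return flagged;
}
```

A check like this could run in review tooling before the `kubectl` and E2E evidence is collected; it does not replace that evidence.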
## Database Connection Budget
@@ -35,7 +35,7 @@ Typical targeted commands:
- Core API: `docker exec unidesk-backend-core` calls internal `GET /api/overview`, which must report `dbReady: true`, `pgdata.volumeName=unidesk_pgdata_10gb`, a positive PostgreSQL database byte count, and at least one online node; internal `GET /api/performance` must report component request statistics, internal operation statistics, PGDATA usage and Code Queue PostgreSQL storage metadata.
- Provider self-connection: internal `GET /api/nodes` must contain `main-server` with `status: online`, `labels.providerGatewayVersion` equal to `src/components/provider-gateway/package.json`, `labels.providerGatewayUpgradePolicy: "always-enabled"`, `labels.providerGatewayRestartPolicyOk: true`, `labels.providerGatewayPidModeOk: true`, and `labels.providerGatewayRuntimeGuardOk: true`; internal `GET /api/nodes/system-status` must contain CPU/memory/disk samples plus a non-empty process resource list sorted by memory by default; internal `GET /api/nodes/docker-status` must contain a Docker snapshot for `main-server`; every running `provider-gateway` container visible in Docker snapshots must report `restartPolicy: "always"` and `pidMode: "host"`; public provider ingress `/health` must return ok.
- Provider remote control: internal `/api/dispatch` must successfully complete a real `provider.upgrade` task in `mode: "plan"` so the upgrade path is validated without recreating the running gateway during E2E.
- User services: internal `/api/microservices` must include `todo-note` and `oa-event-flow` on `main-server`, canonical `filebrowser` on `D518`, plus `v3sctl-adapter`, `code-queue`, `findjob`, `pipeline`, `met-nonlinear`, `claudeqq` and `filebrowser-d601` on `D601` with `public=false`; `/api/microservices/todo-note/health` must report `storage=postgres`, `/api/microservices/todo-note/proxy/api/instances` must expose the migrated Todo Note lists, and a temporary Todo Note list create/add/toggle/undo/delete cycle must succeed through the real provider-gateway proxy; `/api/microservices/oa-event-flow/health`, `/api/microservices/oa-event-flow/proxy/api/diagnostics`, `/api/microservices/oa-event-flow/proxy/api/events`, `/api/microservices/oa-event-flow/proxy/api/events?tags=service:pipeline` and `/api/microservices/oa-event-flow/proxy/api/stats/trace` must prove the independent OA event table, Pipeline bridge and stats center are reachable through UniDesk proxy; `/api/microservices/v3sctl-adapter/health` and `/api/microservices/v3sctl-adapter/proxy/api/control-plane` must expose the D601 v3s/k8s control plane, `kubeApiProxy.mode=kubernetes-api-service-proxy`, D601 active instance `servingHealthy=true`, D518 expected/missing state when D518 has not joined, `status=degraded` for incomplete topology, and `noFallback=true`; `/api/microservices/code-queue/health` must return the active Code Queue backend summary with default model `gpt-5.5`, and `/api/microservices/code-queue/proxy/api/tasks/overview` must return queue state through backend-core -> v3sctl-adapter -> Kubernetes API service proxy -> v3s/k8s Service, not through a `serviceId=code-queue` provider-gateway direct task or `/api/code-queue-direct`; `/api/microservices/filebrowser/health`, `/api/microservices/filebrowser-d601/health` and `/api/microservices/filebrowser/proxy/` must prove File Browser health and WebUI access through UniDesk proxy; `/api/microservices/findjob/health` and `/api/microservices/findjob/proxy/api/summary` must succeed through the real provider-gateway proxy; `/api/microservices/findjob/proxy/api/jobs?__unideskArrayLimit=jobs:5` must return a bounded preview with `_unidesk.arrayLimits` metadata; `/api/microservices/pipeline/health`, `/api/microservices/pipeline/proxy/api/snapshot?__unideskArrayLimit=registry.components:8,runs:3` and `/api/microservices/pipeline/proxy/api/oa-event-flow/diagnostics` must return Pipeline health, registry/run previews and OA event-flow evidence; `/api/microservices/met-nonlinear/health`, `/api/microservices/met-nonlinear/proxy/api/queue`, `/api/microservices/met-nonlinear/proxy/api/projects?root=projects&limit=500`, `/api/microservices/met-nonlinear/proxy/api/projects?root=ex_projects&limit=500`, `/api/microservices/met-nonlinear/proxy/api/projects/config?path=<projectPath>` and `/api/microservices/met-nonlinear/proxy/api/images` must return the D601 TS backend health, queue/GPU policy, full project tree inputs, structured project detail and ready `met-nonlinear-ml:tf26` image status.
- User services: internal `/api/microservices` must include `todo-note` and `oa-event-flow` on `main-server`, canonical `filebrowser` on `D518`, plus `v3sctl-adapter`, `code-queue`, `findjob`, `pipeline`, `met-nonlinear`, `claudeqq` and `filebrowser-d601` on `D601` with `public=false`; `/api/microservices/todo-note/health` must report `storage=postgres`, `/api/microservices/todo-note/proxy/api/instances` must expose the migrated Todo Note lists, and a temporary Todo Note list create/add/toggle/undo/delete cycle must succeed through the real provider-gateway proxy; `/api/microservices/oa-event-flow/health`, `/api/microservices/oa-event-flow/proxy/api/diagnostics`, `/api/microservices/oa-event-flow/proxy/api/events`, `/api/microservices/oa-event-flow/proxy/api/events?tags=service:pipeline` and `/api/microservices/oa-event-flow/proxy/api/stats/trace` must prove the independent OA event table, Pipeline bridge and stats center are reachable through UniDesk proxy; `/api/microservices/v3sctl-adapter/health` and `/api/microservices/v3sctl-adapter/proxy/api/control-plane` must expose the D601 `unidesk-v8s` control plane, `kubeApiProxy.mode=kubernetes-api-service-proxy`, D601 active instance `servingHealthy=true`, D518 standby instance `healthy=true`, `presentNodeIds=[D601,D518]`, `missingNodeIds=[]`, `status=healthy`, and `noFallback=true`; `/api/microservices/code-queue/health` must return the active Code Queue backend summary with default model `gpt-5.5` and `egressProxy.connected=true`, and `/api/microservices/code-queue/proxy/api/tasks/overview` must return queue state through backend-core -> v3sctl-adapter -> Kubernetes API service proxy -> v3s/k8s Service, not through a `serviceId=code-queue` provider-gateway direct task or `/api/code-queue-direct`; `/api/microservices/filebrowser/health`, `/api/microservices/filebrowser-d601/health` and `/api/microservices/filebrowser/proxy/` must prove File Browser health and WebUI access through UniDesk proxy; `/api/microservices/findjob/health` and `/api/microservices/findjob/proxy/api/summary` must succeed through the real provider-gateway proxy; `/api/microservices/findjob/proxy/api/jobs?__unideskArrayLimit=jobs:5` must return a bounded preview with `_unidesk.arrayLimits` metadata; `/api/microservices/pipeline/health`, `/api/microservices/pipeline/proxy/api/snapshot?__unideskArrayLimit=registry.components:8,runs:3` and `/api/microservices/pipeline/proxy/api/oa-event-flow/diagnostics` must return Pipeline health, registry/run previews and OA event-flow evidence; `/api/microservices/met-nonlinear/health`, `/api/microservices/met-nonlinear/proxy/api/queue`, `/api/microservices/met-nonlinear/proxy/api/projects?root=projects&limit=500`, `/api/microservices/met-nonlinear/proxy/api/projects?root=ex_projects&limit=500`, `/api/microservices/met-nonlinear/proxy/api/projects/config?path=<projectPath>` and `/api/microservices/met-nonlinear/proxy/api/images` must return the D601 TS backend health, queue/GPU policy, full project tree inputs, structured project detail and ready `met-nonlinear-ml:tf26` image status.
- ClaudeQQ availability: `/api/microservices/claudeqq/health` must only pass when `ready=true`, NapCat HTTP and WebSocket are connected, and `napcat.loginState=logged_in`; `/api/microservices/claudeqq/proxy/api/napcat/login` must show the same logged-in account state and `/api/microservices/claudeqq/proxy/api/events/recent` must prove the backend can read the persistent event cache. A QR-code-only or not-logged-in NapCat state must be treated as unhealthy.
- Database: the command writes an `unidesk_e2e_markers` row through `docker exec unidesk-database psql`, confirms provider state is stored in PostgreSQL, and checks Todo Note rows exist in `todo_note_instances` using the same named volume.
- Pipeline OA event flow: `microservice:pipeline-oa-event-flow` must prove both no-audit and monitor-audit runs are driven by OA events end to end. The event stream must show `node-finished` as a neutral fact with `pipeline:{pipelineId}` and `epoch:{runId}` tags, OA policy as the source of downstream/audit decisions, monitor decisions as OA control events, and runner control-result evidence. E2E must fail if delivery still depends on a legacy detail audit policy flag as policy authority, independent legacy audit-request points, a legacy batch completion gate, direct monitor-to-runner calls, or frontend/CLI writes to Pipeline `.state`.
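The bounded-preview URLs in the checks above all follow one pattern, which the TypeScript sketch below reproduces. The parameter name `__unideskArrayLimit` and the `path:count` syntax are taken from the example URLs in this list; the helper names `arrayLimitParam` and `withArrayLimit` are assumptions, and the value is left unencoded because the examples show raw colons and commas.

```typescript
// Builds the value of the `__unideskArrayLimit` query parameter from a map of
// array path -> maximum item count, e.g. { jobs: 5 } -> "jobs:5".
function arrayLimitParam(limits: Record<string, number>): string {
  return Object.entries(limits)
    .map(([path, max]) => {
      if (!Number.isInteger(max) || max < 0) {
        throw new RangeError(`invalid limit for ${path}: ${max}`);
      }
      return `${path}:${max}`;
    })
    .join(",");
}

// Appends the parameter to a proxy path that may or may not already carry a query string.
function withArrayLimit(path: string, limits: Record<string, number>): string {
  const separator = path.includes("?") ? "&" : "?";
  return `${path}${separator}__unideskArrayLimit=${arrayLimitParam(limits)}`;
}
```

For instance, `withArrayLimit("/api/microservices/findjob/proxy/api/jobs", { jobs: 5 })` yields the findjob preview URL quoted above.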
@@ -130,11 +130,12 @@ In the UniDesk context, Baidu Netdisk is managed as a pure backend service: it must not expose Baidu
- Provider: `D601`. The D601 provider-gateway only maintains and accesses the local private port `127.0.0.1:4266` of `v3sctl-adapter`; provider-gateway no longer acts as a direct proxy for `code-queue` business requests.
- Code reference: `https://github.com/pikasTech/unidesk` and `repository.commitId` in the configuration; the service source lives in `src/components/microservices/v3sctl-adapter` and is a UniDesk-owned control-plane component.
- Deployment reference: `src/components/microservices/v3sctl-adapter/docker-compose.d601.yml` in the UniDesk repository; the Dockerfile is `src/components/microservices/v3sctl-adapter/Dockerfile` and the container name is `v3sctl-adapter`.
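- v3s implementation: `v3sctl-managed` can currently land on k3s, k8s or an equivalent standard Kubernetes control plane, but it must use canonical Kubernetes objects such as native namespaces, Deployments, Services, readiness/liveness probes and the Kubernetes API service proxy; bare container ports, NodePort, SSH curl, provider-gateway `microservice.http` or host direct addresses must not be disguised as v3s service routes.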
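- v3s implementation: the current running control plane is the `unidesk-v8s` K3S server on D601 plus the K3S agent on D518; `v3sctl-managed` can land on k3s, k8s or an equivalent standard Kubernetes control plane, but it must use canonical Kubernetes objects such as native namespaces, Deployments, Services, readiness/liveness probes and the Kubernetes API service proxy; bare container ports, NodePort, SSH curl, provider-gateway `microservice.http` or host direct addresses must not be disguised as v3s service routes.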
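- V8S system components: the `unidesk-v8s` server must disable the non-essential `traefik`, `servicelb` and `metrics-server`, keeping only the API server, CoreDNS and the local-path provisioner that the business needs; CoreDNS and the local-path provisioner are pinned to the D601 control-plane node so that D518 maintenance-tunnel limits do not cause system DNS/readiness jitter.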
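- manifest: managed service declarations live in `src/components/microservices/v3sctl-adapter/v3s/*.v3s.json`, and the adapter reads them at startup through `V3SCTL_MANIFEST_PATHS`; the manifest is the authoritative source for D601/D518 instances, the active instance, single writer, expected nodes and health policy. `V3SCTL_SERVICES_JSON` must not carry static HTTP services, must not override a service of the same name, and must not act as a hidden fallback; any additional service must also supply a complete `ManagedKubernetesService` manifest.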
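- API: `GET /health` only indicates that the adapter control plane itself is available and exposes managed-service serving health as the `managedServicesHealthy` field; `GET /api/control-plane` returns the control plane, manifest, kubectl/v3s snapshot and managed service state; `GET /api/services` returns the managed service list; `GET|HEAD /api/services/<id>/health` returns the active serving health of that v3s service; `/api/services/<id>/proxy/*` is the only proxy entry through which business requests reach the active service.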
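- Proxy path: the only official path by which the adapter reaches managed business services is the Kubernetes API service proxy: `/api/v1/namespaces/<namespace>/services/<service>:<port>/proxy/...`. D601 and D518 are not required to reach each other directly; when D518 joins, it should preferably become reachable from the control plane through native k3s/k8s agent/proxy/tunnel capabilities, and where necessary a provider maintenance channel may carry only the establishment and diagnosis of control-plane connections, but business requests must not degrade to provider-gateway connecting directly to the Code Queue HTTP port.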
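- Topology health: `expectedNodeIds` shows the planned nodes. While D518 has not yet joined, `missingNodeIds=["D518"]` and `status=degraded` must remain visible; as long as the active D601 Service returns healthy through the Kubernetes API service proxy, `servingHealthy=true`, `healthy=true` and `managedServicesHealthy=true` should still hold. Only services that explicitly set `requireAllInstancesHealthy=true` may escalate a missing standby/worker node to overall unhealthy.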
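- Proxy path: the only official path by which the adapter reaches the active business service is the Kubernetes API service proxy: `/api/v1/namespaces/<namespace>/services/<service>:<port>/proxy/...`. D601 and D518 are not required to reach each other directly; D518 joins the control plane through the K3S agent, and the control-plane connection may be established over a node maintenance tunnel, but business requests must not degrade to provider-gateway connecting directly to the Code Queue HTTP port. If a standby/worker node is constrained by kubelet/service-proxy reachability, the manifest may explicitly use `healthMode=pod-ready` as the topology health probe; this only reads Kubernetes Pod readiness, is not a business proxy path, and cannot replace the active Service proxy.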
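- Topology health: `expectedNodeIds` shows the planned nodes; the current Code Queue target topology must include both D601 and D518, with `presentNodeIds` equal to `["D601","D518"]`, `missingNodeIds=[]`, `topologyComplete=true` and `status=healthy`. D518 not having joined is allowed only as an explicit degraded state during migration and must not be hidden as a fallback; only services that explicitly set `requireAllInstancesHealthy=true` may escalate a missing standby/worker node to overall unhealthy.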
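- Frontend: the `用户服务 / V3S Control` (User Services / V3S Control) React page must communicate only through `/api/microservices/v3sctl-adapter/proxy/api/control-plane` and show the control-plane state, manifest, D601/D518 instances, active instance, the Kubernetes API service proxy/no-fallback path and an explicit raw JSON button; the page must not directly access provider-gateway, D601/D518 business container ports, NodePort or the raw v3s/kubectl API.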
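The Kubernetes API service proxy path named in the list above has a fixed shape, which can be sketched in TypeScript as follows. The path template is quoted from this document and `unidesk` is the namespace used in its `kubectl` commands; the helper name `serviceProxyPath`, the Service name `code-queue` and the port `8080` in the usage note are placeholders for illustration only.

```typescript
// Builds `/api/v1/namespaces/<namespace>/services/<service>:<port>/proxy/...`.
// `subPath` is the path on the target Service, for example "/health".
function serviceProxyPath(
  namespace: string,
  service: string,
  port: number | string,
  subPath = "/",
): string {
  const tail = subPath.startsWith("/") ? subPath : `/${subPath}`;
  const ns = encodeURIComponent(namespace);
  const svc = encodeURIComponent(service);
  return `/api/v1/namespaces/${ns}/services/${svc}:${port}/proxy${tail}`;
}
```

Under those assumptions, `serviceProxyPath("unidesk", "code-queue", 8080, "/health")` produces `/api/v1/namespaces/unidesk/services/code-queue:8080/proxy/health`; the real Service name and port come from the manifest, not from this sketch.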
### Code Queue V3S-Managed
@@ -148,7 +149,7 @@ In the UniDesk context, Baidu Netdisk is managed as a pure backend service: it must not expose Baidu
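- Main-service dependency mappings: Code Queue still uses the main PostgreSQL as its authoritative database, so `DATABASE_URL` must point to the main server's restricted port mapping; `OA_EVENT_FLOW_BASE_URL` must point to the main server's restricted OA Event Flow port mapping; the D601 active instance's `CODE_QUEUE_NOTIFY_CLAUDEQQ_BASE_URL` uses the local ClaudeQQ mapping `http://host.docker.internal:3290` directly. These port mappings serve only the runtime of controlled nodes, must have their sources restricted by a firewall or an equivalent policy, and must not become an entry point for browsers or arbitrary public clients.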
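- K8s probes and startup maintenance: Kubernetes liveness/startup probes must use the lightweight `/live`, while readiness and user-service health use `/health`; `/health` must not run full task aggregation, history backfill or long-transaction index maintenance, and the historical task overview should be read from PostgreSQL by `/api/tasks/overview`. At startup, queue metadata flush, notification outbox reads, task table index maintenance and overview warmup may run in the background, but this maintenance must not block the Bun server, the readiness endpoint or the frontend overview; notification table indexing and large OA backfills must not be default startup side effects.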
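- MiniMax/OpenCode concurrency: `minimax-m2.7` runs through the OpenCode JSON event port; every Code Queue task must use its own OpenCode XDG data/config/cache/state directories, and concurrent tasks from multiple queues must not share one OpenCode SQLite/WAL state directory, otherwise concurrent smoke runs trigger database lock or initialization errors such as `PRAGMA journal_mode = WAL`. The MiniMax smoke used to validate the v3s/k8s path takes "at least 4 tasks, spread over 2 queues, at least 2 reaching a successful terminal state" as the path acceptance line; remaining failures caused by OpenCode final-reply capture, business task judgment or model rate limiting should be investigated separately as Code Queue execution reliability issues and must not be read back as a v3s proxy path failure.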
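- Default egress proxy: the D601 active Code Queue Pod must by default inject `HTTP_PROXY`, `HTTPS_PROXY` and `ALL_PROXY` into task child processes such as Codex/OpenCode, `git`, `curl` and `npm`; the only upstream at present is the egress HTTP CONNECT port `http://host.docker.internal:18789` that D601 provider-gateway publishes on the host loopback, and this port may bind only to `127.0.0.1` and must not be opened to the public internet. Here provider-gateway serves only as the egress proxy, not as the Code Queue business HTTP proxy; business access must still go only through the Kubernetes API service proxy. A native k3s/k8s egress gateway, a service mesh or a CNI egress policy are only future network-layer enhancements, and the current delivery state does not introduce a second egress control plane. Remote development/execution containers must not rely on these environment variables alone; at the container network layer they must use a TUN default route and an OUTPUT firewall to force external traffic out only through the master TUN exit.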
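- Default egress proxy: the D601 active Code Queue Pod must by default inject `HTTP_PROXY`, `HTTPS_PROXY` and `ALL_PROXY` into task child processes such as Codex/OpenCode, `git`, `curl` and `npm`; the only upstream at present is the D601 provider-gateway egress HTTP CONNECT proxy, exposed to Pods in the `unidesk` namespace through the Kubernetes `Service d601-provider-egress-proxy`. The Service's EndpointSlice points to the D601 provider-gateway private Docker network endpoint, the proxy URL inside the Pod is `http://d601-provider-egress-proxy.unidesk.svc.cluster.local:18789`, and the provider-gateway host port may still bind only to `127.0.0.1` and must not be opened to the public internet; if the provider-gateway container IP changes, the EndpointSlice must be refreshed accordingly and verified with Code Queue `/health.egressProxy.connected=true`. Here provider-gateway serves only as the egress proxy, not as the Code Queue business HTTP proxy; business access must still go only through the Kubernetes API service proxy. A native k3s/k8s egress gateway, a service mesh or a CNI egress policy are only future network-layer enhancements, and the current delivery state does not introduce a second egress control plane. Remote development/execution containers must not rely on these environment variables alone; at the container network layer they must use a TUN default route and an OUTPUT firewall to force external traffic out only through the master TUN exit.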
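- No-fallback discipline for the egress proxy: the Code Queue runtime configuration allows exactly one default egress path, the provider-gateway egress proxy; the code must not also keep implicit branches such as a Code Queue self-built WebSocket proxy, a temporary shell proxy, D601 local direct access to the public internet or a main server direct HTTP proxy. Any new network fallback must first be recorded in this reference document together with state visible in `/health`; otherwise it is treated as a leftover legacy path.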
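- Rollout discipline: frontend or backend improvements related to Code Queue must be formally rolled out within the same task and verified against the public frontend or live API; they must not stop at source code, build artifacts or "roll out later". When modifying Code Queue itself, do not wait for the current Code Queue task to finish, for the queue to become idle or for `0 running` before restarting; perform a build-first replacement through the v3s control plane or the D601/D518 maintenance entry, and use the v3s adapter, the Code Queue live API or the public frontend to prove that tasks and queues remain readable and can continue.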
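- Renaming and disaster recovery: the old Codex queue service name is allowed only as a compatibility diagnostic and a one-time migration source. When the `code-queue-backend` container's own `/health` is fine but `microservice health code-queue` returns a provider direct-connection error, first assume that backend-core is still loading the old `MICROSERVICES_JSON` or that the adapter manifest has not been refreshed; refresh `.state/docker-compose.env`, rebuild/replace `backend-core` and `v3sctl-adapter`, and then use `microservice list` to verify that `code-queue` reports `runtime.orchestrator=v3sctl`, `backend.proxyMode=v3sctl-adapter-http` and a summary with no direct business-container connection.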
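The default egress proxy injection described in the list above can be illustrated with a short TypeScript sketch. The three variable names and the in-cluster proxy URL are quoted from this document; the `withEgressProxy` helper is hypothetical and does not claim to mirror how Code Queue actually spawns its child processes.

```typescript
// In-cluster proxy URL quoted in this document for Pods in the `unidesk` namespace.
const EGRESS_PROXY_URL = "http://d601-provider-egress-proxy.unidesk.svc.cluster.local:18789";

// Returns a copy of the base environment with the three proxy variables set.
// Existing values are overwritten so that there is exactly one egress path.
function withEgressProxy(base: Record<string, string | undefined>): Record<string, string> {
  const env: Record<string, string> = {};
  for (const [key, value] of Object.entries(base)) {
    if (value !== undefined) env[key] = value;
  }
  for (const name of ["HTTP_PROXY", "HTTPS_PROXY", "ALL_PROXY"]) {
    env[name] = EGRESS_PROXY_URL;
  }
  return env;
}
```

The result could be passed as the `env` option when spawning `git`, `curl` or `npm`. Overwriting inherited values, rather than keeping them, matches the no-fallback discipline: a stale proxy setting from the parent process never becomes a second path.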
@@ -281,7 +282,7 @@ In the UniDesk context, ClaudeQQ is managed as a message-gateway backend service: it must not directly
- `bun scripts/cli.ts microservice health claudeqq`, `bun scripts/cli.ts microservice proxy claudeqq /api/napcat/login`, `bun scripts/cli.ts microservice proxy claudeqq /api/events/recent` and `bun scripts/cli.ts microservice proxy claudeqq /api/events/subscriptions`: verify the ClaudeQQ backend, NapCat container login, event subscriptions and the private proxy path; message push uses `POST /api/push/text`, and the D601 ports `3290/3000/3001/6099` must not be opened to the public internet.
- `bun scripts/cli.ts microservice health todo-note` and `bun scripts/cli.ts microservice proxy todo-note /api/instances`: verify the main server Todo Note backend, PostgreSQL storage and the local provider-gateway private proxy path.
- `bun scripts/cli.ts microservice health oa-event-flow`, `bun scripts/cli.ts microservice proxy oa-event-flow /api/diagnostics --raw` and `bun scripts/cli.ts microservice proxy oa-event-flow '/api/events?tags=service:code-queue&limit=20' --raw`: verify the unified OA event flow, the event table, tag queries and the stats center.
- `bun scripts/cli.ts microservice health v3sctl-adapter` and `bun scripts/cli.ts microservice proxy v3sctl-adapter /api/control-plane --raw`: verify the D601 v3s control-plane adapter, the manifest, D601/D518 instance state and the no-fallback runtime path.
- `bun scripts/cli.ts microservice health v3sctl-adapter` and `bun scripts/cli.ts microservice proxy v3sctl-adapter /api/control-plane --raw`: verify the D601 `unidesk-v8s` control-plane adapter, the manifest, D601 active/D518 standby instance state, `presentNodeIds=[D601,D518]`, `missingNodeIds=[]` and the no-fallback runtime path.
- `bun scripts/cli.ts microservice health code-queue` and `bun scripts/cli.ts microservice proxy code-queue /api/tasks/overview`: verify that Code Queue goes through the single path backend-core -> v3sctl-adapter -> v3s active service; the output must not contain a provider-gateway `microservice.http` business proxy task with `serviceId=code-queue`; writes, prompt appends, interrupts and readAt/unread state must all be written to PostgreSQL by the backend, and the frontend must not fake success state with local storage.
- `bun scripts/cli.ts microservice health filebrowser`, `bun scripts/cli.ts microservice health filebrowser-d601` and `bun scripts/cli.ts microservice proxy filebrowser / --max-body-bytes 2000`: verify the private proxy paths of the D518 primary File Browser and the D601 backup File Browser; the browser WebUI must be accessed through `/api/microservices/filebrowser/proxy/` or `/api/microservices/filebrowser-d601/proxy/`, and port `4251` must not be opened directly to the public internet.
- `bun scripts/cli.ts --main-server-ip 74.48.78.17 microservice health findjob`: run the same verification from a compute node or another non-main-server host through the public frontend remote CLI, without needing the main server SSH key.
@@ -309,7 +310,7 @@ In the UniDesk context, ClaudeQQ is managed as a message-gateway backend service: it must not directly
- Run `bun scripts/cli.ts microservice health met-nonlinear`, `bun scripts/cli.ts microservice proxy met-nonlinear /api/queue`, `bun scripts/cli.ts microservice proxy met-nonlinear '/api/projects?root=projects&limit=20'` and `bun scripts/cli.ts microservice proxy met-nonlinear /api/images` to confirm that the real path goes through backend-core, WebSocket, the D601 provider-gateway and the D601 local MET Nonlinear TS backend.
- Run `bun scripts/cli.ts microservice health claudeqq`, `bun scripts/cli.ts microservice proxy claudeqq /api/napcat/login`, `bun scripts/cli.ts microservice proxy claudeqq /api/events/recent` and `bun scripts/cli.ts microservice proxy claudeqq /api/events/subscriptions` to confirm that the real path goes through backend-core, WebSocket, the D601 provider-gateway and the D601 local ClaudeQQ backend; on D601, `curl http://127.0.0.1:3290/health` should show `service=claudeqq`, `pureBackend=true`, `napcat.containerized=true`, the NapCat HTTP/WS state, the QR code state and the subscription count.
- Run `bun scripts/cli.ts microservice health todo-note` and `bun scripts/cli.ts microservice proxy todo-note /api/instances` to confirm that the real path goes through backend-core, WebSocket, the main-server provider-gateway and the main server `todo-note-backend` backend; the output must contain the five migrated lists and the PostgreSQL storage health state.
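- Run `bun scripts/cli.ts microservice health v3sctl-adapter`, `bun scripts/cli.ts microservice proxy v3sctl-adapter /api/control-plane --raw`, `bun scripts/cli.ts microservice health code-queue` and `bun scripts/cli.ts microservice proxy code-queue /api/tasks/overview` to confirm that the real path goes through backend-core -> v3sctl-adapter -> v3s active service; Code Queue `/health` must still return the business backend's own `queue.storage.primary=postgres`, `queue.storage.postgresReady=true` and `queue.notifications.claudeqq.outbox.storage=postgres`, and must not be replaced by the adapter's aggregated health JSON. Inside the active Code Queue Pod, also verify that the main PostgreSQL port mapping, the main OA Event Flow port mapping and the local ClaudeQQ `http://host.docker.internal:3290` are all reachable, and confirm on the adapter control page that D601 active serving is healthy, D518 expected/missing is visible, and the whole does not degrade into a hidden fallback. Then submit a small `gpt-5.5` task through the public frontend and confirm that the queue advances serially, output updates in real time, a judge verdict follows completion, and prompts can be appended or the task interrupted while it runs. Code Queue restart recovery must be an acceptance item: after restarting or rebuilding the active instance while a task is running, the task must be restored from PostgreSQL to a state from which it can continue, without losing the active task, `promptHistory`, subsequent queued tasks, readAt/unread state or ClaudeQQ notifications already in the outbox. After the Code Queue service name, table-name prefix or persistence directory has been migrated, also run `bun scripts/cli.ts e2e run --only microservice:catalog-code-queue,microservice:code-queue-status,microservice:code-queue-health,microservice:code-queue-tasks` to prove that the backend-core catalog, the v3s adapter private proxy, the PostgreSQL queue and the task list all point to `code-queue`. Batch acceptance must set `入队份数=5` (enqueue copies = 5) in the public frontend or use multi-segment prompt separators, enqueue 5 tasks at once, and confirm that all 5 tasks enter running/judging/succeeded in order rather than only the first one running.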
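- Run `bun scripts/cli.ts microservice health v3sctl-adapter`, `bun scripts/cli.ts microservice proxy v3sctl-adapter /api/control-plane --raw`, `bun scripts/cli.ts microservice health code-queue` and `bun scripts/cli.ts microservice proxy code-queue /api/tasks/overview` to confirm that the real path goes through backend-core -> v3sctl-adapter -> v3s active service; Code Queue `/health` must still return the business backend's own `queue.storage.primary=postgres`, `queue.storage.postgresReady=true`, `queue.notifications.claudeqq.outbox.storage=postgres` and `egressProxy.connected=true`, and must not be replaced by the adapter's aggregated health JSON. Inside the active Code Queue Pod, also verify that the main PostgreSQL port mapping, the main OA Event Flow port mapping, the local ClaudeQQ `http://host.docker.internal:3290` and `d601-provider-egress-proxy` are all reachable, and confirm on the adapter control page that D601 active serving is healthy, the D518 standby pod is ready, `missingNodeIds=[]`, and the whole does not degrade into a hidden fallback. Then submit a small `gpt-5.5` task through the public frontend and confirm that the queue advances serially, output updates in real time, a judge verdict follows completion, and prompts can be appended or the task interrupted while it runs. Code Queue restart recovery must be an acceptance item: after restarting or rebuilding the active instance while a task is running, the task must be restored from PostgreSQL to a state from which it can continue, without losing the active task, `promptHistory`, subsequent queued tasks, readAt/unread state or ClaudeQQ notifications already in the outbox. After the Code Queue service name, table-name prefix or persistence directory has been migrated, also run `bun scripts/cli.ts e2e run --only microservice:catalog-code-queue,microservice:code-queue-status,microservice:code-queue-health,microservice:code-queue-tasks` to prove that the backend-core catalog, the v3s adapter private proxy, the PostgreSQL queue and the task list all point to `code-queue`. Batch acceptance must set `入队份数=5` (enqueue copies = 5) in the public frontend or use multi-segment prompt separators, enqueue 5 tasks at once, and confirm that all 5 tasks enter running/judging/succeeded in order rather than only the first one running.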
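- Code Queue memory regression acceptance: for any change to Code Queue persistence, the scheduler, output/Trace, health, list/detail queries, log export or container runtime parameters, before delivery you must use `kubectl -n unidesk get deploy,pod,svc,endpoints -o wide`, `kubectl -n unidesk describe deploy/code-queue` or an equivalent Docker inspect on D601 to confirm that the hard memory/swap limits match the budget, run `kubectl -n unidesk top pod` or Docker stats to confirm resident memory, `OOMKilled=false` and that `RestartCount` has not grown abnormally, and then run `bun scripts/cli.ts microservice health code-queue` to confirm that `/health` is a lightweight readiness check and exposes PostgreSQL/notification/outbox state. Acceptance must also cover `/api/tasks/overview`, single-task detail and output/transcript queries when historical tasks exist, proving that hot-state trimming neither loses historical output nor caches all historical `task_json` in the process again; tasks involving TypeScript/frontend verification should be able to complete short, memory-heavy commands such as `bun run --cwd src/components/frontend check` within the D601 Code Queue memory/swap budget instead of being repeatedly SIGTERMed by the memory watchdog.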
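- Code Queue latency regression acceptance: for any change to the Code Queue list, overview, readAt, Trace/summary lazy loading, real-time output/SSE event publishing, frontend request strategy, the backend-core user-service proxy or the frontend Code Queue request path, before delivery you must verify, in a live environment that has historical task data and flowing active output, that `GET /api/tasks/overview`, `POST /api/tasks/<id>/read`, the selected task's `trace-step` and the first screen of the frontend `/app/code-queue/` all stay under the 1s target; you may run `bun scripts/src/code-queue-perf.ts --json --target-ms 1000` to collect first-screen time, the slowest API and DOM completion metrics under the public frontend, and use `bun scripts/cli.ts microservice proxy code-queue /api/tasks/overview --raw`, curl against the D601 Pod's `/health` and `/api/tasks/overview`, the failure/slow-operation records in the performance panels `/api/performance` and `/api/frontend-performance`, and `kubectl -n unidesk top pod` or Docker stats to add evidence on backend latency, proxy 502s and memory/CPU. The acceptance conclusion must also state whether a short-TTL cache was used, how the cache is invalidated by mutations or archive appends, whether database indexes/aggregations are hit, whether the output hot path reads only incremental metrics, and whether paginated loading skips selected/active/stats; showing only a single snapshot after a cache hit is not enough.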
- Run `bun scripts/cli.ts microservice health filebrowser`, `bun scripts/cli.ts microservice health filebrowser-d601` and `bun scripts/cli.ts microservice proxy filebrowser / --max-body-bytes 2000` to confirm that File Browser health returns `status=OK`, the WebUI HTML contains `File Browser`, and D518/D601 reach the node-local `4251` through provider-gateway; then, in `用户服务 / File Browser` (User Services / File Browser) on the public frontend, confirm that D518 is the default target, screenshots can be exported, the compact iframe layout no longer has a huge `folder` marker covering file names, and `/mnt/c` can be browsed.
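The under-1s latency target from the acceptance list above reduces to a simple comparison once timings have been collected. The TypeScript sketch below assumes timings are already available as name/duration pairs; the `LatencySample` type and `overBudget` helper are hypothetical and unrelated to the output format of `code-queue-perf.ts`.

```typescript
// One measured operation, for example "GET /api/tasks/overview".
interface LatencySample {
  name: string;
  durationMs: number;
}

// Returns the samples that fail to stay under the target, slowest first.
// A duration equal to the target counts as a failure because the goal is "under 1s".
function overBudget(samples: LatencySample[], targetMs = 1000): LatencySample[] {
  return samples
    .filter((sample) => sample.durationMs >= targetMs)
    .sort((a, b) => b.durationMs - a.durationMs);
}
```

Sorting the failures slowest first puts the worst offender at the top of a report; repeated runs remain necessary, since a single snapshot after a cache hit does not count as evidence.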
@@ -92,11 +92,11 @@ provider ingress is the only provider connection interface allowed to be publicly exposed; currently
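## Egress Proxy

provider-gateway can provide an egress HTTP CONNECT proxy so that node-side execution environments such as Code Queue and the Pipeline runner reach the internet through the existing provider WebSocket channel. By default the proxy listens on `0.0.0.0:18789` inside the container; node deployments must publish it only on the host loopback as `127.0.0.1:18789->18789/tcp` and must not open a public port. Ordinary Docker execution containers can reach the provider-gateway container name over the same private Docker network, and v3s/k8s Pods uniformly reach this loopback mapping through `host.docker.internal:18789`. The proxy only converts local CONNECT/absolute HTTP requests into `egress_tcp_open`, `egress_tcp_data` and `egress_tcp_close` messages; backend-core establishes the real TCP connection on the main server side and relays the data back, which prevents Codex/Git/NPM from hanging when the local network of a compute node such as D601 is unreachable.
provider-gateway can provide an egress HTTP CONNECT proxy so that node-side execution environments such as Code Queue and the Pipeline runner reach the internet through the existing provider WebSocket channel. By default the proxy listens on `0.0.0.0:18789` inside the container; node deployments must publish it only on the host loopback as `127.0.0.1:18789->18789/tcp` and must not open a public port. Ordinary Docker execution containers can reach the provider-gateway container name over the same private Docker network, and v3s/k8s Pods must expose the same-node provider-gateway private endpoint through an explicit Kubernetes Service/EndpointSlice, for example D601 Code Queue uses `d601-provider-egress-proxy.unidesk.svc.cluster.local:18789`; this egress Service must not be treated as a business HTTP entry point. The proxy only converts local CONNECT/absolute HTTP requests into `egress_tcp_open`, `egress_tcp_data` and `egress_tcp_close` messages; backend-core establishes the real TCP connection on the main server side and relays the data back, which prevents Codex/Git/NPM from hanging when the local network of a compute node such as D601 is unreachable.

This capability belongs to the provider-gateway channel: `unideskCapabilities` in register/heartbeat must include `network.egress-proxy`, and labels must report the `providerGatewayEgressProxy*` state. A pseudo provider must no longer be registered for an individual user service to implement the egress proxy; otherwise the node list shows a fake provider, and the proxy, statistics and upgrade paths split into multiple channels. The proxy health check uses `GET /__unidesk/egress-proxy/health`, which returns `connected`, `providerId`, `activeTunnels` and the listening port; a business service's own `/health` should surface this result as troubleshooting evidence.

The long-term boundary of the egress proxy is "one unified provider channel, no second control plane". backend-core accepts `egress_tcp_*` messages only on an online provider socket and destroys all corresponding TCP relays when that socket closes; provider-gateway only maintains the local HTTP proxy and the WebSocket message mapping, stores no business state, and takes part in no control plane beyond task scheduling, statistics and node registration. Execution containers, user services and the Pipeline runner may not connect directly to the backend-core provider ingress, nor register themselves with a provider token; when they need egress they may only connect to the private proxy endpoint of the provider-gateway on the same node. The current v3s/k8s Code Queue uses `host.docker.internal:18789`, which is the node loopback egress entry, not a business HTTP proxy entry, and cannot replace the Kubernetes API service proxy.
The long-term boundary of the egress proxy is "one unified provider channel, no second control plane". backend-core accepts `egress_tcp_*` messages only on an online provider socket and destroys all corresponding TCP relays when that socket closes; provider-gateway only maintains the local HTTP proxy and the WebSocket message mapping, stores no business state, and takes part in no control plane beyond task scheduling, statistics and node registration. Execution containers, user services and the Pipeline runner may not connect directly to the backend-core provider ingress, nor register themselves with a provider token; when they need egress they may only connect to the private proxy endpoint of the provider-gateway on the same node. The current v3s/k8s Code Queue connects to the D601 provider-gateway egress endpoint through the `d601-provider-egress-proxy` Kubernetes Service, which is the egress entry inside the Pod, not a business HTTP proxy entry, and cannot replace the Kubernetes API service proxy.

Failure semantics must be explicit; silent fallback is not allowed. When the WebSocket from provider-gateway to backend-core is not connected, the local proxy must return 503; execution containers must not automatically bypass to D601 local direct internet access, an external public proxy or the main server's public HTTP port. `NO_PROXY` is only for clearly internal links such as PostgreSQL, OA Event Flow, ClaudeQQ, the frontend/backend-core internal proxy and provider-gateway health; external targets such as GitHub, model APIs and the npm registry must not be added to the bypass list. Acceptance must prove that provider-gateway labels, the business service `/health` and `curl -I https://...` inside the execution container all go through the same proxy path.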
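The `NO_PROXY` rule in the failure-semantics paragraph can be checked mechanically. The TypeScript sketch below scans a `NO_PROXY` value for entries that would let external traffic bypass the egress proxy. GitHub and the npm registry are named in the text; the concrete host suffixes in `MUST_PROXY_SUFFIXES` and the `noProxyViolations` helper are illustrative assumptions.

```typescript
// External targets that must always go through the egress proxy.
// The exact host names are assumptions chosen to match the examples in the text.
const MUST_PROXY_SUFFIXES = ["github.com", "npmjs.org"];

// Returns the NO_PROXY entries that match one of the external suffixes.
// Leading "." and "*." wildcards are stripped before comparison.
function noProxyViolations(noProxy: string): string[] {
  return noProxy
    .split(",")
    .map((entry) => entry.trim().toLowerCase().replace(/^\*?\./, ""))
    .filter((entry) => entry.length > 0)
    .filter((entry) =>
      MUST_PROXY_SUFFIXES.some((suffix) => entry === suffix || entry.endsWith(`.${suffix}`)),
    );
}
```

An empty result means no listed external host is being bypassed; it says nothing about hosts missing from the suffix list, so model API hosts would need to be added for a real check.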
+54
-2
@@ -61,6 +61,8 @@ const SERVICE_CHECK_NAMES = [
"microservice:catalog-todo-note",
"microservice:catalog-oa-event-flow",
"microservice:catalog-code-queue",
"microservice:v3sctl-adapter-status",
"microservice:v3sctl-control-plane",
"microservice:catalog-filebrowser",
"microservice:filebrowser-health",
"microservice:filebrowser-webui",
@@ -1026,6 +1028,8 @@ async function serviceChecks(config: UniDeskConfig, urls: PublicUrls, checks: E2
const oaEventFlowEvents = dockerCoreJson("/api/microservices/oa-event-flow/proxy/api/events?limit=10");
const oaEventFlowPipelineEvents = dockerCoreJson("/api/microservices/oa-event-flow/proxy/api/events?tags=service:pipeline&limit=10");
const oaEventFlowStats = dockerCoreJson("/api/microservices/oa-event-flow/proxy/api/stats/trace?limit=10");
const v3sctlStatus = dockerCoreJson("/api/microservices/v3sctl-adapter/status");
const v3sctlControlPlane = dockerCoreJson("/api/microservices/v3sctl-adapter/proxy/api/control-plane");
const codeQueueStatus = dockerCoreJson("/api/microservices/code-queue/status");
const codeQueueHealth = dockerCoreJson("/api/microservices/code-queue/health");
const codeQueueTasks = dockerCoreJson("/api/microservices/code-queue/proxy/api/tasks/overview?limit=5&transcriptLimit=1&compact=1&afterSeq=0&preferId=");
@@ -1100,8 +1104,27 @@ async function serviceChecks(config: UniDeskConfig, urls: PublicUrls, checks: E2
|
||||
const oaEventFlowEventsBody = (oaEventFlowEvents as { body?: { ok?: boolean; events?: unknown[]; returned?: number } }).body;
|
||||
const oaEventFlowPipelineEventsBody = (oaEventFlowPipelineEvents as { body?: { ok?: boolean; events?: Array<{ tags?: unknown[]; sourceId?: string; type?: string; payload?: { runId?: string; pipelineId?: string } }>; returned?: number } }).body;
|
||||
const oaEventFlowStatsBody = (oaEventFlowStats as { body?: { ok?: boolean; stats?: unknown[]; returned?: number } }).body;
|
||||
-  const codeQueueHealthBody = (codeQueueHealth as { body?: { ok?: boolean; queue?: { defaultModel?: string; judgeConfigured?: boolean; modelReasoningEfforts?: Record<string, string> } } }).body;
+  const codeQueueHealthBody = (codeQueueHealth as { body?: { ok?: boolean; egressProxy?: { connected?: boolean }; queue?: { defaultModel?: string; judgeConfigured?: boolean; modelReasoningEfforts?: Record<string, string> } } }).body;
|
||||
const codeQueueTasksBody = (codeQueueTasks as { body?: { ok?: boolean; queue?: { defaultModel?: string; modelReasoningEfforts?: Record<string, string> }; tasks?: unknown[] } }).body;
|
||||
const v3sctlControlPlaneBody = (v3sctlControlPlane as { body?: {
|
||||
ok?: boolean;
|
||||
clusterId?: string;
|
||||
noFallback?: boolean;
|
||||
managedServicesHealthy?: boolean;
|
||||
kubeApiProxy?: { mode?: string };
|
||||
services?: Array<{
|
||||
id?: string;
|
||||
status?: string;
|
||||
presentNodeIds?: string[];
|
||||
missingNodeIds?: string[];
|
||||
topologyComplete?: boolean;
|
||||
servingHealthy?: boolean;
|
||||
active?: { id?: string; healthy?: boolean };
|
||||
instances?: Array<{ id?: string; healthy?: boolean; proxyMode?: string }>;
|
||||
}>;
|
||||
} }).body;
|
||||
const v3sctlCodeQueueService = v3sctlControlPlaneBody?.services?.find((service) => service.id === "code-queue");
|
||||
const v3sctlD518Instance = v3sctlCodeQueueService?.instances?.find((instance) => instance.id === "D518");
|
||||
const filebrowserHealthBody = (filebrowserHealth as { body?: { status?: string } }).body;
|
||||
const filebrowserD601HealthBody = (filebrowserD601Health as { body?: { status?: string } }).body;
|
||||
const filebrowserWebuiText = String((filebrowserWebui as { body?: { text?: string } }).body?.text || "");
|
||||
@@ -1141,6 +1164,35 @@ async function serviceChecks(config: UniDeskConfig, urls: PublicUrls, checks: E2
|
||||
&& codeQueue.runtime?.orchestrator === "v3sctl"
|
||||
&& codeQueue.runtime?.container === null,
|
||||
{ microservices });
|
||||
addSelectedCheck(checks, options, "microservice:v3sctl-adapter-status",
|
||||
(v3sctlStatus as { ok?: boolean; body?: { microservice?: { id?: string; providerId?: string } } }).ok === true
|
||||
&& (v3sctlStatus as { body?: { microservice?: { id?: string; providerId?: string } } }).body?.microservice?.id === "v3sctl-adapter"
|
||||
&& (v3sctlStatus as { body?: { microservice?: { id?: string; providerId?: string } } }).body?.microservice?.providerId === "D601",
|
||||
v3sctlStatus);
|
||||
addSelectedCheck(checks, options, "microservice:v3sctl-control-plane",
|
||||
(v3sctlControlPlane as { ok?: boolean }).ok === true
|
||||
&& v3sctlControlPlaneBody?.ok === true
|
||||
&& v3sctlControlPlaneBody.clusterId === "unidesk-v8s"
|
||||
&& v3sctlControlPlaneBody.noFallback === true
|
||||
&& v3sctlControlPlaneBody.managedServicesHealthy === true
|
||||
&& v3sctlControlPlaneBody.kubeApiProxy?.mode === "kubernetes-api-service-proxy"
|
||||
&& v3sctlCodeQueueService?.status === "healthy"
|
||||
&& v3sctlCodeQueueService?.topologyComplete === true
|
||||
&& v3sctlCodeQueueService?.servingHealthy === true
|
||||
&& v3sctlCodeQueueService?.active?.id === "D601"
|
||||
&& v3sctlCodeQueueService?.active?.healthy === true
|
||||
&& (v3sctlCodeQueueService?.presentNodeIds ?? []).includes("D601")
|
||||
&& (v3sctlCodeQueueService?.presentNodeIds ?? []).includes("D518")
|
||||
&& (v3sctlCodeQueueService?.missingNodeIds ?? []).length === 0
|
||||
&& v3sctlD518Instance?.healthy === true
|
||||
&& v3sctlD518Instance?.proxyMode === "kubernetes-api-pod-readiness",
|
||||
{
|
||||
ok: (v3sctlControlPlane as { ok?: boolean }).ok,
|
||||
clusterId: v3sctlControlPlaneBody?.clusterId,
|
||||
noFallback: v3sctlControlPlaneBody?.noFallback,
|
||||
kubeApiProxy: v3sctlControlPlaneBody?.kubeApiProxy,
|
||||
service: v3sctlCodeQueueService,
|
||||
});
|
||||
addSelectedCheck(checks, options, "microservice:catalog-filebrowser", (microservices as { ok?: boolean }).ok === true
|
||||
&& filebrowser?.providerId === "D518"
|
||||
&& filebrowser.backend?.public === false
|
||||
@@ -1209,7 +1261,7 @@ async function serviceChecks(config: UniDeskConfig, urls: PublicUrls, checks: E2
|
||||
});
|
||||
addSelectedCheck(checks, options, "microservice:oa-event-flow-stats", (oaEventFlowStats as { ok?: boolean }).ok === true && oaEventFlowStatsBody?.ok === true && Array.isArray(oaEventFlowStatsBody.stats), oaEventFlowStats);
|
||||
addSelectedCheck(checks, options, "microservice:code-queue-status", (codeQueueStatus as { ok?: boolean }).ok === true && (codeQueueStatus as { body?: { microservice?: { id?: string; providerId?: string } } }).body?.microservice?.providerId === "D601", codeQueueStatus);
|
||||
-  addSelectedCheck(checks, options, "microservice:code-queue-health", (codeQueueHealth as { ok?: boolean }).ok === true && codeQueueHealthBody?.ok === true && codeQueueHealthBody.queue?.defaultModel === "gpt-5.5" && codeQueueHealthBody.queue?.modelReasoningEfforts?.["gpt-5.5"] === "xhigh", codeQueueHealth);
+  addSelectedCheck(checks, options, "microservice:code-queue-health", (codeQueueHealth as { ok?: boolean }).ok === true && codeQueueHealthBody?.ok === true && codeQueueHealthBody.egressProxy?.connected === true && codeQueueHealthBody.queue?.defaultModel === "gpt-5.5" && codeQueueHealthBody.queue?.modelReasoningEfforts?.["gpt-5.5"] === "xhigh", codeQueueHealth);
|
||||
addSelectedCheck(checks, options, "microservice:code-queue-tasks", (codeQueueTasks as { ok?: boolean }).ok === true && codeQueueTasksBody?.ok === true && Array.isArray(codeQueueTasksBody.tasks) && codeQueueTasksBody.queue?.defaultModel === "gpt-5.5" && codeQueueTasksBody.queue?.modelReasoningEfforts?.["gpt-5.5"] === "xhigh", codeQueueTasks);
|
||||
const upgradeDispatch = dockerCoreJson("/api/dispatch", {
|
||||
method: "POST",
|
||||
|
||||
@@ -17,18 +17,18 @@ services:
|
||||
HOST: "0.0.0.0"
|
||||
PORT: "4266"
|
||||
LOG_FILE: "/var/log/unidesk/v3sctl-adapter.jsonl"
|
||||
-      V3SCTL_CLUSTER_ID: "${V3SCTL_CLUSTER_ID:-D601}"
+      V3SCTL_CLUSTER_ID: "${V3SCTL_CLUSTER_ID:-unidesk-v8s}"
|
||||
V3SCTL_NODE_ID: "${V3SCTL_NODE_ID:-D601}"
|
||||
V3SCTL_KUBECTL_ENABLED: "${V3SCTL_KUBECTL_ENABLED:-false}"
|
||||
V3SCTL_KUBE_API_PROXY_ENABLED: "${V3SCTL_KUBE_API_PROXY_ENABLED:-true}"
|
||||
-      V3SCTL_KUBECONFIG_PATH: "/var/lib/unidesk/v3s/kubeconfig"
+      V3SCTL_KUBECONFIG_PATH: "/var/lib/unidesk/v8s/kubeconfig"
|
||||
V3SCTL_KUBE_API_CONNECT_HOST: "${V3SCTL_KUBE_API_CONNECT_HOST:-host.docker.internal}"
|
||||
V3SCTL_MANIFEST_PATHS: "${V3SCTL_MANIFEST_PATHS:-v3s/code-queue.v3s.json}"
|
||||
V3SCTL_SERVICES_JSON: "${V3SCTL_SERVICES_JSON:-[]}"
|
||||
UNIDESK_LOG_RETENTION_BYTES: "${UNIDESK_LOG_RETENTION_BYTES:-512MiB}"
|
||||
volumes:
|
||||
- ${V3SCTL_ADAPTER_LOG_DIR:-../../../../.state/v3sctl-adapter/logs}:/var/log/unidesk
|
||||
-      - ${V3SCTL_KUBECONFIG_HOST_PATH:-../../../../.state/v3s/kubeconfig}:/var/lib/unidesk/v3s/kubeconfig:ro
+      - ${V3SCTL_KUBECONFIG_HOST_PATH:-../../../../.state/v8s/kubeconfig}:/var/lib/unidesk/v8s/kubeconfig:ro
|
||||
extra_hosts:
|
||||
- "host.docker.internal:host-gateway"
|
||||
networks:
|
||||
|
||||
@@ -8,6 +8,7 @@ type JsonValue = string | number | boolean | null | JsonValue[] | { [key: string
|
||||
type JsonRecord = Record<string, JsonValue>;
|
||||
|
||||
type InstanceRole = "primary" | "standby" | "worker";
|
||||
type EndpointHealthMode = "service-proxy" | "pod-ready";
|
||||
|
||||
interface ManagedEndpoint {
|
||||
id: string;
|
||||
@@ -15,6 +16,7 @@ interface ManagedEndpoint {
|
||||
role: InstanceRole;
|
||||
baseUrl: string;
|
||||
healthPath: string;
|
||||
healthMode: EndpointHealthMode;
|
||||
}
|
||||
|
||||
interface ManagedService {
|
||||
@@ -143,6 +145,11 @@ function normalizeRole(value: string): InstanceRole {
|
||||
return "worker";
|
||||
}
|
||||
|
||||
function normalizeHealthMode(value: string): EndpointHealthMode {
|
||||
if (value === "service-proxy" || value === "pod-ready") return value;
|
||||
return "service-proxy";
|
||||
}
|
||||
|
||||
function parseEndpoint(value: unknown, index: number, ownerPath = "endpoint"): ManagedEndpoint {
|
||||
const path = `${ownerPath}[${index}]`;
|
||||
const item = asRecord(value, path);
|
||||
@@ -154,6 +161,7 @@ function parseEndpoint(value: unknown, index: number, ownerPath = "endpoint"): M
|
||||
role: normalizeRole(optionalStringField(item, "role", id === "D601" ? "primary" : "standby")),
|
||||
baseUrl: stringField(item, "baseUrl", path).replace(/\/+$/u, ""),
|
||||
healthPath: optionalStringField(item, "healthPath", "/health"),
|
||||
healthMode: normalizeHealthMode(optionalStringField(item, "healthMode", "service-proxy")),
|
||||
};
|
||||
}
|
||||
|
||||
@@ -244,12 +252,12 @@ function readConfig(): RuntimeConfig {
|
||||
port: envNumber("PORT", 4266),
|
||||
logFile: envString("LOG_FILE", "/var/log/unidesk/v3sctl-adapter.jsonl"),
|
||||
manifestPaths: paths,
|
||||
-    clusterId: envString("V3SCTL_CLUSTER_ID", "D601"),
+    clusterId: envString("V3SCTL_CLUSTER_ID", "unidesk-v8s"),
|
||||
nodeId: envString("V3SCTL_NODE_ID", "D601"),
|
||||
kubectlEnabled: envBool("V3SCTL_KUBECTL_ENABLED", false),
|
||||
kubectlContext: envString("V3SCTL_KUBECTL_CONTEXT", ""),
|
||||
kubeApiProxyEnabled: envBool("V3SCTL_KUBE_API_PROXY_ENABLED", true),
|
||||
-    kubeconfigPath: envString("V3SCTL_KUBECONFIG_PATH", "/var/lib/unidesk/v3s/kubeconfig"),
+    kubeconfigPath: envString("V3SCTL_KUBECONFIG_PATH", "/var/lib/unidesk/v8s/kubeconfig"),
|
||||
kubeApiConnectHost: envString("V3SCTL_KUBE_API_CONNECT_HOST", "host.docker.internal"),
|
||||
requestTimeoutMs: Math.max(1000, Math.min(120_000, envNumber("V3SCTL_REQUEST_TIMEOUT_MS", 30_000))),
|
||||
healthTimeoutMs: Math.max(500, Math.min(30_000, envNumber("V3SCTL_HEALTH_TIMEOUT_MS", 2500))),
|
||||
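The timeout settings in `readConfig` are bounded with a nested `Math.max(min, Math.min(max, value))`. A minimal sketch of that clamping pattern follows; `clamp` and `numberOr` are hypothetical helpers written for this illustration, and `numberOr` is only an assumed stand-in for the repository's `envNumber`, whose implementation is not shown in this diff.

```typescript
// Hypothetical helpers for illustration; not the repository's implementation.

// Keep value inside the inclusive range [min, max].
function clamp(value: number, min: number, max: number): number {
  return Math.max(min, Math.min(max, value));
}

// Parse a raw environment string, using the fallback when it is
// missing, blank, or not a finite number.
function numberOr(raw: string | undefined, fallback: number): number {
  if (raw === undefined || raw.trim() === "") return fallback;
  const parsed = Number(raw);
  return Number.isFinite(parsed) ? parsed : fallback;
}

// Same bounds as the health timeout in the diff: 500 ms to 30 000 ms, default 2500 ms.
function healthTimeoutMs(raw: string | undefined): number {
  return clamp(numberOr(raw, 2500), 500, 30_000);
}

console.log(healthTimeoutMs(undefined)); // unset: falls back to the default
console.log(healthTimeoutMs("50"));      // too small: raised to the lower bound
console.log(healthTimeoutMs("999999"));  // too large: lowered to the upper bound
```

Clamping at read time means a mistyped environment value cannot disable health probing or make a probe hang far longer than intended.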
@@ -385,6 +393,23 @@ function serviceProxyApiPath(service: ManagedService, targetPath: string): strin
|
||||
return `/api/v1/namespaces/${encodeURIComponent(service.namespace)}/services/${encodeURIComponent(`${serviceName}:${servicePort}`)}/proxy${safeTargetPath}`;
|
||||
}
|
||||
|
||||
function endpointProxyApiPath(service: ManagedService, endpoint: ManagedEndpoint, targetPath: string): string {
|
||||
const { namespace, serviceRef } = kubernetesEndpointServiceRef(service, endpoint);
|
||||
const safeTargetPath = targetPath.startsWith("/") ? targetPath : `/${targetPath}`;
|
||||
return `/api/v1/namespaces/${encodeURIComponent(namespace)}/services/${encodeURIComponent(serviceRef)}/proxy${safeTargetPath}`;
|
||||
}
|
||||
|
||||
function kubernetesEndpointServiceRef(service: ManagedService, endpoint: ManagedEndpoint): { namespace: string; serviceRef: string } {
|
||||
const base = new URL(endpoint.baseUrl);
|
||||
if (base.protocol !== "kubernetes:") throw new Error(`endpoint ${endpoint.id} must use kubernetes:// baseUrl`);
|
||||
const namespace = base.hostname || service.namespace;
|
||||
const parts = base.pathname.split("/").filter(Boolean);
|
||||
if (parts.length !== 2 || parts[0] !== "services" || parts[1].length === 0) {
|
||||
throw new Error(`endpoint ${endpoint.id} baseUrl must be kubernetes://<namespace>/services/<service>:<port>`);
|
||||
}
|
||||
return { namespace, serviceRef: parts[1] };
|
||||
}
|
||||
|
||||
function kubeProxyCurlArgs(client: KubeApiClient, method: string, url: URL, headers: Headers, hasBody: boolean, timeoutMs: number): string[] {
|
||||
const args = [
|
||||
"-sS",
|
||||
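To make the `kubernetes://<namespace>/services/<service>:<port>` convention concrete, the following standalone sketch mirrors what `kubernetesEndpointServiceRef` and `endpointProxyApiPath` do in this hunk. The names `parseKubernetesBaseUrl`, `serviceProxyPath`, and `EndpointRef` are hypothetical and simplified: they take plain strings instead of the `ManagedService` and `ManagedEndpoint` objects used by the adapter.

```typescript
// Hypothetical standalone sketch; simplified from the functions in the diff.
interface EndpointRef {
  namespace: string;
  serviceRef: string; // "<service>:<port>"
}

// Parse kubernetes://<namespace>/services/<service>:<port> into its parts.
function parseKubernetesBaseUrl(baseUrl: string, defaultNamespace: string): EndpointRef {
  const base = new URL(baseUrl);
  if (base.protocol !== "kubernetes:") {
    throw new Error("baseUrl must use the kubernetes:// scheme");
  }
  const parts = base.pathname.split("/").filter(Boolean);
  const [kind, serviceRef] = parts;
  if (parts.length !== 2 || kind !== "services" || !serviceRef) {
    throw new Error("baseUrl must be kubernetes://<namespace>/services/<service>:<port>");
  }
  return { namespace: base.hostname || defaultNamespace, serviceRef };
}

// Build the Kubernetes API service proxy path for a target path.
function serviceProxyPath(ref: EndpointRef, targetPath: string): string {
  const safeTargetPath = targetPath.startsWith("/") ? targetPath : `/${targetPath}`;
  return `/api/v1/namespaces/${encodeURIComponent(ref.namespace)}/services/${encodeURIComponent(ref.serviceRef)}/proxy${safeTargetPath}`;
}

const ref = parseKubernetesBaseUrl("kubernetes://unidesk/services/code-queue-d518:4222", "unidesk");
console.log(ref);
console.log(serviceProxyPath(ref, "health"));
```

Note that `encodeURIComponent` percent-encodes the colon in `<service>:<port>`, so the resulting path segment is `code-queue-d518%3A4222`.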
@@ -431,11 +456,32 @@ async function kubeApiServiceProxyResponse(
|
||||
targetPath: string,
|
||||
query: string,
|
||||
timeoutMs: number,
|
||||
): Promise<Response> {
|
||||
return kubeApiProxyResponse(service, req, serviceProxyApiPath(service, targetPath), query, timeoutMs);
|
||||
}
|
||||
|
||||
async function kubeApiEndpointProxyResponse(
|
||||
service: ManagedService,
|
||||
endpoint: ManagedEndpoint,
|
||||
req: Request,
|
||||
targetPath: string,
|
||||
query: string,
|
||||
timeoutMs: number,
|
||||
): Promise<Response> {
|
||||
return kubeApiProxyResponse(service, req, endpointProxyApiPath(service, endpoint, targetPath), query, timeoutMs);
|
||||
}
|
||||
|
||||
async function kubeApiProxyResponse(
|
||||
service: ManagedService,
|
||||
req: Request,
|
||||
apiPath: string,
|
||||
query: string,
|
||||
timeoutMs: number,
|
||||
): Promise<Response> {
|
||||
if (kubeClient === null) {
|
||||
return jsonResponse({ ok: false, error: "kubernetes api proxy is not configured", serviceId: service.id, kubeconfigPath: config.kubeconfigPath, noFallback: true }, 502);
|
||||
}
|
||||
-  const upstreamUrl = new URL(serviceProxyApiPath(service, targetPath), kubeClient.serverUrl);
+  const upstreamUrl = new URL(apiPath, kubeClient.serverUrl);
|
||||
upstreamUrl.search = query;
|
||||
const headers = forwardHeaders(req);
|
||||
const bodyText = req.method === "GET" || req.method === "HEAD" ? "" : await req.text();
|
||||
@@ -455,7 +501,7 @@ async function kubeApiServiceProxyResponse(
|
||||
proc.exited,
|
||||
]);
|
||||
if (exitCode !== 0) {
|
||||
-    log("error", "kube_api_proxy_failed", { serviceId: service.id, targetPath, exitCode, stderr: stderr.slice(0, 2000), noFallback: true });
+    log("error", "kube_api_proxy_failed", { serviceId: service.id, apiPath, exitCode, stderr: stderr.slice(0, 2000), noFallback: true });
|
||||
return jsonResponse({ ok: false, error: "kubernetes api service proxy failed", serviceId: service.id, detail: stderr.slice(0, 4000), noFallback: true }, 502);
|
||||
}
|
||||
const parsed = parseCurlHeaderBody(Buffer.from(stdout));
|
||||
@@ -522,14 +568,28 @@ async function probeEndpoint(endpoint: ManagedEndpoint): Promise<JsonRecord> {
|
||||
|
||||
async function probeKubernetesServiceActive(service: ManagedService): Promise<JsonRecord> {
|
||||
const endpoint = activeEndpoint(service);
|
||||
return probeKubernetesEndpoint(service, endpoint, true);
|
||||
}
|
||||
|
||||
async function probeKubernetesEndpoint(service: ManagedService, endpoint: ManagedEndpoint, active = false): Promise<JsonRecord> {
|
||||
if (!active && endpoint.healthMode === "pod-ready") return await probeKubernetesPodReady(service, endpoint);
|
||||
const checkedAt = new Date().toISOString();
|
||||
-  const response = await kubeApiServiceProxyResponse(
-    service,
-    new Request("http://v3sctl-adapter.local/health", { method: "GET", headers: { accept: "application/json" } }),
-    endpoint.healthPath,
-    "",
-    config.healthTimeoutMs,
-  );
+  const response = active
+    ? await kubeApiServiceProxyResponse(
+      service,
+      new Request("http://v3sctl-adapter.local/health", { method: "GET", headers: { accept: "application/json" } }),
+      endpoint.healthPath,
+      "",
+      config.healthTimeoutMs,
+    )
+    : await kubeApiEndpointProxyResponse(
+      service,
+      endpoint,
+      new Request("http://v3sctl-adapter.local/health", { method: "GET", headers: { accept: "application/json" } }),
+      endpoint.healthPath,
+      "",
+      config.healthTimeoutMs,
+    );
|
||||
const contentType = response.headers.get("content-type") ?? "application/octet-stream";
|
||||
const bodyText = await response.text();
|
||||
let body: JsonValue = bodyText.slice(0, 2000);
|
||||
@@ -544,6 +604,7 @@ async function probeKubernetesServiceActive(service: ManagedService): Promise<Js
|
||||
role: endpoint.role,
|
||||
baseUrl: endpoint.baseUrl,
|
||||
healthPath: endpoint.healthPath,
|
||||
healthMode: endpoint.healthMode,
|
||||
proxyMode: "kubernetes-api-service-proxy",
|
||||
route: service.route,
|
||||
healthy: response.ok,
|
||||
@@ -555,9 +616,79 @@ async function probeKubernetesServiceActive(service: ManagedService): Promise<Js
|
||||
};
|
||||
}
|
||||
|
||||
function jsonAtPath(value: unknown, path: string): unknown {
|
||||
return path.split(".").reduce((current, key) => {
|
||||
if (typeof current !== "object" || current === null) return undefined;
|
||||
return (current as Record<string, unknown>)[key];
|
||||
}, value);
|
||||
}
|
||||
|
||||
function podReady(item: unknown): boolean {
|
||||
const conditions = jsonAtPath(item, "status.conditions");
|
||||
return Array.isArray(conditions) && conditions.some((condition) => {
|
||||
const record = typeof condition === "object" && condition !== null ? condition as Record<string, unknown> : {};
|
||||
return record.type === "Ready" && record.status === "True";
|
||||
});
|
||||
}
|
||||
|
||||
function podSummary(item: unknown): JsonRecord {
|
||||
const metadata = typeof jsonAtPath(item, "metadata") === "object" && jsonAtPath(item, "metadata") !== null ? jsonAtPath(item, "metadata") as Record<string, unknown> : {};
|
||||
return {
|
||||
name: typeof metadata.name === "string" ? metadata.name : "",
|
||||
nodeName: typeof jsonAtPath(item, "spec.nodeName") === "string" ? jsonAtPath(item, "spec.nodeName") as string : "",
|
||||
phase: typeof jsonAtPath(item, "status.phase") === "string" ? jsonAtPath(item, "status.phase") as string : "",
|
||||
podIP: typeof jsonAtPath(item, "status.podIP") === "string" ? jsonAtPath(item, "status.podIP") as string : "",
|
||||
ready: podReady(item),
|
||||
};
|
||||
}
|
||||
|
||||
async function probeKubernetesPodReady(service: ManagedService, endpoint: ManagedEndpoint): Promise<JsonRecord> {
|
||||
const checkedAt = new Date().toISOString();
|
||||
const { namespace } = kubernetesEndpointServiceRef(service, endpoint);
|
||||
const labelSelector = new URLSearchParams({
|
||||
labelSelector: `app.kubernetes.io/name=${service.id},unidesk.ai/instance-id=${endpoint.id}`,
|
||||
}).toString();
|
||||
const response = await kubeApiProxyResponse(
|
||||
service,
|
||||
new Request("http://v3sctl-adapter.local/api/pods", { method: "GET", headers: { accept: "application/json" } }),
|
||||
`/api/v1/namespaces/${encodeURIComponent(namespace)}/pods`,
|
||||
`?${labelSelector}`,
|
||||
config.healthTimeoutMs,
|
||||
);
|
||||
const contentType = response.headers.get("content-type") ?? "application/octet-stream";
|
||||
const bodyText = await response.text();
|
||||
let body: JsonValue = bodyText.slice(0, 2000);
|
||||
let pods: JsonRecord[] = [];
|
||||
try {
|
||||
const parsed = JSON.parse(bodyText) as JsonRecord;
|
||||
const items = Array.isArray(parsed.items) ? parsed.items : [];
|
||||
pods = items.map(podSummary);
|
||||
body = { itemCount: items.length, pods };
|
||||
} catch {
|
||||
// Keep the raw text preview below.
|
||||
}
|
||||
const healthy = response.ok && pods.some((pod) => pod.ready === true);
|
||||
return {
|
||||
id: endpoint.id,
|
||||
nodeId: endpoint.nodeId,
|
||||
role: endpoint.role,
|
||||
baseUrl: endpoint.baseUrl,
|
||||
healthPath: endpoint.healthPath,
|
||||
healthMode: endpoint.healthMode,
|
||||
proxyMode: "kubernetes-api-pod-readiness",
|
||||
route: service.route,
|
||||
healthy,
|
||||
status: healthy ? "healthy" : "unhealthy",
|
||||
upstreamStatus: response.status,
|
||||
contentType,
|
||||
checkedAt,
|
||||
body,
|
||||
};
|
||||
}
|
||||
|
||||
async function serviceStatus(service: ManagedService): Promise<JsonRecord> {
|
||||
const instances = isKubernetesServiceRoute(service)
|
||||
-    ? [await probeKubernetesServiceActive(service)]
+    ? await Promise.all(service.endpoints.map((endpoint) => endpoint.id === service.activeInstanceId ? probeKubernetesServiceActive(service) : probeKubernetesEndpoint(service, endpoint)))
|
||||
: [{
|
||||
id: service.activeInstanceId,
|
||||
nodeId: activeEndpoint(service).nodeId,
|
||||
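The `pod-ready` health mode treats a standby instance as healthy when at least one matching Pod reports a `Ready` condition with status `"True"`. The sketch below shows that check against a hand-written Pod list. The helper names `asRecord`, `isPodReady`, and `anyPodReady` are hypothetical, and the sample payload (including the Pod names) is invented for illustration; it only follows the general shape of a Kubernetes Pod list response.

```typescript
// Hypothetical sketch of the readiness check; the sample data is invented.
function asRecord(value: unknown): Record<string, unknown> {
  return typeof value === "object" && value !== null ? (value as Record<string, unknown>) : {};
}

// A Pod is ready when status.conditions contains { type: "Ready", status: "True" }.
function isPodReady(pod: unknown): boolean {
  const conditions = asRecord(asRecord(pod).status).conditions;
  return Array.isArray(conditions) && conditions.some((condition) => {
    const record = asRecord(condition);
    return record.type === "Ready" && record.status === "True";
  });
}

// The endpoint counts as healthy when any listed Pod is ready.
function anyPodReady(podList: unknown): boolean {
  const items = asRecord(podList).items;
  return Array.isArray(items) && items.some((item) => isPodReady(item));
}

const samplePodList = {
  items: [
    {
      metadata: { name: "code-queue-d518-example-a" },
      status: { phase: "Pending", conditions: [{ type: "Ready", status: "False" }] },
    },
    {
      metadata: { name: "code-queue-d518-example-b" },
      status: {
        phase: "Running",
        conditions: [
          { type: "Initialized", status: "True" },
          { type: "Ready", status: "True" },
        ],
      },
    },
  ],
};

console.log(anyPodReady(samplePodList));
```

Because the check reads the Pod's own readiness condition from the API server, it does not need to send any HTTP request to the standby instance itself.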
@@ -576,7 +707,7 @@ async function serviceStatus(service: ManagedService): Promise<JsonRecord> {
|
||||
const activeHealthy = active?.healthy === true;
|
||||
const allInstancesHealthy = instances.every((item) => item.healthy === true);
|
||||
const expectedNodeIds = service.expectedNodeIds;
|
||||
-  const presentNodeIds = Array.from(new Set(instances.map((item) => String(item.nodeId))));
+  const presentNodeIds = Array.from(new Set(instances.filter((item) => item.healthy === true).map((item) => String(item.nodeId))));
|
||||
const missingNodeIds = expectedNodeIds.filter((nodeId) => !presentNodeIds.includes(nodeId));
|
||||
const topologyComplete = missingNodeIds.length === 0;
|
||||
const requiredTopologyHealthy = !service.requireAllInstancesHealthy || (topologyComplete && allInstancesHealthy);
|
||||
|
||||
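The change to `presentNodeIds` in this hunk means a node only counts as present when its instance probe is healthy. A minimal sketch of the resulting topology arithmetic is below; `summarizeTopology`, `InstanceProbe`, and `TopologySummary` are hypothetical names, and the probe results are made-up inputs rather than real adapter output.

```typescript
// Hypothetical sketch of the topology computation; inputs are invented.
interface InstanceProbe {
  nodeId: string;
  healthy: boolean;
}

interface TopologySummary {
  presentNodeIds: string[];
  missingNodeIds: string[];
  topologyComplete: boolean;
}

function summarizeTopology(expectedNodeIds: string[], instances: InstanceProbe[]): TopologySummary {
  // Only healthy instances contribute to the set of present nodes.
  const presentNodeIds = Array.from(
    new Set(instances.filter((item) => item.healthy).map((item) => item.nodeId)),
  );
  const missingNodeIds = expectedNodeIds.filter((nodeId) => !presentNodeIds.includes(nodeId));
  return { presentNodeIds, missingNodeIds, topologyComplete: missingNodeIds.length === 0 };
}

// Standby Pod not ready yet: D518 is reported as missing.
console.log(summarizeTopology(["D601", "D518"], [
  { nodeId: "D601", healthy: true },
  { nodeId: "D518", healthy: false },
]));

// Both instances healthy: the topology is complete.
console.log(summarizeTopology(["D601", "D518"], [
  { nodeId: "D601", healthy: true },
  { nodeId: "D518", healthy: true },
]));
```

With the previous behaviour, an unhealthy D518 instance would still have been listed as present, so the end-to-end check on `missingNodeIds` being empty could not have detected a standby Pod that never became ready.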
@@ -4,7 +4,43 @@ metadata:
|
||||
name: unidesk
|
||||
labels:
|
||||
app.kubernetes.io/part-of: unidesk
|
||||
-    unidesk.ai/v3s-cluster: unidesk-v3s
+    unidesk.ai/v3s-cluster: unidesk-v8s
|
||||
---
|
||||
apiVersion: v1
|
||||
kind: Service
|
||||
metadata:
|
||||
name: d601-provider-egress-proxy
|
||||
namespace: unidesk
|
||||
labels:
|
||||
app.kubernetes.io/name: provider-egress-proxy
|
||||
app.kubernetes.io/part-of: unidesk
|
||||
unidesk.ai/provider-id: D601
|
||||
spec:
|
||||
type: ClusterIP
|
||||
ports:
|
||||
- name: http
|
||||
port: 18789
|
||||
targetPort: 18789
|
||||
protocol: TCP
|
||||
---
|
||||
apiVersion: discovery.k8s.io/v1
|
||||
kind: EndpointSlice
|
||||
metadata:
|
||||
name: d601-provider-egress-proxy
|
||||
namespace: unidesk
|
||||
labels:
|
||||
kubernetes.io/service-name: d601-provider-egress-proxy
|
||||
app.kubernetes.io/name: provider-egress-proxy
|
||||
app.kubernetes.io/part-of: unidesk
|
||||
unidesk.ai/provider-id: D601
|
||||
addressType: IPv4
|
||||
ports:
|
||||
- name: http
|
||||
protocol: TCP
|
||||
port: 18789
|
||||
endpoints:
|
||||
- addresses:
|
||||
- "172.25.0.3"
|
||||
---
|
||||
apiVersion: apps/v1
|
||||
kind: Deployment
|
||||
@@ -31,6 +67,8 @@ spec:
|
||||
unidesk.ai/instance-id: D601
|
||||
unidesk.ai/node-id: D601
|
||||
spec:
|
||||
nodeSelector:
|
||||
unidesk.ai/node-id: D601
|
||||
terminationGracePeriodSeconds: 30
|
||||
containers:
|
||||
- name: code-queue
|
||||
@@ -99,25 +137,25 @@ spec:
|
||||
- name: CODE_QUEUE_EGRESS_PROXY_ENABLED
|
||||
value: "true"
|
||||
     - name: CODE_QUEUE_EGRESS_PROXY_URL
-      value: "http://host.docker.internal:18789"
+      value: "http://d601-provider-egress-proxy.unidesk.svc.cluster.local:18789"
     - name: CODE_QUEUE_EGRESS_PROXY_NO_PROXY
-      value: "localhost,127.0.0.1,::1,host.docker.internal,unidesk-provider-gateway-D601,74.48.78.17,backend-core,oa-event-flow,database"
+      value: "localhost,127.0.0.1,::1,host.docker.internal,d601-provider-egress-proxy,d601-provider-egress-proxy.unidesk,d601-provider-egress-proxy.unidesk.svc,d601-provider-egress-proxy.unidesk.svc.cluster.local,172.25.0.3,unidesk-provider-gateway-D601,74.48.78.17,backend-core,oa-event-flow,database"
     - name: HTTP_PROXY
-      value: "http://host.docker.internal:18789"
+      value: "http://d601-provider-egress-proxy.unidesk.svc.cluster.local:18789"
     - name: HTTPS_PROXY
-      value: "http://host.docker.internal:18789"
+      value: "http://d601-provider-egress-proxy.unidesk.svc.cluster.local:18789"
     - name: ALL_PROXY
-      value: "http://host.docker.internal:18789"
+      value: "http://d601-provider-egress-proxy.unidesk.svc.cluster.local:18789"
     - name: http_proxy
-      value: "http://host.docker.internal:18789"
+      value: "http://d601-provider-egress-proxy.unidesk.svc.cluster.local:18789"
     - name: https_proxy
-      value: "http://host.docker.internal:18789"
+      value: "http://d601-provider-egress-proxy.unidesk.svc.cluster.local:18789"
     - name: all_proxy
-      value: "http://host.docker.internal:18789"
+      value: "http://d601-provider-egress-proxy.unidesk.svc.cluster.local:18789"
     - name: NO_PROXY
-      value: "localhost,127.0.0.1,::1,host.docker.internal,unidesk-provider-gateway-D601,74.48.78.17,backend-core,oa-event-flow,database"
+      value: "localhost,127.0.0.1,::1,host.docker.internal,d601-provider-egress-proxy,d601-provider-egress-proxy.unidesk,d601-provider-egress-proxy.unidesk.svc,d601-provider-egress-proxy.unidesk.svc.cluster.local,172.25.0.3,unidesk-provider-gateway-D601,74.48.78.17,backend-core,oa-event-flow,database"
     - name: no_proxy
-      value: "localhost,127.0.0.1,::1,host.docker.internal,unidesk-provider-gateway-D601,74.48.78.17,backend-core,oa-event-flow,database"
+      value: "localhost,127.0.0.1,::1,host.docker.internal,d601-provider-egress-proxy,d601-provider-egress-proxy.unidesk,d601-provider-egress-proxy.unidesk.svc,d601-provider-egress-proxy.unidesk.svc.cluster.local,172.25.0.3,unidesk-provider-gateway-D601,74.48.78.17,backend-core,oa-event-flow,database"
|
||||
- name: OA_EVENT_FLOW_BASE_URL
|
||||
value: "http://74.48.78.17:4255"
|
||||
- name: CODE_QUEUE_NOTIFY_CLAUDEQQ_ENABLED
|
||||
@@ -226,3 +264,228 @@ spec:
|
||||
- name: http
|
||||
port: 4222
|
||||
targetPort: http
|
||||
|
||||
---
|
||||
apiVersion: apps/v1
|
||||
kind: Deployment
|
||||
metadata:
|
||||
name: code-queue-d518
|
||||
namespace: unidesk
|
||||
labels:
|
||||
app.kubernetes.io/name: code-queue
|
||||
app.kubernetes.io/part-of: unidesk
|
||||
unidesk.ai/deployment-mode: v3sctl-managed
|
||||
unidesk.ai/instance-id: D518
|
||||
spec:
|
||||
replicas: 1
|
||||
selector:
|
||||
matchLabels:
|
||||
app.kubernetes.io/name: code-queue
|
||||
unidesk.ai/instance-id: D518
|
||||
template:
|
||||
metadata:
|
||||
labels:
|
||||
app.kubernetes.io/name: code-queue
|
||||
app.kubernetes.io/part-of: unidesk
|
||||
unidesk.ai/deployment-mode: v3sctl-managed
|
||||
unidesk.ai/instance-id: D518
|
||||
unidesk.ai/node-id: D518
|
||||
spec:
|
||||
nodeSelector:
|
||||
unidesk.ai/node-id: D518
|
||||
terminationGracePeriodSeconds: 30
|
||||
containers:
|
||||
- name: code-queue
|
||||
image: unidesk-code-queue:d601
|
||||
imagePullPolicy: IfNotPresent
|
||||
ports:
|
||||
- name: http
|
||||
containerPort: 4222
|
||||
envFrom:
|
||||
- secretRef:
|
||||
name: code-queue-env
|
||||
optional: true
|
||||
env:
|
||||
- name: HOST
|
||||
value: "0.0.0.0"
|
||||
- name: PORT
|
||||
value: "4222"
|
||||
- name: CODE_QUEUE_INSTANCE_ID
|
||||
value: "D518"
|
||||
- name: CODE_QUEUE_SCHEDULER_ENABLED
|
||||
value: "false"
|
||||
- name: CODE_QUEUE_STARTUP_OA_BACKFILL_ENABLED
|
||||
value: "false"
|
||||
- name: CODE_QUEUE_DATA_DIR
|
||||
value: "/var/lib/unidesk/code-queue"
|
||||
- name: CODE_QUEUE_WORKDIR
|
||||
value: "/workspace"
|
||||
- name: CODE_QUEUE_CODEX_HOME
|
||||
value: "/var/lib/unidesk/code-queue/codex-home"
|
||||
- name: CODE_QUEUE_OPENCODE_XDG_DIR
|
||||
value: "/var/lib/unidesk/code-queue/opencode-xdg"
|
||||
- name: CODE_QUEUE_SOURCE_CODEX_CONFIG
|
||||
value: "/root/.codex/config.toml"
|
||||
- name: CODE_QUEUE_DEFAULT_MODEL
|
||||
value: "gpt-5.5"
|
||||
- name: CODE_QUEUE_MODELS
|
||||
value: "gpt-5.5,gpt-5.4-mini,gpt-5.4,minimax-m2.7"
|
||||
- name: CODE_QUEUE_MODEL_REASONING_EFFORTS
|
||||
value: "gpt-5.5=xhigh"
|
||||
- name: CODE_QUEUE_SANDBOX
|
||||
value: "danger-full-access"
|
||||
- name: CODE_QUEUE_APPROVAL_POLICY
|
||||
value: "never"
|
||||
- name: CODE_QUEUE_MAX_ACTIVE_QUEUES
|
||||
value: "0"
|
||||
- name: CODE_QUEUE_DATABASE_POOL_MAX
|
||||
value: "2"
|
||||
- name: NODE_OPTIONS
|
||||
value: "--max-old-space-size=1024"
|
||||
- name: CODE_QUEUE_IN_MEMORY_OUTPUT_RECORDS
|
||||
value: "10"
|
||||
- name: CODE_QUEUE_IN_MEMORY_EVENT_RECORDS
|
||||
value: "10"
|
||||
- name: CODE_QUEUE_MAIN_PROVIDER_ID
|
||||
value: "D518"
|
||||
- name: CODE_QUEUE_REMOTE_WORKDIR
|
||||
value: "/home/ubuntu"
|
||||
- name: CODE_QUEUE_EXECUTION_PROVIDER_IDS
|
||||
value: "D518"
|
||||
- name: CODE_QUEUE_DEV_CONTAINER_MASTER_HOST
|
||||
value: "74.48.78.17"
|
||||
- name: CODE_QUEUE_DEV_CONTAINER_DEFAULT_PROVIDER_ID
|
||||
value: "D518"
|
||||
- name: CODE_QUEUE_DEV_CONTAINER_WORKDIR
|
||||
value: "/home/ubuntu"
|
||||
- name: CODE_QUEUE_EGRESS_PROXY_ENABLED
|
||||
value: "false"
|
||||
- name: CODE_QUEUE_EGRESS_PROXY_URL
|
||||
value: ""
|
||||
- name: CODE_QUEUE_EGRESS_PROXY_NO_PROXY
|
||||
value: "localhost,127.0.0.1,::1,host.docker.internal,74.48.78.17,backend-core,oa-event-flow,database"
|
||||
- name: HTTP_PROXY
|
||||
value: ""
|
||||
- name: HTTPS_PROXY
|
||||
value: ""
|
||||
- name: ALL_PROXY
|
||||
value: ""
|
||||
- name: http_proxy
|
||||
value: ""
|
||||
- name: https_proxy
|
||||
value: ""
|
||||
- name: all_proxy
|
||||
value: ""
|
||||
- name: NO_PROXY
|
||||
value: "localhost,127.0.0.1,::1,host.docker.internal,74.48.78.17,backend-core,oa-event-flow,database"
|
||||
- name: no_proxy
|
||||
value: "localhost,127.0.0.1,::1,host.docker.internal,74.48.78.17,backend-core,oa-event-flow,database"
|
||||
- name: OA_EVENT_FLOW_BASE_URL
|
||||
value: "http://74.48.78.17:4255"
|
||||
- name: CODE_QUEUE_NOTIFY_CLAUDEQQ_ENABLED
|
||||
value: "false"
|
||||
- name: CODE_QUEUE_NOTIFY_CLAUDEQQ_BASE_URL
|
||||
value: ""
|
||||
- name: CODE_QUEUE_NOTIFY_CLAUDEQQ_TARGET_TYPE
|
||||
value: "private"
|
||||
- name: CODE_QUEUE_NOTIFY_CLAUDEQQ_USER_ID
|
||||
value: "645275593"
|
||||
- name: CODE_QUEUE_NOTIFY_CLAUDEQQ_MAX_RESPONSE_CHARS
|
||||
value: "12000"
|
||||
- name: CODE_QUEUE_NOTIFY_CLAUDEQQ_TIMEOUT_MS
|
||||
value: "15000"
|
||||
- name: CODE_QUEUE_NOTIFY_CLAUDEQQ_SEND_ATTEMPTS
|
||||
value: "3"
|
||||
- name: LOG_FILE
|
||||
value: "/var/log/unidesk/code-queue-d518.jsonl"
|
||||
- name: UNIDESK_LOG_RETENTION_BYTES
|
||||
value: "1GiB"
|
||||
volumeMounts:
|
||||
- name: docker-sock
|
||||
mountPath: /var/run/docker.sock
|
||||
- name: workspace
|
||||
mountPath: /workspace
|
||||
- name: workspace
|
||||
mountPath: /root/unidesk
|
||||
- name: codex-config
|
||||
mountPath: /root/.codex/config.toml
|
||||
readOnly: true
|
||||
- name: ssh-dir
|
||||
mountPath: /root/.ssh
|
||||
readOnly: true
|
||||
- name: logs
|
||||
mountPath: /var/log/unidesk
|
||||
- name: state
|
||||
mountPath: /var/lib/unidesk/code-queue
|
||||
readinessProbe:
|
||||
httpGet:
|
||||
path: /health
|
||||
port: http
|
||||
periodSeconds: 5
|
||||
timeoutSeconds: 3
|
||||
failureThreshold: 20
|
||||
livenessProbe:
|
||||
httpGet:
|
||||
path: /health
|
||||
port: http
|
||||
periodSeconds: 10
|
||||
timeoutSeconds: 3
|
||||
failureThreshold: 6
|
||||
startupProbe:
|
||||
httpGet:
|
||||
path: /health
|
||||
port: http
|
||||
periodSeconds: 5
|
||||
timeoutSeconds: 3
|
||||
failureThreshold: 60
|
||||
resources:
|
||||
requests:
|
||||
cpu: 250m
|
||||
memory: 512Mi
|
||||
limits:
|
||||
memory: 4Gi
|
||||
volumes:
|
||||
- name: docker-sock
|
||||
hostPath:
|
||||
path: /var/run/docker.sock
|
||||
type: Socket
|
||||
- name: workspace
|
||||
hostPath:
|
||||
path: /home/ubuntu/cq-deploy
|
||||
type: Directory
|
||||
- name: codex-config
|
||||
hostPath:
|
||||
path: /home/ubuntu/.codex/config.toml
|
||||
type: File
|
||||
- name: ssh-dir
|
||||
hostPath:
|
||||
path: /home/ubuntu/.ssh
|
||||
type: Directory
|
||||
- name: logs
|
||||
hostPath:
|
||||
path: /home/ubuntu/cq-deploy/.state/code-queue/logs
|
||||
type: DirectoryOrCreate
|
||||
- name: state
|
||||
hostPath:
|
||||
path: /home/ubuntu/cq-deploy/.state/code-queue
|
||||
type: DirectoryOrCreate
|
||||
---
|
||||
apiVersion: v1
|
||||
kind: Service
|
||||
metadata:
|
||||
name: code-queue-d518
|
||||
namespace: unidesk
|
||||
labels:
|
||||
app.kubernetes.io/name: code-queue
|
||||
app.kubernetes.io/part-of: unidesk
|
||||
unidesk.ai/deployment-mode: v3sctl-managed
|
||||
unidesk.ai/instance-id: D518
|
||||
spec:
|
||||
type: ClusterIP
|
||||
selector:
|
||||
app.kubernetes.io/name: code-queue
|
||||
unidesk.ai/instance-id: D518
|
||||
ports:
|
||||
- name: http
|
||||
port: 4222
|
||||
targetPort: http
|
||||
|
||||
@@ -9,8 +9,8 @@
|
||||
"adapterServiceId": "v3sctl-adapter",
|
||||
"controlPlane": {
|
||||
"type": "kubernetes",
|
||||
-    "cluster": "unidesk-v3s",
-    "context": "kind-unidesk-v3s"
+    "cluster": "unidesk-v8s",
+    "context": "unidesk-v8s"
|
||||
},
|
||||
"route": {
|
||||
"kind": "kubernetes-service",
|
||||
@@ -29,7 +29,16 @@
|
||||
"nodeId": "D601",
|
||||
"role": "primary",
|
||||
"baseUrl": "kubernetes://unidesk/services/code-queue:4222",
|
||||
-      "healthPath": "/health"
+      "healthPath": "/health",
|
||||
"healthMode": "service-proxy"
|
||||
},
|
||||
{
|
||||
"id": "D518",
|
||||
"nodeId": "D518",
|
||||
"role": "standby",
|
||||
"baseUrl": "kubernetes://unidesk/services/code-queue-d518:4222",
|
||||
"healthPath": "/health",
|
||||
"healthMode": "pod-ready"
|
||||
}
|
||||
],
|
||||
"requireAllInstancesHealthy": false