fix: restore code queue native topology
+3
-4
@@ -544,7 +544,7 @@
 "id": "k3sctl-adapter",
 "name": "k3s Control Plane",
 "providerId": "D601",
-"description": "k3sctl-adapter is the k3s control-plane adapter microservice managed directly by UniDesk; it connects to the D601 native k3s/k3sctl control plane and proxies the managed microservices on nodes such as D601/D518 through k3s standard service routes.",
+"description": "k3sctl-adapter is the k3s control-plane adapter microservice managed directly by UniDesk; it connects to the D601 native k3s/k3sctl control plane and proxies the managed microservices on D601 through k3s standard service routes.",
 "repository": {
 "url": "https://github.com/pikasTech/unidesk",
 "commitId": "69ff373c81e97faddd5641d36481502589be767f",
@@ -592,7 +592,7 @@
 "id": "code-queue",
 "name": "Code Queue",
 "providerId": "D601",
-"description": "Code Queue is a code-agent queue user service managed by the D601 k3s control plane; UniDesk accesses its standard service routes only through k3sctl-adapter; the D601/D518 instances are managed by k3s, and provider-gateway is kept for maintenance only.",
+"description": "Code Queue is a code-agent queue user service managed by the D601 k3s control plane; UniDesk accesses its standard service routes only through k3sctl-adapter; the current runtime topology is fixed to the read/write/scheduler multi-service layout inside D601 native k3s, and provider-gateway is kept for maintenance only.",
 "repository": {
 "url": "https://github.com/pikasTech/unidesk",
 "commitId": "69ff373c81e97faddd5641d36481502589be767f",
@@ -638,8 +638,7 @@
 "k3sServiceId": "code-queue",
 "namespace": "unidesk",
 "expectedNodeIds": [
-"D601",
-"D518"
+"D601"
 ],
 "activeNodeId": "D601"
 }
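A minimal sketch of a consistency check for the placement block changed above, assuming only the four fields visible in the hunk. The `K3sPlacement` type and the `checkPlacement` helper are hypothetical names for illustration and are not part of the repository.

```typescript
// Hypothetical shape: only the fields shown in the hunk are modeled.
interface K3sPlacement {
  k3sServiceId: string;
  namespace: string;
  expectedNodeIds: string[];
  activeNodeId: string;
}

// Returns a list of problems; an empty list means the placement is consistent.
function checkPlacement(p: K3sPlacement): string[] {
  const problems: string[] = [];
  if (p.expectedNodeIds.length === 0) {
    problems.push("expectedNodeIds is empty");
  }
  // A node id listed twice usually signals a bad merge of the config.
  if (p.expectedNodeIds.some((id, i) => p.expectedNodeIds.indexOf(id) !== i)) {
    problems.push("expectedNodeIds contains duplicates");
  }
  // The active node has to be one of the nodes the service is expected on.
  if (p.expectedNodeIds.indexOf(p.activeNodeId) === -1) {
    problems.push(`activeNodeId ${p.activeNodeId} is not an expected node`);
  }
  return problems;
}

// Values taken from the new side of the hunk.
const afterCommit: K3sPlacement = {
  k3sServiceId: "code-queue",
  namespace: "unidesk",
  expectedNodeIds: ["D601"],
  activeNodeId: "D601",
};
```

With the values from the new side of the hunk, `checkPlacement(afterCommit)` returns an empty list. Pointing `activeNodeId` at `D518` while it is no longer listed would be reported as a problem.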
@@ -59,6 +59,7 @@
 - Running k3s itself inside Docker or any long-lived container is forbidden. Docker remains available for provider-gateway, target-side image builds, user workload containers, and temporary artifact extraction, but not as the k3s runtime boundary.
 - Kubernetes hostPath, local-path storage, node labels, kubelet, CNI, and `/workspace` semantics must resolve against the real computing node filesystem. For WSL nodes such as D601, a Code Queue task working in `/workspace` must see the WSL host `/home/ubuntu`, not a container-private `/home/ubuntu`.
 - If a legacy `rancher/k3s` container is present during migration, it is only an artifact source or rollback reference; it must not remain the active control plane and must be stopped before accepting the node as healthy.
+- A computing resource node that cannot run native k3s and reach the k3s control plane through a stable Kubernetes-supported network must not be listed as an expected k3s node. Current Code Queue HA is provided inside the D601 native k3s control plane through separate read, write, and scheduler Services; D518 remains a normal UniDesk provider/File Browser node until a native k3s-agent network path is designed and verified.
 - k3s Control Bridge Boundary
 - `k3sctl-adapter` is part of the UniDesk control plane, not a workload controlled by k3s. It must remain `deployment.mode=unidesk-direct` or an equivalent UniDesk-managed host service, and must not be converted to `k3sctl-managed`.
 - The adapter exists so UniDesk can inspect, deploy, and repair k3s-managed user services. Putting that bridge inside the k3s cluster would invert the dependency order: repairing or diagnosing k3s would first require the in-cluster adapter and service network to be healthy.
@@ -10,7 +10,7 @@
 - `provider-gateway` is the local computing-node agent of the current main server. It actively connects to provider ingress over WebSocket, mounts `/var/run/docker.sock` as the primary path for automated task execution, uses `pid: "host"` to read node-level process resources, and periodically reports system resource metrics, process usage, and Docker daemon status. On computing nodes, provider-gateway must also publish the egress proxy only on the host loopback `127.0.0.1:18789` for outbound traffic from that node's execution environment; the Host SSH / WSL SSH private-key directory used for maintenance is mounted read-only at `/run/host-ssh` and must not serve as the primary path for automated task scheduling.
 - `todo-note` is the Todo Note backend-only user service hosted on the main server; its container is named `todo-note-backend`, it exposes `4211/tcp` only on the Compose internal network, and it stores the migrated Todo Note data in the main PostgreSQL.
 - `k3sctl-adapter` is the k3s control-plane adapter microservice on D601 managed directly by UniDesk; its container is named `k3sctl-adapter`, it binds only to `127.0.0.1:4266`, and it is accessed by UniDesk frontend/backend-core through the user-service proxy and provides `/api/control-plane` visibility.
-- `code-queue` is the Codex/OpenCode queue user service managed by the D601 k3s/k8s control plane. Its active/standby and read/write/scheduler services are declared in `src/components/microservices/k3sctl-adapter/k3s/code-queue.k3s.json`, and its runtime objects are created as Kubernetes Deployment/ClusterIP Service resources from `code-queue.k8s.yaml`. Tasks, queue, unread state, control state, and the notification outbox are always written to the main PostgreSQL, with no local state-file fallback. The browser uses only the stable `code-queue` user-service ID; internally, backend-core routes read-only requests to `code-queue-read`, command writes to `code-queue-write`, and executor-side `/health`, dev-container, and running task control to `code-queue-scheduler`.
+- `code-queue` is the Codex/OpenCode queue user service managed by the D601 native k3s/k8s control plane. Its current topology is multiple read/write/scheduler Services inside D601, declared in `src/components/microservices/k3sctl-adapter/k3s/code-queue.k3s.json`, and its runtime objects are created as Kubernetes Deployment/ClusterIP Service resources from `code-queue.k8s.yaml`. Tasks, queue, unread state, control state, and the notification outbox are always written to the main PostgreSQL, with no local state-file fallback. The browser uses only the stable `code-queue` user-service ID; internally, backend-core routes read-only requests to `code-queue-read`, command writes to `code-queue-write`, and executor-side `/health`, dev-container, and running task control to `code-queue-scheduler`.
 - `project-manager` is the project-management user service hosted on the main server; its container is named `project-manager-backend`, it exposes `4233/tcp` only on the Compose internal network, the project list is written to the main PostgreSQL, and the browser can perform create/read/update/delete, Excel import, and Excel export only through the UniDesk frontend same-origin proxy.
 - `baidu-netdisk` is the Baidu Netdisk storage user service hosted on the main server; its container is named `baidu-netdisk-backend`, it exposes `4244/tcp` only on the Compose internal network, OAuth/token/transfer state is written to the main PostgreSQL, and the browser can perform device-code login, file browsing, and staging transfer task control only through the UniDesk frontend same-origin proxy.
 
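The `code-queue` bullet above describes a three-way split of requests inside backend-core. A minimal sketch of that split follows. The three Service names and `/api/tasks/overview` come from the docs; the function name, the method-based read/write rule, and every path prefix other than `/health` are assumptions made for illustration, since the docs name the categories but not the concrete routes.

```typescript
type CodeQueueService = "code-queue-read" | "code-queue-write" | "code-queue-scheduler";

// Hypothetical prefixes for executor-side traffic; only `/health` is named in the docs.
const SCHEDULER_PREFIXES: string[] = ["/health", "/api/dev-container", "/api/tasks/running"];

// Picks the internal Service for one proxied request (assumed rule set).
function pickCodeQueueService(method: string, path: string): CodeQueueService {
  // Executor-side endpoints go to the scheduler regardless of HTTP method.
  const isSchedulerPath = SCHEDULER_PREFIXES.some(
    (prefix) => path === prefix || path.indexOf(prefix + "/") === 0,
  );
  if (isSchedulerPath) {
    return "code-queue-scheduler";
  }
  // Assumption: safe HTTP methods are read-only, everything else is a command write.
  const m = method.toUpperCase();
  return m === "GET" || m === "HEAD" ? "code-queue-read" : "code-queue-write";
}
```

Under these assumptions, `pickCodeQueueService("GET", "/api/tasks/overview")` selects `code-queue-read`, which matches the route the health criteria describe. The browser-facing service ID stays `code-queue` in every case.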
@@ -52,7 +52,7 @@ The frontend Docker rollout order is: first run the necessary local checks, for example `b
 
 ## Health Criteria
 
-The minimum bar for a working deployment is: backend-core internal `/health` returns ok; frontend public `/health` returns ok; provider ingress public `/health` returns ok; the database passes `pg_isready` inside its container; the Todo Note backend `/api/health` returns `storage=postgres`; `k3sctl-adapter` `/api/control-plane` shows the `unidesk-k3s` Kubernetes API service proxy status, D601 scheduler serving healthy, D601 read/write Service healthy, D518 standby pod ready, `presentNodeIds=[D601,D518]`, `missingNodeIds=[]`, and the no-fallback path; Code Queue `/health` returns lightweight readiness, the default model, `queue.storage`, and `egressProxy.connected=true` through the k3s scheduler Service; `/api/tasks/overview` returns the PostgreSQL queue overview through `code-queue-read`; write entry points persist through `code-queue-write` and are then polled and executed by `code-queue-scheduler`; Project Manager `/health` returns `storage.primary=postgres` and the project count; backend-core `/api/performance` returns performance metrics; `/api/nodes` lists the `main-server`, `D601`, and `D518` providers with status `online`; `/api/nodes/system-status` contains the corresponding CPU/memory/disk samples; and `/api/nodes/docker-status` shows Docker snapshots for the main server, D601, and D518. D518 must join the native k3s control plane through a native k3s agent and run the `code-queue-d518` standby Pod; Dockerized k3s, a direct D601->D518 connection, NodePort, or provider-gateway business HTTP must not be used to bypass the Kubernetes service route. Before delivery, `bun scripts/cli.ts e2e run` must also be run, and the gates in `docs/reference/e2e.md` are the final verdict.
+The minimum bar for a working deployment is: backend-core internal `/health` returns ok; frontend public `/health` returns ok; provider ingress public `/health` returns ok; the database passes `pg_isready` inside its container; the Todo Note backend `/api/health` returns `storage=postgres`; `k3sctl-adapter` `/api/control-plane` shows the `unidesk-k3s` Kubernetes API service proxy status, D601 scheduler serving healthy, D601 read/write Service healthy, `presentNodeIds` containing `D601`, `missingNodeIds=[]`, and the no-fallback path; Code Queue `/health` returns lightweight readiness, the default model, `queue.storage`, and `egressProxy.connected=true` through the k3s scheduler Service; `/api/tasks/overview` returns the PostgreSQL queue overview through `code-queue-read`; write entry points persist through `code-queue-write` and are then polled and executed by `code-queue-scheduler`; Project Manager `/health` returns `storage.primary=postgres` and the project count; backend-core `/api/performance` returns performance metrics; `/api/nodes` lists the `main-server`, `D601`, and `D518` providers with status `online`; `/api/nodes/system-status` contains the corresponding CPU/memory/disk samples; and `/api/nodes/docker-status` shows Docker snapshots for the main server, D601, and D518. No active `rancher/k3s` container may exist on D601/D518; D518 may join the Code Queue expected nodes only after a native k3s-agent and a stable Kubernetes network have been verified. Before delivery, `bun scripts/cli.ts e2e run` must also be run, and the gates in `docs/reference/e2e.md` are the final verdict.
 
 ## Code Queue D601 Resource Budget
 
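The control-plane part of these criteria could be checked roughly as follows. This is a sketch under assumptions: the four field names appear in the documentation hunks of this commit, but the flat payload shape, the `ControlPlaneStatus` type, and the `controlPlaneFailures` helper are hypothetical.

```typescript
// Assumed subset of the `/api/control-plane` payload; the real response may be nested differently.
interface ControlPlaneStatus {
  status: string;
  noFallback: boolean;
  presentNodeIds: string[];
  missingNodeIds: string[];
}

// Returns human-readable failures; an empty list means the gate passes.
function controlPlaneFailures(cp: ControlPlaneStatus, requiredNodeId: string = "D601"): string[] {
  const failures: string[] = [];
  if (cp.status !== "healthy") {
    failures.push(`status=${cp.status}`);
  }
  if (!cp.noFallback) {
    failures.push("noFallback is not true");
  }
  // Containment, not equality: the new wording only requires D601 to be present.
  if (cp.presentNodeIds.indexOf(requiredNodeId) === -1) {
    failures.push(`presentNodeIds is missing ${requiredNodeId}`);
  }
  if (cp.missingNodeIds.length > 0) {
    failures.push(`missingNodeIds=[${cp.missingNodeIds.join(",")}]`);
  }
  return failures;
}
```

Testing for containment instead of comparing against a fixed list mirrors the change from `presentNodeIds=[D601,D518]` to "`presentNodeIds` containing `D601`": a later, verified D518 agent would not break the check.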
@@ -35,13 +35,13 @@ Typical targeted commands:
 - Core API: `docker exec unidesk-backend-core` calls internal `GET /api/overview`, which must report `dbReady: true`, `pgdata.volumeName=unidesk_pgdata_10gb`, a positive PostgreSQL database byte count, and at least one online node; internal `GET /api/performance` must report component request statistics, internal operation statistics, PGDATA usage and Code Queue PostgreSQL storage metadata.
 - Provider self-connection: internal `GET /api/nodes` must contain `main-server` with `status: online`, `labels.providerGatewayVersion` equal to `src/components/provider-gateway/package.json`, `labels.providerGatewayUpgradePolicy: "always-enabled"`, `labels.providerGatewayRestartPolicyOk: true`, `labels.providerGatewayPidModeOk: true`, and `labels.providerGatewayRuntimeGuardOk: true`; internal `GET /api/nodes/system-status` must contain CPU/memory/disk samples plus a non-empty process resource list sorted by memory by default; internal `GET /api/nodes/docker-status` must contain a Docker snapshot for `main-server`; every running `provider-gateway` container visible in Docker snapshots must report `restartPolicy: "always"` and `pidMode: "host"`; public provider ingress `/health` must return ok.
 - Provider remote control: internal `/api/dispatch` must successfully complete a real `provider.upgrade` task in `mode: "plan"` so the upgrade path is validated without recreating the running gateway during E2E.
-- User services: internal `/api/microservices` must include `todo-note` and `oa-event-flow` on `main-server`, canonical `filebrowser` on `D518`, plus `k3sctl-adapter`, `code-queue`, `findjob`, `pipeline`, `met-nonlinear`, `claudeqq` and `filebrowser-d601` on `D601` with `public=false`; `/api/microservices/todo-note/health` must report `storage=postgres`, `/api/microservices/todo-note/proxy/api/instances` must expose the migrated Todo Note lists, and a temporary Todo Note list create/add/toggle/undo/delete cycle must succeed through the real provider-gateway proxy; `/api/microservices/oa-event-flow/health`, `/api/microservices/oa-event-flow/proxy/api/diagnostics`, `/api/microservices/oa-event-flow/proxy/api/events`, `/api/microservices/oa-event-flow/proxy/api/events?tags=service:pipeline` and `/api/microservices/oa-event-flow/proxy/api/stats/trace` must prove the independent OA event table, Pipeline bridge, and stats center are reachable through UniDesk proxy; `/api/microservices/k3sctl-adapter/health` and `/api/microservices/k3sctl-adapter/proxy/api/control-plane` must expose the D601 `unidesk-k3s` control plane, `kubeApiProxy.mode=kubernetes-api-service-proxy`, D601 active instance `servingHealthy=true`, D518 standby instance `healthy=true`, `presentNodeIds=[D601,D518]`, `missingNodeIds=[]`, `status=healthy`, and `noFallback=true`; `/api/microservices/code-queue/health` must return the active Code Queue backend summary with default model `gpt-5.5`, `egressProxy.connected=true`, and `/api/microservices/code-queue/proxy/api/tasks/overview` must return queue state through backend-core -> k3sctl-adapter -> Kubernetes API service proxy -> k3s/k8s Service, not through a `serviceId=code-queue` provider-gateway direct task or `/api/code-queue-direct`; `/api/microservices/filebrowser/health`, `/api/microservices/filebrowser-d601/health` and `/api/microservices/filebrowser/proxy/` must prove File Browser health and WebUI access through UniDesk proxy; `/api/microservices/findjob/health` and `/api/microservices/findjob/proxy/api/summary` must succeed through the real provider-gateway proxy; `/api/microservices/findjob/proxy/api/jobs?__unideskArrayLimit=jobs:5` must return a bounded preview with `_unidesk.arrayLimits` metadata; `/api/microservices/pipeline/health`, `/api/microservices/pipeline/proxy/api/snapshot?__unideskArrayLimit=registry.components:8,runs:3` and `/api/microservices/pipeline/proxy/api/oa-event-flow/diagnostics` must return Pipeline health, registry/run previews and OA event-flow evidence; `/api/microservices/met-nonlinear/health`, `/api/microservices/met-nonlinear/proxy/api/queue`, `/api/microservices/met-nonlinear/proxy/api/projects?root=projects&limit=500`, `/api/microservices/met-nonlinear/proxy/api/projects?root=ex_projects&limit=500`, `/api/microservices/met-nonlinear/proxy/api/projects/config?path=<projectPath>` and `/api/microservices/met-nonlinear/proxy/api/images` must return the D601 TS backend health, queue/GPU policy, full project tree inputs, structured project detail and ready `met-nonlinear-ml:tf26` image status.
+- User services: internal `/api/microservices` must include `todo-note` and `oa-event-flow` on `main-server`, canonical `filebrowser` on `D518`, plus `k3sctl-adapter`, `code-queue`, `findjob`, `pipeline`, `met-nonlinear`, `claudeqq` and `filebrowser-d601` on `D601` with `public=false`; `/api/microservices/todo-note/health` must report `storage=postgres`, `/api/microservices/todo-note/proxy/api/instances` must expose the migrated Todo Note lists, and a temporary Todo Note list create/add/toggle/undo/delete cycle must succeed through the real provider-gateway proxy; `/api/microservices/oa-event-flow/health`, `/api/microservices/oa-event-flow/proxy/api/diagnostics`, `/api/microservices/oa-event-flow/proxy/api/events`, `/api/microservices/oa-event-flow/proxy/api/events?tags=service:pipeline` and `/api/microservices/oa-event-flow/proxy/api/stats/trace` must prove the independent OA event table, Pipeline bridge, and stats center are reachable through UniDesk proxy; `/api/microservices/k3sctl-adapter/health` and `/api/microservices/k3sctl-adapter/proxy/api/control-plane` must expose the D601 `unidesk-k3s` control plane, `kubeApiProxy.mode=kubernetes-api-service-proxy`, D601 active Code Queue instance `servingHealthy=true`, `presentNodeIds` containing `D601`, `missingNodeIds=[]`, `status=healthy`, and `noFallback=true`; `/api/microservices/code-queue/health` must return the active Code Queue backend summary with default model `gpt-5.5`, `egressProxy.connected=true`, and `/api/microservices/code-queue/proxy/api/tasks/overview` must return queue state through backend-core -> k3sctl-adapter -> Kubernetes API service proxy -> k3s/k8s Service, not through a `serviceId=code-queue` provider-gateway direct task or `/api/code-queue-direct`; `/api/microservices/filebrowser/health`, `/api/microservices/filebrowser-d601/health` and `/api/microservices/filebrowser/proxy/` must prove File Browser health and WebUI access through UniDesk proxy; `/api/microservices/findjob/health` and `/api/microservices/findjob/proxy/api/summary` must succeed through the real provider-gateway proxy; `/api/microservices/findjob/proxy/api/jobs?__unideskArrayLimit=jobs:5` must return a bounded preview with `_unidesk.arrayLimits` metadata; `/api/microservices/pipeline/health`, `/api/microservices/pipeline/proxy/api/snapshot?__unideskArrayLimit=registry.components:8,runs:3` and `/api/microservices/pipeline/proxy/api/oa-event-flow/diagnostics` must return Pipeline health, registry/run previews and OA event-flow evidence; `/api/microservices/met-nonlinear/health`, `/api/microservices/met-nonlinear/proxy/api/queue`, `/api/microservices/met-nonlinear/proxy/api/projects?root=projects&limit=500`, `/api/microservices/met-nonlinear/proxy/api/projects?root=ex_projects&limit=500`, `/api/microservices/met-nonlinear/proxy/api/projects/config?path=<projectPath>` and `/api/microservices/met-nonlinear/proxy/api/images` must return the D601 TS backend health, queue/GPU policy, full project tree inputs, structured project detail and ready `met-nonlinear-ml:tf26` image status.
 - ClaudeQQ availability: `/api/microservices/claudeqq/health` must only pass when `ready=true`, NapCat HTTP and WebSocket are connected, and `napcat.loginState=logged_in`; `/api/microservices/claudeqq/proxy/api/napcat/login` must show the same logged-in account state and `/api/microservices/claudeqq/proxy/api/events/recent` must prove the backend can read the persistent event cache. A QR-code-only or not-logged-in NapCat state must be treated as unhealthy.
 - Database: the command writes an `unidesk_e2e_markers` row through `docker exec unidesk-database psql`, confirms provider state is stored in PostgreSQL, and checks Todo Note rows exist in `todo_note_instances` using the same named volume.
 - Pipeline OA event flow: `microservice:pipeline-oa-event-flow` must prove both no-audit and monitor-audit runs are driven by OA events end to end. The event stream must show `node-finished` as a neutral fact with `pipeline:{pipelineId}` and `epoch:{runId}` tags, OA policy as the source of downstream/audit decisions, monitor decisions as OA control events, and runner control-result evidence. E2E must fail if delivery still depends on a legacy detail audit policy flag as policy authority, independent legacy audit-request points, a legacy batch completion gate, direct monitor-to-runner calls, or frontend/CLI writes to Pipeline `.state`.
 - The same Pipeline OA diagnostics must fail on legacy file-transport residuals. Procedure containers, monitor sessions, UI/Gantt DTO builders and CLI fetches must consume prompt/control/stop/display evidence only from the OA event ledger and normalized HTTP read APIs; `control-prompts.jsonl`, `monitor-prompts.jsonl`, `monitor-control`, `control-events.jsonl`, monitor stop files, `.state/pipeline-runs/{runId}/control/commands/`, `PIPELINE_*_APPEND_FILE`, local JSONL append/read helpers, and monitor `/pipeline-state` mounts are forbidden in runtime source.
 - Pipeline live Gantt setup: when `frontend:pipeline-gantt-observation-live-running` is selected, E2E first looks for a current Pipeline run that already contains both a `node-long-running-observation` marker and a still-running execution interval. If no such candidate exists, the E2E setup starts the D601 `monitor-management-behavior-test` pipeline through `bun scripts/cli.ts ssh D601 ...` and polls the private backend proxy until the observation candidate exists; the acceptance assertion itself still opens the public frontend with Playwright and verifies the rendered arrows, absence of observation source pseudo-points, target arrow inset, and live flashing running bar through React DOM controls.
-- Frontend: Playwright must open the public frontend URL derived from `network.publicHost`, not localhost or a Docker-internal URL; it logs in with the configured account, waits for `核心在线`, asserts that `main-server` and `Main Server Provider` are visible, verifies desktop sidebar collapse and `PGDATA` overview metric, opens `运行总览 / 性能面板` to verify `Bwebui`, the component summary, recent failed requests, the internal operation summary, and recent slow operations, clicks `查看原始JSON` to verify Provider data from the frontend, confirms no raw JSON is visible before that click, opens task history to verify duration and failure diagnostics, opens resource nodes `资源监控` to verify CPU/Memory/Disk curves, the structured process resource table, default memory-desc sorting, sortable CPU column and provider upgrade precheck dispatch, opens `Docker 状态`, switches to `main-server`, and verifies the Docker Desktop-style container view including the database named volume `unidesk_pgdata_10gb`, opens `网关版本` and verifies the provider-gateway version, SSH passthrough availability, and remote update availability plus structured remote update records for `provider.upgrade`, then opens `用户服务 / 服务目录`, `用户服务 / Todo Note`, `用户服务 / OA Event Flow`, `用户服务 / k3s Control`, `用户服务 / Code Queue`, `用户服务 / FindJob`, `用户服务 / Pipeline` and `用户服务 / MET Nonlinear` to verify that the following are all rendered through React controls: main-server Todo Note/OA Event Flow; D601 Code Queue; D601 business services; repository references; private backend mappings; the Todo Note migrated lists and tree tasks; the OA Event Flow event table and Trace stats table; the k3s control plane/D601-D518 instances/Kubernetes API service proxy/no-fallback path; the Code Queue queue/model/output/initial `Submitted prompt`/automatic full-Trace loading for terminal-state tasks/append-prompt/interrupt controls; FindJob metrics and job preview; the Pipeline component matrix, MiniMax quota card, structured OA event-flow diagnostics panel, React Flow control graph, epoch Gantt chart, Gantt chart rendered-image export, monitor first-column sorting, long-running task observation connectors, absence of observation source pseudo-points, live flashing execution bar for running nodes, and OpenCode Trace; and the MET Nonlinear project library/Fork/pending-start queue/current queue/completed/failure diagnostics/GPU/images. Playwright must also verify that every API request from the Code Queue page goes through `/api/microservices/code-queue/proxy` and that `/api/code-queue-direct` no longer appears; that a deep-link route such as the public `http://<publicHost>:<frontendPort>/app/pipeline/` lands directly on the Pipeline page, that the address bar then updates to `/nodes/docker/` after switching to `资源节点 / Docker 状态`, and that the browser history back chain still returns to `/app/pipeline/`. It must also open `/app/code-queue/` directly and verify that the page contains `app-shell`, the left main-module sidebar, the top status bar, the top sub-tabs, and `code-queue-page`, so that user-service deep links do not degrade into a shell-less standalone page; non-user-service pages such as `态势总览` should live under their own module prefix, for example `/ops/status/`. Playwright must cover that default visible times are displayed in Beijing time, including at least the top `北京时间` clock, task history/gateway version update times, and user-service refresh times, and must not drift with the browser's local time zone. Task history and provider upgrade records must not display a real sub-second duration as `0s`; MET Nonlinear running rows must show an ETA derived from backend progress or from `startedAt` plus epoch progress, and queue/completed rows must show training speed as `epoch/h`.
+- Frontend: Playwright must open the public frontend URL derived from `network.publicHost`, not localhost or a Docker-internal URL; it logs in with the configured account, waits for `核心在线`, asserts that `main-server` and `Main Server Provider` are visible, verifies desktop sidebar collapse and `PGDATA` overview metric, opens `运行总览 / 性能面板` to verify `Bwebui`, the component summary, recent failed requests, the internal operation summary, and recent slow operations, clicks `查看原始JSON` to verify Provider data from the frontend, confirms no raw JSON is visible before that click, opens task history to verify duration and failure diagnostics, opens resource nodes `资源监控` to verify CPU/Memory/Disk curves, the structured process resource table, default memory-desc sorting, sortable CPU column and provider upgrade precheck dispatch, opens `Docker 状态`, switches to `main-server`, and verifies the Docker Desktop-style container view including the database named volume `unidesk_pgdata_10gb`, opens `网关版本` and verifies the provider-gateway version, SSH passthrough availability, and remote update availability plus structured remote update records for `provider.upgrade`, then opens `用户服务 / 服务目录`, `用户服务 / Todo Note`, `用户服务 / OA Event Flow`, `用户服务 / k3s Control`, `用户服务 / Code Queue`, `用户服务 / FindJob`, `用户服务 / Pipeline` and `用户服务 / MET Nonlinear` to verify that the following are all rendered through React controls: main-server Todo Note/OA Event Flow; D601 Code Queue; D601 business services; repository references; private backend mappings; the Todo Note migrated lists and tree tasks; the OA Event Flow event table and Trace stats table; the k3s control plane/D601 scheduler/read/write instances/Kubernetes API service proxy/no-fallback path; the Code Queue queue/model/output/initial `Submitted prompt`/automatic full-Trace loading for terminal-state tasks/append-prompt/interrupt controls; FindJob metrics and job preview; the Pipeline component matrix, MiniMax quota card, structured OA event-flow diagnostics panel, React Flow control graph, epoch Gantt chart, Gantt chart rendered-image export, monitor first-column sorting, long-running task observation connectors, absence of observation source pseudo-points, live flashing execution bar for running nodes, and OpenCode Trace; and the MET Nonlinear project library/Fork/pending-start queue/current queue/completed/failure diagnostics/GPU/images. Playwright must also verify that every API request from the Code Queue page goes through `/api/microservices/code-queue/proxy` and that `/api/code-queue-direct` no longer appears; that a deep-link route such as the public `http://<publicHost>:<frontendPort>/app/pipeline/` lands directly on the Pipeline page, that the address bar then updates to `/nodes/docker/` after switching to `资源节点 / Docker 状态`, and that the browser history back chain still returns to `/app/pipeline/`. It must also open `/app/code-queue/` directly and verify that the page contains `app-shell`, the left main-module sidebar, the top status bar, the top sub-tabs, and `code-queue-page`, so that user-service deep links do not degrade into a shell-less standalone page; non-user-service pages such as `态势总览` should live under their own module prefix, for example `/ops/status/`. Playwright must cover that default visible times are displayed in Beijing time, including at least the top `北京时间` clock, task history/gateway version update times, and user-service refresh times, and must not drift with the browser's local time zone. Task history and provider upgrade records must not display a real sub-second duration as `0s`; MET Nonlinear running rows must show an ETA derived from backend progress or from `startedAt` plus epoch progress, and queue/completed rows must show training speed as `epoch/h`.
 - Frontend dense-layout regression gate: whenever a frontend change touches the Pipeline right sidebar, Trace timeline, detail drawer, Gantt chart coordinates, or other information-dense panels, Playwright acceptance must inspect both `总高度` (total height) and `横向滚动条` (horizontal scrollbar). For Pipeline specifically, the OpenCode Trace session head must carry shared agent/model/session facts and the Trace body must use the same Code Queue `TraceView` styling; Playwright must fail if old `.pipeline-opencode-step`, `.pipeline-opencode-flow`, `.pipeline-step-message-card` or `.pipeline-opencode-part` user-visible styles reappear, if the Trace container introduces an internal horizontal scrollbar, or if `frontend:pipeline-gantt-frontend-y-accuracy` fails to prove the frontend `frontend-y` layout maps ticks, markers and execution bars from timestamps to y coordinates within tolerance.
 - OpenCode Trace must use Code Queue Trace styling and must not render the deprecated Pipeline continuous step connector; Playwright should fail if `.pipeline-opencode-flow`, `.pipeline-opencode-step` or any equivalent continuous connector/card returns to the user-visible Trace.
- User service frontend assertions must wait for real backend data, not only the page skeleton. For Todo Note this means the page must show the migrated lists `CONSTAR`、`大论文`、`找工作`、`小论文`、`事务`, support creating a temporary list and task through the frontend, and delete that temporary list afterwards. The temporary list must be selected again by its unique generated name before deletion so E2E never deletes a migrated source list by accident. For FindJob this means the page must show a numeric `岗位总量`, `HEALTH OK`, and a non-empty `PREVIEW` count such as `40/1463 PREVIEW`; for Pipeline this means the page must show `Pipeline v2 工作台`, `Health OK`, a numeric component count, a non-empty React Flow control graph, `控制图`, `Epoch 甘特图`, and after clicking a Gantt execution line it must show `OpenCode Trace` rendered by the shared Code Queue-style Trace component with messages and tool-call groups; for MET Nonlinear this means the page must show `MET Nonlinear 训练编排`, `Health OK`, `Fork Project`, `加入待启动队列`, `启动队列`, `当前队列`, 最大并发设置、task queue and GPU/image panels, and must not show the removed hard-coded `创建10个10轮任务` frontend entry. The MET Nonlinear project library must render `projects/` and `ex_projects/` as a true path tree with folder Project counts; clicking a project row must open a structured detail panel containing `config.json`, `data/ 训练状态`, `模型参数`, `指标` and a parameter count such as `Total Params`; clicking a completed/current/failed job row must open a structured job detail and both the row and detail must show `epoch/h`. Full MET Nonlinear acceptance is driven by public frontend controls: choose a visible source Project, set batch size, epochs and max concurrency in inputs, fork into `projects/unidesk_forks/`, stage the selected forks, start the queue, and verify completed rows plus automatic `metnl-train-*` container removal; loading placeholders like `--` or empty states are not sufficient for E2E success.
|
- User service frontend assertions must wait for real backend data, not only the page skeleton. For Todo Note this means the page must show the migrated lists `CONSTAR`、`大论文`、`找工作`、`小论文`、`事务`, support creating a temporary list and task through the frontend, and delete that temporary list afterwards. The temporary list must be selected again by its unique generated name before deletion so E2E never deletes a migrated source list by accident. For FindJob this means the page must show a numeric `岗位总量`, `HEALTH OK`, and a non-empty `PREVIEW` count such as `40/1463 PREVIEW`; for Pipeline this means the page must show `Pipeline v2 工作台`, `Health OK`, a numeric component count, a non-empty React Flow control graph, `控制图`, `Epoch 甘特图`, and after clicking a Gantt execution line it must show `OpenCode Trace` rendered by the shared Code Queue-style Trace component with messages and tool-call groups; for MET Nonlinear this means the page must show `MET Nonlinear 训练编排`, `Health OK`, `Fork Project`, `加入待启动队列`, `启动队列`, `当前队列`, 最大并发设置、task queue and GPU/image panels, and must not show the removed hard-coded `创建10个10轮任务` frontend entry. The MET Nonlinear project library must render `projects/` and `ex_projects/` as a true path tree with folder Project counts; clicking a project row must open a structured detail panel containing `config.json`, `data/ 训练状态`, `模型参数`, `指标` and a parameter count such as `Total Params`; clicking a completed/current/failed job row must open a structured job detail and both the row and detail must show `epoch/h`. Full MET Nonlinear acceptance is driven by public frontend controls: choose a visible source Project, set batch size, epochs and max concurrency in inputs, fork into `projects/unidesk_forks/`, stage the selected forks, start the queue, and verify completed rows plus automatic `metnl-train-*` container removal; loading placeholders like `--` or empty states are not sufficient for E2E success.
|
||||||
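The time and duration rules in the acceptance list above (Beijing-time display, no `0s` for real sub-second durations, ETA from `startedAt` plus epoch progress, speed as `epoch/h`) can be sketched as below. This is a minimal TypeScript sketch, not the UniDesk implementation: the helper names `formatBeijingTime`, `formatDurationMs`, `estimateEtaMs` and `epochsPerHour` are hypothetical, and the concrete display formats (milliseconds for sub-second values, `--` for invalid input) are assumptions.

```typescript
// Hypothetical helpers; names and display formats are illustrative assumptions.

/** Render a timestamp in Beijing time regardless of the browser's local zone. */
function formatBeijingTime(date: Date): string {
  const parts = new Intl.DateTimeFormat("en-GB", {
    timeZone: "Asia/Shanghai",
    year: "numeric", month: "2-digit", day: "2-digit",
    hour: "2-digit", minute: "2-digit", second: "2-digit",
  }).formatToParts(date);
  const get = (type: string): string =>
    parts.find((part) => part.type === type)?.value ?? "";
  return `${get("year")}-${get("month")}-${get("day")} ${get("hour")}:${get("minute")}:${get("second")}`;
}

/** Format a duration so that a real sub-second run never shows as "0s". */
function formatDurationMs(ms: number): string {
  if (!Number.isFinite(ms) || ms < 0) return "--";
  if (ms === 0) return "0ms";
  if (ms < 1000) return `${Math.max(1, Math.floor(ms))}ms`;
  const totalSeconds = Math.floor(ms / 1000);
  const h = Math.floor(totalSeconds / 3600);
  const m = Math.floor((totalSeconds % 3600) / 60);
  const s = totalSeconds % 60;
  if (h > 0) return `${h}h ${m}m ${s}s`;
  if (m > 0) return `${m}m ${s}s`;
  return `${s}s`;
}

/** Linear ETA: remaining time = elapsed * remaining epochs / finished epochs. */
function estimateEtaMs(
  startedAtMs: number,
  nowMs: number,
  epochsDone: number,
  epochsTotal: number,
): number | null {
  const elapsed = nowMs - startedAtMs;
  if (elapsed <= 0 || epochsDone <= 0 || epochsTotal < epochsDone) return null;
  return (elapsed * (epochsTotal - epochsDone)) / epochsDone;
}

/** Training speed expressed as epochs per hour. */
function epochsPerHour(epochsDone: number, elapsedMs: number): number | null {
  if (elapsedMs <= 0 || epochsDone < 0) return null;
  return (epochsDone * 3_600_000) / elapsedMs;
}
```

A Playwright assertion could compare the rendered clock text against `formatBeijingTime(new Date())` under several emulated browser time zones; the linear ETA is only a fallback for the case where the backend does not report progress directly.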
|
|||||||
@@ -95,7 +95,7 @@ frontend shell 必须把左侧主模块与顶部子标签编译为统一的 URL
|
|||||||
- `ClaudeQQ` 子标签必须把 D601 ClaudeQQ 后端渲染为 UniDesk React 控件,包括 NapCat 容器登录二维码、NapCat HTTP/WS 状态、事件缓存、QQ 事件订阅表、订阅创建表单、消息推送表单、主用户私聊账号 `645275593` 标记、最近 QQ 事件、已发送记录和显式原始 JSON 按钮。
|
- `ClaudeQQ` 子标签必须把 D601 ClaudeQQ 后端渲染为 UniDesk React 控件,包括 NapCat 容器登录二维码、NapCat HTTP/WS 状态、事件缓存、QQ 事件订阅表、订阅创建表单、消息推送表单、主用户私聊账号 `645275593` 标记、最近 QQ 事件、已发送记录和显式原始 JSON 按钮。
|
||||||
- `Baidu Netdisk` 子标签必须把主 server `baidu-netdisk-backend` 后端渲染为 UniDesk React 控件,包括 OAuth 设备码二维码/用户码登录、账号容量、配置工作根文件浏览(当前默认百度网盘根目录 `/`)、staging 目录上传/下载任务、上传/下载自测按钮与 MD5 结果、脱敏安全说明、日志摘要和显式原始 JSON 按钮;不得把 access token、refresh token、dlink 或 staging 文件字节流裸露到浏览器。
|
- `Baidu Netdisk` 子标签必须把主 server `baidu-netdisk-backend` 后端渲染为 UniDesk React 控件,包括 OAuth 设备码二维码/用户码登录、账号容量、配置工作根文件浏览(当前默认百度网盘根目录 `/`)、staging 目录上传/下载任务、上传/下载自测按钮与 MD5 结果、脱敏安全说明、日志摘要和显式原始 JSON 按钮;不得把 access token、refresh token、dlink 或 staging 文件字节流裸露到浏览器。
|
||||||
- `OA Event Flow` 子标签必须把主 server `oa-event-flow-backend` 后端渲染为 UniDesk React 控件,包括服务健康、事件表、tag 过滤、SSE live 状态、Trace/STEP stats 表、Code Queue/Pipeline 标签入口和显式原始 JSON 按钮;默认页面不得裸铺完整事件 JSON,事件表只展示结构化摘要,完整 envelope/payload 只能通过 `查看原始JSON` 打开。
|
- `OA Event Flow` 子标签必须把主 server `oa-event-flow-backend` 后端渲染为 UniDesk React 控件,包括服务健康、事件表、tag 过滤、SSE live 状态、Trace/STEP stats 表、Code Queue/Pipeline 标签入口和显式原始 JSON 按钮;默认页面不得裸铺完整事件 JSON,事件表只展示结构化摘要,完整 envelope/payload 只能通过 `查看原始JSON` 打开。
|
||||||
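- The `k3s Control` sub-tab must render the D601 `k3sctl-adapter` control plane as UniDesk React controls, including control plane status, the manifest list, D601/D518 node instances, the active instance, the single-writer/no-fallback path, Kubernetes API service proxy status, a kubectl/k3s snapshot summary and an explicit raw JSON button; the page may fetch data only through `/api/microservices/k3sctl-adapter/proxy/api/control-plane` and must not directly access provider-gateway, NodePort, business container ports or the bare k3s/kubectl API.
|
- The `k3s Control` sub-tab must render the D601 `k3sctl-adapter` control plane as UniDesk React controls, including control plane status, the manifest list, D601 scheduler/read/write instances, the active instance, the single-writer/no-fallback path, Kubernetes API service proxy status, a kubectl/k3s snapshot summary and an explicit raw JSON button; the page may fetch data only through `/api/microservices/k3sctl-adapter/proxy/api/control-plane` and must not directly access provider-gateway, NodePort, business container ports or the bare k3s/kubectl API.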
|
||||||
- `Code Queue` 子标签必须把 D601 k3s/k8s `code-queue` Service 渲染为 UniDesk React 控件,前端 API 基址只能是 `/api/microservices/code-queue/proxy`,不能继续使用旧 `/api/code-queue-direct` 别名;页面包括多 queue lane、queue 内串行、queue 间并行、queue 合并(点击“合并 queue”后必须用公共 `UniDeskDialog` 打开独立小窗口,用下拉菜单选择源 queue;不得把源 queue 选择控件塞进正常提交任务的 Queue 选择区;合并后自动删除源 queue,只保留合并后的目标 queue,目标 queue 按原 queueEnteredAt/createdAt 时间顺序串行)、任务 ID/复制任务 ID、引用按钮、任务耗时、任务提交/批量提交、引用任务 ID、创建成功提示、清空输入、模型下拉、执行 Provider 下拉、执行模式下拉(默认容器/本机或 `windows-native`)、显式入队份数、默认模型 `gpt-5.5`、MiniMax judge 状态、Codex CLI-like 输出流、attempt 终态、运行中追加 prompt、打断、手动重试和显式原始 JSON 按钮;`windows-native` 模式必须在任务 JSON、卡片和 Trace 头部显示,并要求非本机 WSL Provider 与 `/mnt/<drive>` 工作目录;Codex CLI-like 输出流必须始终保留任务的初始 `Submitted prompt` 和运行中 `Steer prompt`;整个 agent loop 消息流统一命名为专有名词 `Trace`,`Trace` 包含 assistant message、user prompt、system event 和 tool call,但非错误 system event 默认只保留在原始输出/数据库中,不在 TraceView 展示;Code Queue 与 Pipeline/OpenCode messages 必须共用 `src/components/frontend/src/trace.tsx` 的 Trace 公共组件、统一 Trace item 接口和 codex/opencode port 适配层;连续 read/edit/run 工具调用只是在 Trace 内折叠为可展开工具调用组,汇总格式至少包含 `xx read, xx edit, xx run`,并展示读取文件、编辑文件、运行命令和耗时摘要;最近 3 个工具调用保持展开,工具调用内容不得自动换行且必须在工具调用块内部横向滚动,工具调用组展开后不得再增加额外左侧缩进;message 与 prompt 必须自动换行,普通 message 不显示左侧项目符号缩进且永不折叠;Trace 首屏可以是摘要预览,但终态任务被选中后必须自动在后台加载完整 Trace,手动“加载完整 Trace”也必须从 Code Queue output archive 分页补齐早期 trace,不得把 preview 的 `hasMore=false` 当成完整历史;即使热状态为控制体积裁剪了早期 raw output,也要从结构化 `basePrompt/displayPrompt/promptHistory` 和 archive 合成完整用户输入与 agent trace,并且初始 prompt 默认显示注入前 prompt 而不是引用注入全文;当初始 prompt 含引用注入时,引用内容必须默认折叠,并只在 Trace 的初始消息中提供可展开的“最终传入 Codex 的真实完整 prompt”,不得再渲染独立 Prompt 全量卡片;多轮引用注入必须按上游/最早上下文在前、直接引用在后的顺序排列,每一轮必须有明确 `Reference Round N/M` 分割线和时间范围,不能用固定 6 轮截断引用链;点击队列引用按钮必须自动把该任务 ID 写入提交表单的引用输入框,引用任务 ID 创建新任务时必须自动注入 `bun scripts/cli.ts codex task <taskId>` 的提示;连续执行同一 prompt 应通过入队份数一次性生成多条任务,避免快速连点造成操作员误判。
|
- `Code Queue` 子标签必须把 D601 k3s/k8s `code-queue` Service 渲染为 UniDesk React 控件,前端 API 基址只能是 `/api/microservices/code-queue/proxy`,不能继续使用旧 `/api/code-queue-direct` 别名;页面包括多 queue lane、queue 内串行、queue 间并行、queue 合并(点击“合并 queue”后必须用公共 `UniDeskDialog` 打开独立小窗口,用下拉菜单选择源 queue;不得把源 queue 选择控件塞进正常提交任务的 Queue 选择区;合并后自动删除源 queue,只保留合并后的目标 queue,目标 queue 按原 queueEnteredAt/createdAt 时间顺序串行)、任务 ID/复制任务 ID、引用按钮、任务耗时、任务提交/批量提交、引用任务 ID、创建成功提示、清空输入、模型下拉、执行 Provider 下拉、执行模式下拉(默认容器/本机或 `windows-native`)、显式入队份数、默认模型 `gpt-5.5`、MiniMax judge 状态、Codex CLI-like 输出流、attempt 终态、运行中追加 prompt、打断、手动重试和显式原始 JSON 按钮;`windows-native` 模式必须在任务 JSON、卡片和 Trace 头部显示,并要求非本机 WSL Provider 与 `/mnt/<drive>` 工作目录;Codex CLI-like 输出流必须始终保留任务的初始 `Submitted prompt` 和运行中 `Steer prompt`;整个 agent loop 消息流统一命名为专有名词 `Trace`,`Trace` 包含 assistant message、user prompt、system event 和 tool call,但非错误 system event 默认只保留在原始输出/数据库中,不在 TraceView 展示;Code Queue 与 Pipeline/OpenCode messages 必须共用 `src/components/frontend/src/trace.tsx` 的 Trace 公共组件、统一 Trace item 接口和 codex/opencode port 适配层;连续 read/edit/run 工具调用只是在 Trace 内折叠为可展开工具调用组,汇总格式至少包含 `xx read, xx edit, xx run`,并展示读取文件、编辑文件、运行命令和耗时摘要;最近 3 个工具调用保持展开,工具调用内容不得自动换行且必须在工具调用块内部横向滚动,工具调用组展开后不得再增加额外左侧缩进;message 与 prompt 必须自动换行,普通 message 不显示左侧项目符号缩进且永不折叠;Trace 首屏可以是摘要预览,但终态任务被选中后必须自动在后台加载完整 Trace,手动“加载完整 Trace”也必须从 Code Queue output archive 分页补齐早期 trace,不得把 preview 的 `hasMore=false` 当成完整历史;即使热状态为控制体积裁剪了早期 raw output,也要从结构化 `basePrompt/displayPrompt/promptHistory` 和 archive 合成完整用户输入与 agent trace,并且初始 prompt 默认显示注入前 prompt 而不是引用注入全文;当初始 prompt 含引用注入时,引用内容必须默认折叠,并只在 Trace 的初始消息中提供可展开的“最终传入 Codex 的真实完整 prompt”,不得再渲染独立 Prompt 全量卡片;多轮引用注入必须按上游/最早上下文在前、直接引用在后的顺序排列,每一轮必须有明确 `Reference Round N/M` 分割线和时间范围,不能用固定 6 轮截断引用链;点击队列引用按钮必须自动把该任务 ID 写入提交表单的引用输入框,引用任务 ID 创建新任务时必须自动注入 `bun scripts/cli.ts codex task <taskId>` 的提示;连续执行同一 prompt 应通过入队份数一次性生成多条任务,避免快速连点造成操作员误判。
|
||||||
- `MDTODO` 子标签必须把 D601 k3s `mdtodo` Service 渲染为 UniDesk React 控件,前端 API 基址只能是 `/api/microservices/mdtodo/proxy`;页面包括 TODO Markdown 文件列表、任务树、状态徽标、标题与正文编辑、新增根任务/子任务、删除任务、执行命令生成、hostPath 健康摘要和显式原始 JSON 按钮,不得 iframe 原 VS Code webview、公开 VSIX 旧前端或把完整 Markdown/JSON 默认铺在页面上。
|
- `MDTODO` 子标签必须把 D601 k3s `mdtodo` Service 渲染为 UniDesk React 控件,前端 API 基址只能是 `/api/microservices/mdtodo/proxy`;页面包括 TODO Markdown 文件列表、任务树、状态徽标、标题与正文编辑、新增根任务/子任务、删除任务、执行命令生成、hostPath 健康摘要和显式原始 JSON 按钮,不得 iframe 原 VS Code webview、公开 VSIX 旧前端或把完整 Markdown/JSON 默认铺在页面上。
|
||||||
- `Code Queue` 前端改进必须在同一任务内重建并上线公网 frontend,不能只修改源码或本地 bundle;重建 frontend 是无状态 WebUI 替换,不会导致 Code Queue 长期任务失败。已结束未读任务只能在 task card 边角显示类似未读消息的 `codex-unread-badge` 圆点和“标为已读”操作,不得把整张卡片改成红色/琥珀色失败态边框、背景或胶囊标签;状态栏的“结束未读”提示也不得使用失败态红色。
|
- `Code Queue` 前端改进必须在同一任务内重建并上线公网 frontend,不能只修改源码或本地 bundle;重建 frontend 是无状态 WebUI 替换,不会导致 Code Queue 长期任务失败。已结束未读任务只能在 task card 边角显示类似未读消息的 `codex-unread-badge` 圆点和“标为已读”操作,不得把整张卡片改成红色/琥珀色失败态边框、背景或胶囊标签;状态栏的“结束未读”提示也不得使用失败态红色。
|
||||||
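The tool-call folding rule for `Trace` described in the `Code Queue` item above (consecutive read/edit/run calls collapse into one expandable group with an `xx read, xx edit, xx run` summary, and the most recent 3 tool calls stay expanded) can be sketched as follows. This is a minimal TypeScript sketch under stated assumptions: the `Sketch*` types and the `groupTraceItems` function are hypothetical and are not the Trace item interface in `src/components/frontend/src/trace.tsx`.

```typescript
// Hypothetical types for illustration; not the real Trace item interface.
type SketchToolKind = "read" | "edit" | "run";

type SketchTraceItem =
  | { type: "message"; text: string }
  | { type: "tool"; kind: SketchToolKind; target: string };

interface SketchToolCall { kind: SketchToolKind; target: string; expanded: boolean }
interface SketchToolGroup { type: "tool-group"; summary: string; calls: SketchToolCall[] }
type SketchTraceNode = { type: "message"; text: string } | SketchToolGroup;

/** Build the "xx read, xx edit, xx run" summary for one group. */
function summarizeToolCalls(calls: SketchToolCall[]): string {
  const counts: Record<SketchToolKind, number> = { read: 0, edit: 0, run: 0 };
  for (const call of calls) counts[call.kind] += 1;
  return `${counts.read} read, ${counts.edit} edit, ${counts.run} run`;
}

/** Fold consecutive tool calls into groups; messages always break a group. */
function groupTraceItems(items: SketchTraceItem[], keepExpanded = 3): SketchTraceNode[] {
  // Indexes of the most recent `keepExpanded` tool calls across the whole trace.
  const toolIndexes: number[] = [];
  items.forEach((item, index) => {
    if (item.type === "tool") toolIndexes.push(index);
  });
  const expanded = new Set<number>(
    keepExpanded > 0 ? toolIndexes.slice(-keepExpanded) : [],
  );

  const nodes: SketchTraceNode[] = [];
  let group: SketchToolGroup | null = null;
  for (const [index, item] of items.entries()) {
    if (item.type === "message") {
      group = null; // messages are never folded and end the current group
      nodes.push({ type: "message", text: item.text });
      continue;
    }
    if (group === null) {
      group = { type: "tool-group", summary: "", calls: [] };
      nodes.push(group);
    }
    group.calls.push({ kind: item.kind, target: item.target, expanded: expanded.has(index) });
    group.summary = summarizeToolCalls(group.calls);
  }
  return nodes;
}
```

Tracking "most recent" by index over the whole trace, rather than per group, is one possible reading of the rule; a per-group interpretation would only change how the `expanded` set is built.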
|
|||||||
@@ -134,12 +134,12 @@ Baidu Netdisk 在 UniDesk 语境中按纯后端服务管理:不得暴露百度
|
|||||||
- 原生 API 连接:D601 原生 k3s 的 kubeconfig 固定来自宿主 `/etc/rancher/k3s/k3s.yaml`,adapter 内部挂载为 `/var/lib/unidesk/k3s/kubeconfig`;当 kubeconfig server 是 `127.0.0.1:6443` 时,adapter 容器必须通过受控 SSH local tunnel 把容器内 `127.0.0.1:6443` 转发到 WSL host `127.0.0.1:6443`,并设置 `K3SCTL_KUBE_API_CONNECT_HOST=127.0.0.1`。不得依赖 Docker Desktop 的 `network_mode: host`,因为它进入的是 Docker Desktop VM 网络而不是 D601 WSL Ubuntu 网络;也不得依赖 `host.docker.internal:6443`、旧 `rancher/k3s` 容器 IP、NodePort 或手工 service endpoint。
|
- 原生 API 连接:D601 原生 k3s 的 kubeconfig 固定来自宿主 `/etc/rancher/k3s/k3s.yaml`,adapter 内部挂载为 `/var/lib/unidesk/k3s/kubeconfig`;当 kubeconfig server 是 `127.0.0.1:6443` 时,adapter 容器必须通过受控 SSH local tunnel 把容器内 `127.0.0.1:6443` 转发到 WSL host `127.0.0.1:6443`,并设置 `K3SCTL_KUBE_API_CONNECT_HOST=127.0.0.1`。不得依赖 Docker Desktop 的 `network_mode: host`,因为它进入的是 Docker Desktop VM 网络而不是 D601 WSL Ubuntu 网络;也不得依赖 `host.docker.internal:6443`、旧 `rancher/k3s` 容器 IP、NodePort 或手工 service endpoint。
|
||||||
- k3s 实现:D601 控制面、D518 或其他计算资源节点上的 k3s agent/worker 都必须原生安装在节点 host OS 或 WSL 发行版内,以 `/usr/local/bin/k3s` 和 systemd `k3s.service`/`k3s-agent.service` 运行;不得用 Docker、Compose、`rancher/k3s` 长驻容器、kind/k3d 或其他容器化方式承载 k3s 控制面或 kubelet。Docker 只允许用于 provider-gateway、业务容器镜像构建、运行用户 workload 或临时提取 k3s 二进制/镜像 artifact,不能成为 k3s runtime 边界。验收时必须证明 `systemctl is-active k3s` 或 agent 服务正常、`kubectl get nodes -o wide` 看到真实节点 OS/内核、k3s containerd socket 位于 `/run/k3s/containerd/containerd.sock`,且不存在 active `rancher/k3s` 控制面容器。
|
- k3s 实现:D601 控制面、D518 或其他计算资源节点上的 k3s agent/worker 都必须原生安装在节点 host OS 或 WSL 发行版内,以 `/usr/local/bin/k3s` 和 systemd `k3s.service`/`k3s-agent.service` 运行;不得用 Docker、Compose、`rancher/k3s` 长驻容器、kind/k3d 或其他容器化方式承载 k3s 控制面或 kubelet。Docker 只允许用于 provider-gateway、业务容器镜像构建、运行用户 workload 或临时提取 k3s 二进制/镜像 artifact,不能成为 k3s runtime 边界。验收时必须证明 `systemctl is-active k3s` 或 agent 服务正常、`kubectl get nodes -o wide` 看到真实节点 OS/内核、k3s containerd socket 位于 `/run/k3s/containerd/containerd.sock`,且不存在 active `rancher/k3s` 控制面容器。
|
||||||
- k3s 路由对象:`k3sctl-managed` 可以落到 k3s、k8s 或等价标准 Kubernetes 控制面,但必须使用 Kubernetes 原生命名空间、Deployment、Service、readiness/liveness probe、Kubernetes API service proxy 等规范对象;不得把裸容器端口、NodePort、SSH curl、provider-gateway `microservice.http` 或 host 直连地址伪装成 k3s 服务路由。WSL 节点的 hostPath 和 local-path 语义必须解析到 WSL host 文件系统;例如 D601 Code Queue Pod 的 `/workspace` 必须映射 WSL `/home/ubuntu`,不能映射到容器化 k3s 内部的 `/home/ubuntu`。
|
- k3s 路由对象:`k3sctl-managed` 可以落到 k3s、k8s 或等价标准 Kubernetes 控制面,但必须使用 Kubernetes 原生命名空间、Deployment、Service、readiness/liveness probe、Kubernetes API service proxy 等规范对象;不得把裸容器端口、NodePort、SSH curl、provider-gateway `microservice.http` 或 host 直连地址伪装成 k3s 服务路由。WSL 节点的 hostPath 和 local-path 语义必须解析到 WSL host 文件系统;例如 D601 Code Queue Pod 的 `/workspace` 必须映射 WSL `/home/ubuntu`,不能映射到容器化 k3s 内部的 `/home/ubuntu`。
|
||||||
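- k3s system components: the D601 native k3s server must disable the non-essential `traefik`, `servicelb` and `metrics-server`, keeping only the API server, CoreDNS and the local-path provisioner that the workloads require; CoreDNS and the local-path provisioner are pinned to the D601 control-plane node so that D518 maintenance-tunnel limits do not cause system DNS/readiness flapping.
|
- k3s system components: the D601 native k3s server must disable the non-essential `traefik`, `servicelb` and `metrics-server`, keeping only the API server, CoreDNS and the local-path provisioner that the workloads require; CoreDNS and the local-path provisioner are pinned to the D601 control-plane node so that maintenance-tunnel limits on cross-WSL/NAT nodes do not cause system DNS/readiness flapping.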
|
||||||
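- manifest: managed service declarations live in `src/components/microservices/k3sctl-adapter/k3s/*.k3s.json` and are read at adapter startup through `K3SCTL_MANIFEST_PATHS`; the manifest is the authoritative source for D601/D518 instances, the active instance, the single writer, expected nodes and the health policy. `K3SCTL_SERVICES_JSON` must not carry static HTTP services, must not override a service of the same name and must not act as a hidden fallback; any additional service must also supply a complete `ManagedKubernetesService` manifest.
|
- manifest: managed service declarations live in `src/components/microservices/k3sctl-adapter/k3s/*.k3s.json` and are read at adapter startup through `K3SCTL_MANIFEST_PATHS`; the manifest is the authoritative source for the currently planned k3s instances, the active instance, the single writer, expected nodes and the health policy. `K3SCTL_SERVICES_JSON` must not carry static HTTP services, must not override a service of the same name and must not act as a hidden fallback; any additional service must also supply a complete `ManagedKubernetesService` manifest.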
|
||||||
- API:`GET /health` 只表示 adapter 控制面自身可用,并把代管服务 serving 健康作为 `managedServicesHealthy` 字段展示;`GET /api/control-plane` 返回控制面、manifest、kubectl/k3s snapshot 和代管服务状态;`GET /api/services` 返回代管服务列表;`GET|HEAD /api/services/<id>/health` 返回该 k3s 服务的 active serving 健康;`/api/services/<id>/proxy/*` 是业务请求进入 active service 的唯一代理入口。
|
- API:`GET /health` 只表示 adapter 控制面自身可用,并把代管服务 serving 健康作为 `managedServicesHealthy` 字段展示;`GET /api/control-plane` 返回控制面、manifest、kubectl/k3s snapshot 和代管服务状态;`GET /api/services` 返回代管服务列表;`GET|HEAD /api/services/<id>/health` 返回该 k3s 服务的 active serving 健康;`/api/services/<id>/proxy/*` 是业务请求进入 active service 的唯一代理入口。
|
||||||
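- Proxy path: the only official path by which the adapter reaches the active business service is the Kubernetes API service proxy: `/api/v1/namespaces/<namespace>/services/<service>:<port>/proxy/...`. D601 and D518 are not required to reach each other directly; D518 joins the control plane through a k3s agent, and the control-plane connection may be established over a node maintenance tunnel, but business requests must not degrade into provider-gateway connecting directly to the Code Queue HTTP port. If a standby/worker node is constrained by kubelet/service-proxy reachability, the manifest may explicitly use `healthMode=pod-ready` as the topology health probe; this reads only Kubernetes Pod readiness, is not a business proxy path and cannot replace the active Service proxy.
|
- Proxy path: the only official path by which the adapter reaches the active business service is the Kubernetes API service proxy: `/api/v1/namespaces/<namespace>/services/<service>:<port>/proxy/...`. The current Code Queue runtime topology writes only D601 into `expectedNodeIds`; any remote standby/worker node must first have its native k3s-agent, stable control-plane networking, image distribution and hostPath semantics verified before it can be added to the manifest. Business requests must not degrade into provider-gateway connecting directly to the Code Queue HTTP port. If a standby/worker node is constrained by kubelet/service-proxy reachability, the manifest may explicitly use `healthMode=pod-ready` as the topology health probe; this reads only Kubernetes Pod readiness, is not a business proxy path and cannot replace the active Service proxy.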
|
||||||
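- Topology health: `expectedNodeIds` lists the planned nodes; the current Code Queue target topology must include both D601 and D518, so `presentNodeIds` should be `["D601","D518"]`, with `missingNodeIds=[]`, `topologyComplete=true` and `status=healthy`. D518 not having joined is allowed only as an explicit degraded state during migration and must not be hidden as a fallback; only services that explicitly set `requireAllInstancesHealthy=true` may escalate a missing standby/worker node to overall unhealthy.
|
- Topology health: `expectedNodeIds` lists the planned nodes; the current Code Queue target topology is a single-node, multi-service deployment on D601 native k3s, so `presentNodeIds` should contain `D601`, with `missingNodeIds=[]`, `topologyComplete=true` and `status=healthy`. A node that has not completed native k3s onboarding, or that still depends on Dockerized k3s, must not be listed as an expected node; only services that explicitly set `requireAllInstancesHealthy=true` may escalate a missing standby/worker node to overall unhealthy.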
|
||||||
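- Frontend: the `用户服务 / k3s Control` React page must communicate only through `/api/microservices/k3sctl-adapter/proxy/api/control-plane`, showing control plane status, the manifest, D601/D518 instances, the active instance, the Kubernetes API service proxy/no-fallback path and an explicit raw JSON button; the page must not directly access provider-gateway, D601/D518 business container ports, NodePort or the raw k3s/kubectl API.
|
- Frontend: the `用户服务 / k3s Control` React page must communicate only through `/api/microservices/k3sctl-adapter/proxy/api/control-plane`, showing control plane status, the manifest, D601 scheduler/read/write instances, the active instance, the Kubernetes API service proxy/no-fallback path and an explicit raw JSON button; the page must not directly access provider-gateway, D601/D518 business container ports, NodePort or the raw k3s/kubectl API.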
|
||||||
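The topology health fields named in the list above (`expectedNodeIds`, `presentNodeIds`, `missingNodeIds`, `topologyComplete`, `status`, `requireAllInstancesHealthy`) relate to each other roughly as sketched below. This is a minimal TypeScript sketch, not the adapter's code: the `Sketch*` types, the `readyNodeIds` input and the `evaluateTopology` function are hypothetical, and treating an unready active node as `unhealthy` is an assumption.

```typescript
// Hypothetical shapes; only the field names quoted in the text come from the document.
type SketchTopologyStatus = "healthy" | "degraded" | "unhealthy";

interface SketchTopologyInput {
  expectedNodeIds: string[];            // planned nodes from the manifest
  readyNodeIds: string[];               // assumed input: nodes currently reporting ready
  activeNodeId: string;
  requireAllInstancesHealthy?: boolean; // opt-in escalation of missing nodes
}

interface SketchTopologyHealth {
  presentNodeIds: string[];
  missingNodeIds: string[];
  topologyComplete: boolean;
  status: SketchTopologyStatus;
}

function evaluateTopology(input: SketchTopologyInput): SketchTopologyHealth {
  const ready = new Set(input.readyNodeIds);
  const presentNodeIds = input.expectedNodeIds.filter((id) => ready.has(id));
  const missingNodeIds = input.expectedNodeIds.filter((id) => !ready.has(id));
  const topologyComplete = missingNodeIds.length === 0;

  let status: SketchTopologyStatus;
  if (!ready.has(input.activeNodeId)) {
    status = "unhealthy"; // assumption: no active serving instance means unhealthy
  } else if (topologyComplete) {
    status = "healthy";
  } else {
    // A missing standby/worker is an explicit degraded state unless the
    // service opted into requireAllInstancesHealthy.
    status = input.requireAllInstancesHealthy ? "unhealthy" : "degraded";
  }
  return { presentNodeIds, missingNodeIds, topologyComplete, status };
}
```

With the single-node topology, `evaluateTopology({ expectedNodeIds: ["D601"], readyNodeIds: ["D601"], activeNodeId: "D601" })` yields `missingNodeIds=[]`, `topologyComplete=true` and `status="healthy"`, which matches the expected state described for the current Code Queue manifest.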
|
|
||||||
### Code Queue k3s-Managed
|
### Code Queue k3s-Managed
|
||||||
|
|
||||||
@@ -148,7 +148,7 @@ Baidu Netdisk 在 UniDesk 语境中按纯后端服务管理:不得暴露百度
|
|||||||
- Orchestrator:`deployment.mode=k3sctl-managed`,`deployment.adapterServiceId=k3sctl-adapter`,`deployment.k3sServiceId=code-queue`,`backend.proxyMode=k3sctl-adapter-http`,`backend.nodeBaseUrl=k3s://code-queue`;backend-core 对 Code Queue 的正式链路只能是 `frontend -> backend-core -> k3sctl-adapter -> Kubernetes API service proxy -> Kubernetes Service ...:4222`。对外登记的 `code-queue` ID 保持稳定,但内部必须拆成 `code-queue-read`、`code-queue-write` 和 `code-queue-scheduler` 三个 Kubernetes Service,并由 backend-core 按 method/path 分流。
|
- Orchestrator:`deployment.mode=k3sctl-managed`,`deployment.adapterServiceId=k3sctl-adapter`,`deployment.k3sServiceId=code-queue`,`backend.proxyMode=k3sctl-adapter-http`,`backend.nodeBaseUrl=k3s://code-queue`;backend-core 对 Code Queue 的正式链路只能是 `frontend -> backend-core -> k3sctl-adapter -> Kubernetes API service proxy -> Kubernetes Service ...:4222`。对外登记的 `code-queue` ID 保持稳定,但内部必须拆成 `code-queue-read`、`code-queue-write` 和 `code-queue-scheduler` 三个 Kubernetes Service,并由 backend-core 按 method/path 分流。
|
||||||
- Direct path ban:`code-queue` 不得再登记 `http://code-queue:4222`、`http://host.docker.internal:4222`、NodePort 或 provider-gateway `microservice.http` 作为业务代理目标;frontend 也不得使用旧 `/api/code-queue-direct` 兼容别名作为 Code Queue 页面数据源。provider-gateway 只允许用于维护 D601/D518、部署 adapter、部署 k3s/k8s 节点或诊断节点本机容器。
|
- Direct path ban:`code-queue` 不得再登记 `http://code-queue:4222`、`http://host.docker.internal:4222`、NodePort 或 provider-gateway `microservice.http` 作为业务代理目标;frontend 也不得使用旧 `/api/code-queue-direct` 兼容别名作为 Code Queue 页面数据源。provider-gateway 只允许用于维护 D601/D518、部署 adapter、部署 k3s/k8s 节点或诊断节点本机容器。
|
||||||
- 服务拆分语义:`code-queue-read` 只承载 GET/HEAD 查询、overview、任务详情、Trace/output/transcript、统计和只读健康,可多副本滚动更新;它必须设置 `CODE_QUEUE_SERVICE_ROLE=read` 与 `CODE_QUEUE_SCHEDULER_ENABLED=false`,且不得接受入队、queue 变更、已读、重试、移动、追加 prompt 或打断这类 mutation。`code-queue-write` 承载入队、queue 创建/合并/更新、已读、手动重试、移动等命令写入,初期保持单副本和 `CODE_QUEUE_SERVICE_ROLE=write`,只把命令和任务状态写入 PostgreSQL,不启动 agent 子进程。`code-queue-scheduler` 是唯一拥有 scheduler 和 active run 的执行服务,设置 `CODE_QUEUE_SERVICE_ROLE=scheduler` 与 `CODE_QUEUE_SCHEDULER_ENABLED=true`,负责从 PostgreSQL 热任务集轮询新写入任务、推进队列、启动 Codex/OpenCode、处理 running task 的 steer/interrupt、发送终态通知和暴露执行端 `/health`。普通 Service 负载均衡不得把 mutation 打到 read,也不得把 running task 控制打到 write。
|
- 服务拆分语义:`code-queue-read` 只承载 GET/HEAD 查询、overview、任务详情、Trace/output/transcript、统计和只读健康,可多副本滚动更新;它必须设置 `CODE_QUEUE_SERVICE_ROLE=read` 与 `CODE_QUEUE_SCHEDULER_ENABLED=false`,且不得接受入队、queue 变更、已读、重试、移动、追加 prompt 或打断这类 mutation。`code-queue-write` 承载入队、queue 创建/合并/更新、已读、手动重试、移动等命令写入,初期保持单副本和 `CODE_QUEUE_SERVICE_ROLE=write`,只把命令和任务状态写入 PostgreSQL,不启动 agent 子进程。`code-queue-scheduler` 是唯一拥有 scheduler 和 active run 的执行服务,设置 `CODE_QUEUE_SERVICE_ROLE=scheduler` 与 `CODE_QUEUE_SCHEDULER_ENABLED=true`,负责从 PostgreSQL 热任务集轮询新写入任务、推进队列、启动 Codex/OpenCode、处理 running task 的 steer/interrupt、发送终态通知和暴露执行端 `/health`。普通 Service 负载均衡不得把 mutation 打到 read,也不得把 running task 控制打到 write。
|
||||||
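- Instance semantics: D601 is the default active/single-writer execution node, and `code-queue-scheduler` currently still hosts the long-lived Codex/OpenCode child processes in a single scheduler Pod; D518 is a standby/read-only node and must set `CODE_QUEUE_SERVICE_ROLE=read`, `CODE_QUEUE_SCHEDULER_ENABLED=false` and `CODE_QUEUE_STARTUP_OA_BACKFILL_ENABLED=false`, so that two instances never consume the same PostgreSQL queue at once or replay OA statistics twice. The D601 scheduler also disables `CODE_QUEUE_STARTUP_OA_BACKFILL_ENABLED` by default; historical OA Trace/STEP backfill must be triggered through the explicit `/api/oa/backfill` operations action and must not automatically bulk-publish old events on every Pod restart.
|
- Instance semantics: D601 is currently the only active/single-writer execution node; `code-queue-read` serves read-only traffic with multiple replicas inside D601, `code-queue-write` handles write commands, and `code-queue-scheduler` hosts the long-lived Codex/OpenCode child processes in a single scheduler Pod. D518 is not part of the current Code Queue k3s topology; until it has a native k3s-agent and stable Kubernetes networking, D518 must not be written back into `expectedNodeIds`, nor may the `code-queue-d518` standby be restored. The D601 scheduler disables `CODE_QUEUE_STARTUP_OA_BACKFILL_ENABLED` by default; historical OA Trace/STEP backfill must be triggered through the explicit `/api/oa/backfill` operations action and must not automatically bulk-publish old events on every Pod restart.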
|
||||||
- 滚动更新边界:read/write/scheduler 三服务拆分可以保证滚动更新期间 Code Queue 的只读 API 与大部分控制面入口可用,但当前 scheduler Pod 内仍直接承载正在运行的 agent 子进程,scheduler Pod 被替换时 active task 仍会进入 restart-recovery/retry 语义,不能宣称 running task 零中断。真正的长期目标是继续把调度器和执行器拆开:scheduler 只负责 claim task 并创建 Kubernetes Job/Pod 或独立 worker,runner 把输出、状态、attempt、事件和通知写回 PostgreSQL/OA Event Flow/归档;只有这样 controller/scheduler 滚动更新才不会影响正在执行的任务。
|
- 滚动更新边界:read/write/scheduler 三服务拆分可以保证滚动更新期间 Code Queue 的只读 API 与大部分控制面入口可用,但当前 scheduler Pod 内仍直接承载正在运行的 agent 子进程,scheduler Pod 被替换时 active task 仍会进入 restart-recovery/retry 语义,不能宣称 running task 零中断。真正的长期目标是继续把调度器和执行器拆开:scheduler 只负责 claim task 并创建 Kubernetes Job/Pod 或独立 worker,runner 把输出、状态、attempt、事件和通知写回 PostgreSQL/OA Event Flow/归档;只有这样 controller/scheduler 滚动更新才不会影响正在执行的任务。
|
||||||
- 部署引用:Code Queue 镜像仍复用 `src/components/microservices/code-queue/Dockerfile`,Kubernetes 运行清单为 `src/components/microservices/k3sctl-adapter/k3s/code-queue.k8s.yaml`,`config.json` 对外记录 k3s manifest `src/components/microservices/k3sctl-adapter/k3s/code-queue.k3s.json`;主 server 根目录 `docker-compose.yml` 不包含 `code-queue` service,旧 D601 direct Compose 文件只作为迁移/本地诊断参考,不是正式运行入口。
|
- 部署引用:Code Queue 镜像仍复用 `src/components/microservices/code-queue/Dockerfile`,Kubernetes 运行清单为 `src/components/microservices/k3sctl-adapter/k3s/code-queue.k8s.yaml`,`config.json` 对外记录 k3s manifest `src/components/microservices/k3sctl-adapter/k3s/code-queue.k3s.json`;主 server 根目录 `docker-compose.yml` 不包含 `code-queue` service,旧 D601 direct Compose 文件只作为迁移/本地诊断参考,不是正式运行入口。
|
||||||
- 主服务依赖映射:Code Queue 仍以主 PostgreSQL 为权威数据库,但 D601 k3s Pod 不能依赖公网直连 `74.48.78.17:15432/4255`。Pod 内 `DATABASE_URL` 和 `OA_EVENT_FLOW_BASE_URL` 必须指向集群内 `d601-tcp-egress-gateway` Service,再由该 gateway 通过 D601 provider-gateway egress proxy 的 HTTP CONNECT 转发到主 PostgreSQL 和 OA Event Flow;新增 TCP 依赖时扩展 `TCP_EGRESS_ROUTES`,不得在业务容器里新增一次性公网直连或 ad hoc 隧道。D601 active 实例的 `CODE_QUEUE_NOTIFY_CLAUDEQQ_BASE_URL` 必须使用集群内 ClaudeQQ Service `http://claudeqq.unidesk.svc.cluster.local:3290`,并把 `claudeqq`/`claudeqq.unidesk.svc.cluster.local` 加入 `NO_PROXY`,避免任务完成通知被默认出网代理错误转发。旧 `http://host.docker.internal:3290` 只允许作为迁移期诊断,不得作为 Code Queue k3s Pod 的正式通知路径。这些端口映射只服务受控节点运行时,必须用防火墙或等价策略限制来源,不得成为浏览器或任意公网客户端入口。
|
- 主服务依赖映射:Code Queue 仍以主 PostgreSQL 为权威数据库,但 D601 k3s Pod 不能依赖公网直连 `74.48.78.17:15432/4255`。Pod 内 `DATABASE_URL` 和 `OA_EVENT_FLOW_BASE_URL` 必须指向集群内 `d601-tcp-egress-gateway` Service,再由该 gateway 通过 D601 provider-gateway egress proxy 的 HTTP CONNECT 转发到主 PostgreSQL 和 OA Event Flow;新增 TCP 依赖时扩展 `TCP_EGRESS_ROUTES`,不得在业务容器里新增一次性公网直连或 ad hoc 隧道。D601 active 实例的 `CODE_QUEUE_NOTIFY_CLAUDEQQ_BASE_URL` 必须使用集群内 ClaudeQQ Service `http://claudeqq.unidesk.svc.cluster.local:3290`,并把 `claudeqq`/`claudeqq.unidesk.svc.cluster.local` 加入 `NO_PROXY`,避免任务完成通知被默认出网代理错误转发。旧 `http://host.docker.internal:3290` 只允许作为迁移期诊断,不得作为 Code Queue k3s Pod 的正式通知路径。这些端口映射只服务受控节点运行时,必须用防火墙或等价策略限制来源,不得成为浏览器或任意公网客户端入口。
|
||||||
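The method/path split across `code-queue-read`, `code-queue-write` and `code-queue-scheduler`, together with the Kubernetes API service proxy path, can be sketched as below. This is a minimal TypeScript sketch, not backend-core's routing table: `pickCodeQueueService` and `buildServiceProxyPath` are hypothetical names, and the `/steer` and `/interrupt` path suffixes are assumed for illustration because the document does not list the real control routes.

```typescript
// Hypothetical routing sketch; route suffixes for running-task control are assumptions.
type SketchCodeQueueService =
  | "code-queue-read"
  | "code-queue-write"
  | "code-queue-scheduler";

function pickCodeQueueService(method: string, path: string): SketchCodeQueueService {
  const verb = method.toUpperCase();
  const queryStart = path.indexOf("?");
  const pathname = queryStart >= 0 ? path.slice(0, queryStart) : path;

  // /health reports the execution side, so it goes to the scheduler.
  if (pathname === "/health") return "code-queue-scheduler";
  // Queries never mutate state and may be served by any read replica.
  if (verb === "GET" || verb === "HEAD") return "code-queue-read";
  // Control of a running task must reach the process that owns the active run.
  if (/\/(steer|interrupt)$/.test(pathname)) return "code-queue-scheduler";
  // Every other mutation is a command written to PostgreSQL by the write service.
  return "code-queue-write";
}

/** Build /api/v1/namespaces/<namespace>/services/<service>:<port>/proxy/... */
function buildServiceProxyPath(
  namespace: string,
  service: string,
  port: number,
  path: string,
): string {
  const rest = path.startsWith("/") ? path.slice(1) : path;
  return `/api/v1/namespaces/${namespace}/services/${service}:${port}/proxy/${rest}`;
}
```

For example, a `GET /api/tasks/overview` request would resolve to `code-queue-read` and, with namespace `unidesk` and port `4222`, to the proxy path `/api/v1/namespaces/unidesk/services/code-queue-read:4222/proxy/api/tasks/overview`. Routing by explicit rules rather than plain Service load balancing is what keeps mutations away from read replicas.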
@@ -301,7 +301,7 @@ ClaudeQQ 的业务源码和持久化数据仍在 D601,但正式运行由 k3s
|
|||||||
- `bun scripts/cli.ts microservice health claudeqq`、`bun scripts/cli.ts microservice proxy claudeqq /api/napcat/login`、`bun scripts/cli.ts microservice proxy claudeqq /api/events/recent` 和 `bun scripts/cli.ts microservice proxy claudeqq /api/events/subscriptions`:验证 ClaudeQQ 后端、NapCat 容器登录、事件订阅和私有代理链路;消息推送使用 `POST /api/push/text`,不得开放 D601 `3290/3000/3001/6099` 公网端口。
|
- `bun scripts/cli.ts microservice health claudeqq`、`bun scripts/cli.ts microservice proxy claudeqq /api/napcat/login`、`bun scripts/cli.ts microservice proxy claudeqq /api/events/recent` 和 `bun scripts/cli.ts microservice proxy claudeqq /api/events/subscriptions`:验证 ClaudeQQ 后端、NapCat 容器登录、事件订阅和私有代理链路;消息推送使用 `POST /api/push/text`,不得开放 D601 `3290/3000/3001/6099` 公网端口。
|
||||||
- `bun scripts/cli.ts microservice health todo-note` 与 `bun scripts/cli.ts microservice proxy todo-note /api/instances`:验证主 server Todo Note 后端、PostgreSQL 存储和本机 provider-gateway 私有代理链路。
|
- `bun scripts/cli.ts microservice health todo-note` 与 `bun scripts/cli.ts microservice proxy todo-note /api/instances`:验证主 server Todo Note 后端、PostgreSQL 存储和本机 provider-gateway 私有代理链路。
|
||||||
- `bun scripts/cli.ts microservice health oa-event-flow`、`bun scripts/cli.ts microservice proxy oa-event-flow /api/diagnostics --raw` 与 `bun scripts/cli.ts microservice proxy oa-event-flow '/api/events?tags=service:code-queue&limit=20' --raw`:验证统一 OA 事件流、事件表、tag 查询和统计中心。
|
- `bun scripts/cli.ts microservice health oa-event-flow`、`bun scripts/cli.ts microservice proxy oa-event-flow /api/diagnostics --raw` 与 `bun scripts/cli.ts microservice proxy oa-event-flow '/api/events?tags=service:code-queue&limit=20' --raw`:验证统一 OA 事件流、事件表、tag 查询和统计中心。
|
||||||
- `bun scripts/cli.ts microservice health k3sctl-adapter` and `bun scripts/cli.ts microservice proxy k3sctl-adapter /api/control-plane --raw`: verify the D601 `unidesk-k3s` control-plane adapter, the manifest, the instance status of the D601 scheduler, D601 read/write and D518 standby, `presentNodeIds=[D601,D518]`, `missingNodeIds=[]` and the no-fallback runtime path.
|
- `bun scripts/cli.ts microservice health k3sctl-adapter` and `bun scripts/cli.ts microservice proxy k3sctl-adapter /api/control-plane --raw`: verify the D601 `unidesk-k3s` control-plane adapter, the manifest, the D601 scheduler/read/write instance status, that `presentNodeIds` contains `D601`, `missingNodeIds=[]` and the no-fallback runtime path.
|
||||||
- `bun scripts/cli.ts microservice health code-queue` 与 `bun scripts/cli.ts microservice proxy code-queue /api/tasks/overview`:验证 Code Queue 经过 backend-core -> k3sctl-adapter -> k3s Service proxy 的单一路径,其中 `/health` 指向 `code-queue-scheduler`,overview/详情只读请求指向 `code-queue-read`,写入类请求指向 `code-queue-write`;输出不得出现 `serviceId=code-queue` 的 provider-gateway `microservice.http` 业务代理任务,写入、追加 prompt、打断和 readAt/未读状态都必须由 backend 写入 PostgreSQL,frontend 不得用本地存储伪造成功状态。
|
- `bun scripts/cli.ts microservice health code-queue` 与 `bun scripts/cli.ts microservice proxy code-queue /api/tasks/overview`:验证 Code Queue 经过 backend-core -> k3sctl-adapter -> k3s Service proxy 的单一路径,其中 `/health` 指向 `code-queue-scheduler`,overview/详情只读请求指向 `code-queue-read`,写入类请求指向 `code-queue-write`;输出不得出现 `serviceId=code-queue` 的 provider-gateway `microservice.http` 业务代理任务,写入、追加 prompt、打断和 readAt/未读状态都必须由 backend 写入 PostgreSQL,frontend 不得用本地存储伪造成功状态。
|
||||||
- `bun scripts/cli.ts microservice health filebrowser`、`bun scripts/cli.ts microservice health filebrowser-d601` 与 `bun scripts/cli.ts microservice proxy filebrowser / --max-body-bytes 2000`:验证 D518 主 File Browser 和 D601 备用 File Browser 私有代理链路;浏览器 WebUI 必须通过 `/api/microservices/filebrowser/proxy/` 或 `/api/microservices/filebrowser-d601/proxy/` 访问,不得直接开放 `4251` 公网端口。
|
- `bun scripts/cli.ts microservice health filebrowser`、`bun scripts/cli.ts microservice health filebrowser-d601` 与 `bun scripts/cli.ts microservice proxy filebrowser / --max-body-bytes 2000`:验证 D518 主 File Browser 和 D601 备用 File Browser 私有代理链路;浏览器 WebUI 必须通过 `/api/microservices/filebrowser/proxy/` 或 `/api/microservices/filebrowser-d601/proxy/` 访问,不得直接开放 `4251` 公网端口。
|
||||||
- `bun scripts/cli.ts --main-server-ip 74.48.78.17 microservice health findjob`:在计算节点或其他非主 server 主机上通过公网 frontend remote CLI 进行同一验证,不需要主 server SSH key。
|
- `bun scripts/cli.ts --main-server-ip 74.48.78.17 microservice health findjob`:在计算节点或其他非主 server 主机上通过公网 frontend remote CLI 进行同一验证,不需要主 server SSH key。
|
||||||
@@ -329,7 +329,7 @@ ClaudeQQ 的业务源码和持久化数据仍在 D601,但正式运行由 k3s
|
|||||||
- 运行 `bun scripts/cli.ts microservice health met-nonlinear`、`bun scripts/cli.ts microservice proxy met-nonlinear /api/queue`、`bun scripts/cli.ts microservice proxy met-nonlinear '/api/projects?root=projects&limit=20'` 和 `bun scripts/cli.ts microservice proxy met-nonlinear /api/images`,确认真实链路经过 backend-core、WebSocket、D601 provider-gateway 和 D601 本机 MET Nonlinear TS 后端。
|
- 运行 `bun scripts/cli.ts microservice health met-nonlinear`、`bun scripts/cli.ts microservice proxy met-nonlinear /api/queue`、`bun scripts/cli.ts microservice proxy met-nonlinear '/api/projects?root=projects&limit=20'` 和 `bun scripts/cli.ts microservice proxy met-nonlinear /api/images`,确认真实链路经过 backend-core、WebSocket、D601 provider-gateway 和 D601 本机 MET Nonlinear TS 后端。
|
||||||
- 运行 `bun scripts/cli.ts microservice health claudeqq`、`bun scripts/cli.ts microservice proxy claudeqq /api/napcat/login`、`bun scripts/cli.ts microservice proxy claudeqq /api/events/recent` 和 `bun scripts/cli.ts microservice proxy claudeqq /api/events/subscriptions`,确认真实链路经过 backend-core、k3sctl-adapter、Kubernetes API service proxy 和 D601 Kubernetes Service `claudeqq:3290`;health 应显示 `service=claudeqq`、`pureBackend=true`、`napcat.containerized=true`、NapCat HTTP/WS 状态、二维码状态和订阅计数。
|
- 运行 `bun scripts/cli.ts microservice health claudeqq`、`bun scripts/cli.ts microservice proxy claudeqq /api/napcat/login`、`bun scripts/cli.ts microservice proxy claudeqq /api/events/recent` 和 `bun scripts/cli.ts microservice proxy claudeqq /api/events/subscriptions`,确认真实链路经过 backend-core、k3sctl-adapter、Kubernetes API service proxy 和 D601 Kubernetes Service `claudeqq:3290`;health 应显示 `service=claudeqq`、`pureBackend=true`、`napcat.containerized=true`、NapCat HTTP/WS 状态、二维码状态和订阅计数。
|
||||||
- 运行 `bun scripts/cli.ts microservice health todo-note` 与 `bun scripts/cli.ts microservice proxy todo-note /api/instances`,确认真实链路经过 backend-core、WebSocket、main-server provider-gateway 和主 server `todo-note-backend` 后端;输出中必须包含五个迁移清单和 PostgreSQL 存储健康状态。
|
- 运行 `bun scripts/cli.ts microservice health todo-note` 与 `bun scripts/cli.ts microservice proxy todo-note /api/instances`,确认真实链路经过 backend-core、WebSocket、main-server provider-gateway 和主 server `todo-note-backend` 后端;输出中必须包含五个迁移清单和 PostgreSQL 存储健康状态。
|
||||||
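-- Run `bun scripts/cli.ts microservice health k3sctl-adapter`, `bun scripts/cli.ts microservice proxy k3sctl-adapter /api/control-plane --raw`, `bun scripts/cli.ts microservice health code-queue`, and `bun scripts/cli.ts microservice proxy code-queue /api/tasks/overview` to confirm that the real request path is backend-core -> k3sctl-adapter -> k3s Service proxy and that the three internal Services (read/write/scheduler) each have a ready endpoint. Adapter acceptance must also prove that it runs outside k3s as a service managed directly by UniDesk, that in its Docker form it mounts the host's `/etc/rancher/k3s/k3s.yaml` and `/run/host-ssh/id_ed25519`, that it reaches the WSL-native k3s API through an SSH local tunnel inside the container, and that there is no active `rancher/k3s` control-plane container. Code Queue `/health` must still return the business backend's own `role=scheduler`, `queue.storage.primary=postgres`, `queue.storage.postgresReady=true`, `queue.notifications.claudeqq.outbox.storage=postgres`, and `egressProxy.connected=true`, and must not be replaced by the adapter's aggregated health JSON. Inside the active Code Queue Pod, also verify that the main PostgreSQL port mapping, the main OA Event Flow port mapping, the in-cluster ClaudeQQ `http://claudeqq.unidesk.svc.cluster.local:3290/health`, and `d601-provider-egress-proxy` are all reachable, and confirm that `/workspace` and `/home/ubuntu` point to the same WSL home hostPath so that absolute symlinks such as `/workspace/cq-deploy` resolve to the real directory. Then confirm on the adapter control page that the D601 scheduler is serving and healthy, the D601 read/write Services are healthy, the D518 standby pod is ready, `missingNodeIds=[]`, and the service as a whole does not degrade to a hidden fallback. Next, submit a small `gpt-5.5` task through the public frontend and confirm that write persists it, scheduler polls and executes it, read returns live output updates, a judge verdict follows completion, and prompts can be appended or the task interrupted while it runs. Restart recovery for Code Queue is a required acceptance item: after the scheduler instance is restarted or rebuilt while a task is running, the task must be restored from PostgreSQL to a state where it can continue, without losing the active task, `promptHistory`, subsequently queued tasks, readAt/unread state, or ClaudeQQ notifications already in the outbox. After any migration of the Code Queue service name, table-name prefix, or persistence directory, also run `bun scripts/cli.ts e2e run --only microservice:catalog-code-queue,microservice:code-queue-status,microservice:code-queue-health,microservice:code-queue-tasks` to prove that the backend-core catalog, the k3s adapter private proxy, the PostgreSQL queue, and the task list all point to `code-queue`. Batch acceptance must go through the public frontend, either setting `入队份数=5` (the enqueue-count field) or using multi-segment prompt separators, to enqueue 5 tasks at once and confirm that all 5 move through running/judging/succeeded in order rather than only the first one running.
+- Run `bun scripts/cli.ts microservice health k3sctl-adapter`, `bun scripts/cli.ts microservice proxy k3sctl-adapter /api/control-plane --raw`, `bun scripts/cli.ts microservice health code-queue`, and `bun scripts/cli.ts microservice proxy code-queue /api/tasks/overview` to confirm that the real request path is backend-core -> k3sctl-adapter -> k3s Service proxy and that the three internal Services (read/write/scheduler) each have a ready endpoint. Adapter acceptance must also prove that it runs outside k3s as a service managed directly by UniDesk, that in its Docker form it mounts the host's `/etc/rancher/k3s/k3s.yaml` and `/run/host-ssh/id_ed25519`, that it reaches the WSL-native k3s API through an SSH local tunnel inside the container, and that neither D601 nor D518 has an active `rancher/k3s` container. Code Queue `/health` must still return the business backend's own `role=scheduler`, `queue.storage.primary=postgres`, `queue.storage.postgresReady=true`, `queue.notifications.claudeqq.outbox.storage=postgres`, and `egressProxy.connected=true`, and must not be replaced by the adapter's aggregated health JSON. Inside the active Code Queue Pod, also verify that the main PostgreSQL port mapping, the main OA Event Flow port mapping, the in-cluster ClaudeQQ `http://claudeqq.unidesk.svc.cluster.local:3290/health`, and `d601-provider-egress-proxy` are all reachable, and confirm that `/workspace` and `/home/ubuntu` point to the same WSL home hostPath so that absolute symlinks such as `/workspace/cq-deploy` resolve to the real directory. Then confirm on the adapter control page that the D601 scheduler is serving and healthy, the D601 read/write Services are healthy, `presentNodeIds` contains `D601`, `missingNodeIds=[]`, and the service as a whole does not degrade to a hidden fallback. Next, submit a small `gpt-5.5` task through the public frontend and confirm that write persists it, scheduler polls and executes it, read returns live output updates, a judge verdict follows completion, and prompts can be appended or the task interrupted while it runs. Restart recovery for Code Queue is a required acceptance item: after the scheduler instance is restarted or rebuilt while a task is running, the task must be restored from PostgreSQL to a state where it can continue, without losing the active task, `promptHistory`, subsequently queued tasks, readAt/unread state, or ClaudeQQ notifications already in the outbox. After any migration of the Code Queue service name, table-name prefix, or persistence directory, also run `bun scripts/cli.ts e2e run --only microservice:catalog-code-queue,microservice:code-queue-status,microservice:code-queue-health,microservice:code-queue-tasks` to prove that the backend-core catalog, the k3s adapter private proxy, the PostgreSQL queue, and the task list all point to `code-queue`. Batch acceptance must go through the public frontend, either setting `入队份数=5` (the enqueue-count field) or using multi-segment prompt separators, to enqueue 5 tasks at once and confirm that all 5 move through running/judging/succeeded in order rather than only the first one running.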
 - Code Queue memory regression acceptance: for any change to Code Queue persistence, scheduler, output/Trace, health, list/detail queries, log export, or container runtime parameters, before delivery you must use `kubectl -n unidesk get deploy,pod,svc,endpoints -o wide`, `kubectl -n unidesk describe deploy/code-queue`, or an equivalent Docker inspect on D601 to confirm that the memory/swap hard limits fit the budget; run `kubectl -n unidesk top pod` or Docker stats to confirm resident memory, `OOMKilled=false`, and that `RestartCount` has not grown abnormally; then run `bun scripts/cli.ts microservice health code-queue` to confirm that `/health` is a lightweight readiness check that exposes PostgreSQL/notification/outbox state. Acceptance must also cover `/api/tasks/overview`, single-task detail, and output/transcript queries while historical tasks exist, proving that hot-state trimming neither loses historical output nor caches all historical `task_json` in the process again. Tasks that involve TypeScript/frontend verification should be able to finish short, memory-heavy commands such as `bun run --cwd src/components/frontend check` within the D601 Code Queue memory/swap budget instead of being repeatedly sent SIGTERM by the memory watchdog.
 - Code Queue latency regression acceptance: for any change to the Code Queue list, overview, readAt, Trace/summary lazy loading, live output/SSE event publishing, frontend request strategy, the backend-core user-service proxy, or the frontend Code Queue request path, before delivery you must verify, in a live environment with historical task data and active output flowing, that `GET /api/tasks/overview`, `POST /api/tasks/<id>/read`, `trace-step` for the selected task, and the first-screen load of the frontend `/app/code-queue/` all stay under the 1s target. You can run `bun scripts/src/code-queue-perf.ts --json --target-ms 1000` to collect first-screen time, the slowest API, and DOM-complete metrics under the public frontend, and use `bun scripts/cli.ts microservice proxy code-queue /api/tasks/overview --raw`, curl against the D601 Pod's `/health` and `/api/tasks/overview`, the failure/slow-operation records of the performance panels `/api/performance` and `/api/frontend-performance`, and `kubectl -n unidesk top pod` or Docker stats to supply evidence on backend latency, proxy 502s, and memory/CPU. The acceptance conclusion must also state whether a short-TTL cache was used, how the cache is invalidated by mutation or archive append, whether database indexes/aggregations are hit, whether the output hot path reads only incremental metrics, and whether paginated loading skips selected/active/stats; showing a single snapshot taken after a cache hit is not enough.
 - Run `bun scripts/cli.ts microservice health filebrowser`, `bun scripts/cli.ts microservice health filebrowser-d601`, and `bun scripts/cli.ts microservice proxy filebrowser / --max-body-bytes 2000` to confirm that File Browser health returns `status=OK`, the WebUI HTML contains `File Browser`, and D518/D601 reach the node-local `4251` through provider-gateway; then, under `用户服务 / File Browser` (User Services) in the public frontend, confirm that D518 is the default target, screenshots can be exported, the compact iframe layout no longer has a huge `folder` marker covering file names, and `/mnt/c` can be browsed.
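The `/health` requirements in the Code Queue acceptance item above can be sketched as a small checker. This is only an illustration: the `CodeQueueHealth` type and the `codeQueueHealthFailures` function are hypothetical names, and the body shape is assumed from the field paths the item lists, not taken from the repository.

```typescript
// Hypothetical shape: only the fields named in the acceptance item are modelled.
interface CodeQueueHealth {
  role?: string;
  queue?: {
    storage?: { primary?: string; postgresReady?: boolean };
    notifications?: { claudeqq?: { outbox?: { storage?: string } } };
  };
  egressProxy?: { connected?: boolean };
}

// Returns the names of the fields whose values differ from the documented ones.
function codeQueueHealthFailures(body: CodeQueueHealth): string[] {
  const checks: Array<[string, unknown, unknown]> = [
    ["role", body.role, "scheduler"],
    ["queue.storage.primary", body.queue?.storage?.primary, "postgres"],
    ["queue.storage.postgresReady", body.queue?.storage?.postgresReady, true],
    [
      "queue.notifications.claudeqq.outbox.storage",
      body.queue?.notifications?.claudeqq?.outbox?.storage,
      "postgres",
    ],
    ["egressProxy.connected", body.egressProxy?.connected, true],
  ];
  return checks.filter(([, actual, want]) => actual !== want).map(([name]) => name);
}

// Illustrative body that satisfies every documented requirement.
const sampleHealth: CodeQueueHealth = {
  role: "scheduler",
  queue: {
    storage: { primary: "postgres", postgresReady: true },
    notifications: { claudeqq: { outbox: { storage: "postgres" } } },
  },
  egressProxy: { connected: true },
};

console.log(codeQueueHealthFailures(sampleHealth));
```

A body that lacks these fields, such as an adapter-level aggregate, fails every check, which is the substitution the acceptance item forbids.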
@@ -1000,7 +1000,11 @@ function k8sDeploymentsForService(service: UniDeskMicroserviceConfig): string[]
 function applyK8sScript(service: UniDeskMicroserviceConfig): string {
   const manifest = `${targetWorkDir(service)}/${k8sManifestPath(service)}`;
   const cleanup = service.id === "code-queue"
-    ? `KUBECONFIG=${shellQuote(k8sKubeconfig)} kubectl -n ${shellQuote(k8sNamespace)} delete endpointslice d601-provider-egress-proxy --ignore-not-found`
+    ? [
+      `KUBECONFIG=${shellQuote(k8sKubeconfig)} kubectl -n ${shellQuote(k8sNamespace)} delete endpointslice d601-provider-egress-proxy --ignore-not-found`,
+      `KUBECONFIG=${shellQuote(k8sKubeconfig)} kubectl -n ${shellQuote(k8sNamespace)} delete deployment code-queue-d518 --ignore-not-found`,
+      `KUBECONFIG=${shellQuote(k8sKubeconfig)} kubectl -n ${shellQuote(k8sNamespace)} delete service code-queue-d518 --ignore-not-found`,
+    ].join("\n")
     : "";
   return [
     cleanup,
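The cleanup change above builds one `kubectl delete ... --ignore-not-found` line per stale resource. A minimal standalone sketch of that pattern follows. The `shellQuote` here is a hypothetical stand-in for the repository helper of the same name, whose real implementation is not shown in this diff, and `cleanupScript` and `StaleResource` are illustrative names.

```typescript
// Hypothetical stand-in for the repo's shellQuote: wrap in single quotes and
// escape any embedded single quote so the value is one shell word.
function shellQuote(value: string): string {
  return "'" + value.replace(/'/g, "'\\''") + "'";
}

interface StaleResource {
  kind: string; // for example "deployment" or "service"
  name: string;
}

// Builds one idempotent delete command per resource, joined by newlines.
function cleanupScript(kubeconfig: string, namespace: string, stale: StaleResource[]): string {
  return stale
    .map(({ kind, name }) =>
      `KUBECONFIG=${shellQuote(kubeconfig)} kubectl -n ${shellQuote(namespace)} ` +
      `delete ${kind} ${shellQuote(name)} --ignore-not-found`)
    .join("\n");
}

const script = cleanupScript("/etc/rancher/k3s/k3s.yaml", "unidesk", [
  { kind: "endpointslice", name: "d601-provider-egress-proxy" },
  { kind: "deployment", name: "code-queue-d518" },
  { kind: "service", name: "code-queue-d518" },
]);

console.log(script);
```

Because `--ignore-not-found` makes each delete a no-op when the resource is already gone, the script can run on every apply without failing on a cluster that has been cleaned up.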
+1
-4
@@ -1125,7 +1125,6 @@ async function serviceChecks(config: UniDeskConfig, urls: PublicUrls, checks: E2
   } }).body;
   const k3sctlCodeQueueService = k3sctlControlPlaneBody?.services?.find((service) => service.id === "code-queue");
   const k3sctlClaudeqqService = k3sctlControlPlaneBody?.services?.find((service) => service.id === "claudeqq");
-  const k3sctlD518Instance = k3sctlCodeQueueService?.instances?.find((instance) => instance.id === "D518");
   const filebrowserHealthBody = (filebrowserHealth as { body?: { status?: string } }).body;
   const filebrowserD601HealthBody = (filebrowserD601Health as { body?: { status?: string } }).body;
   const filebrowserWebuiText = String((filebrowserWebui as { body?: { text?: string } }).body?.text || "");
@@ -1196,10 +1195,8 @@ async function serviceChecks(config: UniDeskConfig, urls: PublicUrls, checks: E2
       && k3sctlClaudeqqService?.active?.id === "D601"
       && k3sctlClaudeqqService?.active?.healthy === true
       && (k3sctlCodeQueueService?.presentNodeIds ?? []).includes("D601")
-      && (k3sctlCodeQueueService?.presentNodeIds ?? []).includes("D518")
       && (k3sctlCodeQueueService?.missingNodeIds ?? []).length === 0
-      && k3sctlD518Instance?.healthy === true
-      && k3sctlD518Instance?.proxyMode === "kubernetes-api-pod-readiness",
+      && (k3sctlCodeQueueService?.instances ?? []).some((instance) => instance.id === "D601" && instance.healthy === true),
     {
       ok: (k3sctlControlPlane as { ok?: boolean }).ok,
       clusterId: k3sctlControlPlaneBody?.clusterId,
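The new Code Queue condition in the hunk above can be read as a single predicate over the control-plane service entry. The sketch below isolates it so it can be exercised on its own. The `K3sService` and `K3sInstance` types and the `codeQueueTopologyOk` name are assumptions that model only the fields the check reads.

```typescript
// Hypothetical types: only the fields read by the e2e condition are modelled.
interface K3sInstance {
  id: string;
  healthy?: boolean;
}

interface K3sService {
  id: string;
  presentNodeIds?: string[];
  missingNodeIds?: string[];
  instances?: K3sInstance[];
}

// True when the node is present, no expected node is missing, and the node
// has at least one healthy instance. Missing fields count as failures.
function codeQueueTopologyOk(service: K3sService | undefined, nodeId = "D601"): boolean {
  return (service?.presentNodeIds ?? []).includes(nodeId)
    && (service?.missingNodeIds ?? []).length === 0
    && (service?.instances ?? []).some((instance) => instance.id === nodeId && instance.healthy === true);
}

// Illustrative control-plane entry for the D601-only topology.
const d601Service: K3sService = {
  id: "code-queue",
  presentNodeIds: ["D601"],
  missingNodeIds: [],
  instances: [{ id: "D601", healthy: true }],
};

console.log(codeQueueTopologyOk(d601Service));
```

Defaulting absent arrays to `[]` means an adapter response without the service entry fails the check instead of throwing.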
@@ -174,7 +174,7 @@ export function K3sCtlPage({ microservices, onRaw, apiBaseUrl, onNavigate }: Any
     h("div", { className: "k3s-hero" },
       h("div", { className: "k3s-orb", "aria-hidden": "true" }, h("span", null, "k3s")),
       h("div", { className: "k3s-hero-copy" },
-        h("p", { className: "eyebrow" }, "D601 control plane / D518 managed node"),
+        h("p", { className: "eyebrow" }, "D601 native control plane"),
         h("h2", null, "UniDesk 只管理 adapter;业务微服务交给 k3s 标准服务路由"),
         h("p", { className: "muted paragraph" }, "Code Queue 的前端/API 请求进入 k3sctl-adapter,再由 adapter 转发到 k3s active service。provider-gateway 只用于维护 adapter 和节点诊断,不再直接管理 Code Queue 容器。"),
         h("div", { className: "k3s-route-strip" },
@@ -210,7 +210,7 @@ export function K3sCtlPage({ microservices, onRaw, apiBaseUrl, onNavigate }: Any
       state.refreshedAt ? h("p", { className: "muted paragraph" }, `最近刷新 ${fmtClock(state.refreshedAt)}`) : null,
     ),
     services.length === 0
-      ? h(Panel, { title: "代管服务", eyebrow: "k3s services", loading: state.loading }, h(EmptyState, { title: "暂无 k3s 服务", text: "等待 k3sctl-adapter 返回 /api/services;Code Queue 切换后这里应显示 D601 和 D518 两个实例。" }))
+      ? h(Panel, { title: "代管服务", eyebrow: "k3s services", loading: state.loading }, h(EmptyState, { title: "暂无 k3s 服务", text: "等待 k3sctl-adapter 返回 /api/services;Code Queue 应显示 D601 scheduler/read/write 服务实例。" }))
       : services.map((service) => renderManagedService(service, onRaw)),
   );
 }
@@ -21,8 +21,7 @@
   "activeInstanceId": "D601",
   "singleWriter": true,
   "expectedNodeIds": [
-    "D601",
-    "D518"
+    "D601"
   ],
   "instances": [
     {
@@ -32,14 +31,6 @@
       "baseUrl": "kubernetes://unidesk/services/code-queue-scheduler:4222",
       "healthPath": "/health",
       "healthMode": "service-proxy"
-    },
-    {
-      "id": "D518",
-      "nodeId": "D518",
-      "role": "standby",
-      "baseUrl": "kubernetes://unidesk/services/code-queue-d518:4222",
-      "healthPath": "/health",
-      "healthMode": "pod-ready"
     }
   ],
   "requireAllInstancesHealthy": false
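Dropping the D518 instance only stays consistent if `expectedNodeIds` and `instances` are trimmed together, as the two hunks above do. A small consistency check along these lines could catch a half-applied edit. It is a sketch under assumptions: `ManagedTopology`, `ManagedInstance`, and `topologyProblems` are hypothetical names, and the `id`/`nodeId` values of the remaining D601 instance are assumed because that part of the file is not shown in the diff.

```typescript
// Hypothetical types that mirror only the keys visible in the config hunks.
interface ManagedInstance {
  id: string;
  nodeId: string;
  baseUrl: string;
  healthPath: string;
  healthMode: string;
}

interface ManagedTopology {
  activeInstanceId: string;
  expectedNodeIds: string[];
  instances: ManagedInstance[];
}

// Lists mismatches between the expected nodes and the declared instances.
function topologyProblems(topology: ManagedTopology): string[] {
  const problems: string[] = [];
  const instanceNodes = new Set(topology.instances.map((instance) => instance.nodeId));
  for (const nodeId of topology.expectedNodeIds) {
    if (!instanceNodes.has(nodeId)) problems.push(`expected node ${nodeId} has no instance`);
  }
  for (const instance of topology.instances) {
    if (!topology.expectedNodeIds.includes(instance.nodeId)) {
      problems.push(`instance ${instance.id} runs on unexpected node ${instance.nodeId}`);
    }
  }
  if (!topology.instances.some((instance) => instance.id === topology.activeInstanceId)) {
    problems.push(`active instance ${topology.activeInstanceId} is not declared`);
  }
  return problems;
}

// Assumed values for the surviving instance; only baseUrl, healthPath and
// healthMode are confirmed by the diff context lines.
const d601Only: ManagedTopology = {
  activeInstanceId: "D601",
  expectedNodeIds: ["D601"],
  instances: [{
    id: "D601",
    nodeId: "D601",
    baseUrl: "kubernetes://unidesk/services/code-queue-scheduler:4222",
    healthPath: "/health",
    healthMode: "service-proxy",
  }],
};

console.log(topologyProblems(d601Only));
```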
@@ -1128,253 +1128,3 @@ spec:
     - name: http
       port: 4222
       targetPort: http
-
----
-apiVersion: apps/v1
-kind: Deployment
-metadata:
-  name: code-queue-d518
-  namespace: unidesk
-  labels:
-    app.kubernetes.io/name: code-queue
-    app.kubernetes.io/part-of: unidesk
-    unidesk.ai/deployment-mode: k3sctl-managed
-    unidesk.ai/instance-id: D518
-spec:
-  replicas: 1
-  selector:
-    matchLabels:
-      app.kubernetes.io/name: code-queue
-      unidesk.ai/instance-id: D518
-  template:
-    metadata:
-      labels:
-        app.kubernetes.io/name: code-queue
-        app.kubernetes.io/part-of: unidesk
-        unidesk.ai/deployment-mode: k3sctl-managed
-        unidesk.ai/instance-id: D518
-        unidesk.ai/node-id: D518
-    spec:
-      nodeSelector:
-        unidesk.ai/node-id: D518
-      terminationGracePeriodSeconds: 30
-      containers:
-        - name: code-queue
-          image: unidesk-code-queue:d601
-          imagePullPolicy: IfNotPresent
-          ports:
-            - name: http
-              containerPort: 4222
-          envFrom:
-            - secretRef:
-                name: code-queue-env
-                optional: true
-          env:
-            - name: HOST
-              value: "0.0.0.0"
-            - name: PORT
-              value: "4222"
-            - name: DATABASE_URL
-              value: "postgres://unidesk:unidesk_dev_password@d601-tcp-egress-gateway.unidesk.svc.cluster.local:15432/unidesk"
-            - name: CODE_QUEUE_INSTANCE_ID
-              value: "D518"
-            - name: CODE_QUEUE_SERVICE_ROLE
-              value: "read"
-            - name: CODE_QUEUE_SCHEDULER_ENABLED
-              value: "false"
-            - name: CODE_QUEUE_STARTUP_OA_BACKFILL_ENABLED
-              value: "false"
-            - name: CODE_QUEUE_DATA_DIR
-              value: "/var/lib/unidesk/code-queue"
-            - name: CODE_QUEUE_WORKDIR
-              value: "/workspace"
-            - name: CODE_QUEUE_CODEX_HOME
-              value: "/var/lib/unidesk/code-queue/codex-home"
-            - name: CODE_QUEUE_OPENCODE_XDG_DIR
-              value: "/var/lib/unidesk/code-queue/opencode-xdg"
-            - name: CODE_QUEUE_SOURCE_CODEX_CONFIG
-              value: "/root/.codex/config.toml"
-            - name: CODE_QUEUE_DEFAULT_MODEL
-              value: "gpt-5.5"
-            - name: CODE_QUEUE_MODELS
-              value: "gpt-5.5,gpt-5.4-mini,gpt-5.4,minimax-m2.7"
-            - name: CODE_QUEUE_MODEL_REASONING_EFFORTS
-              value: "gpt-5.5=xhigh"
-            - name: CODE_QUEUE_SANDBOX
-              value: "danger-full-access"
-            - name: CODE_QUEUE_APPROVAL_POLICY
-              value: "never"
-            - name: CODE_QUEUE_MAX_ACTIVE_QUEUES
-              value: "0"
-            - name: CODE_QUEUE_DATABASE_POOL_MAX
-              value: "2"
-            - name: NODE_OPTIONS
-              value: "--max-old-space-size=1024"
-            - name: GIT_CONFIG_COUNT
-              value: "1"
-            - name: GIT_CONFIG_KEY_0
-              value: "safe.directory"
-            - name: GIT_CONFIG_VALUE_0
-              value: "*"
-            - name: CODE_QUEUE_IN_MEMORY_OUTPUT_RECORDS
-              value: "10"
-            - name: CODE_QUEUE_IN_MEMORY_EVENT_RECORDS
-              value: "10"
-            - name: CODE_QUEUE_MAIN_PROVIDER_ID
-              value: "D518"
-            - name: CODE_QUEUE_REMOTE_WORKDIR
-              value: "/home/ubuntu"
-            - name: CODE_QUEUE_EXECUTION_PROVIDER_IDS
-              value: "D518"
-            - name: CODE_QUEUE_DEV_CONTAINER_MASTER_HOST
-              value: "74.48.78.17"
-            - name: CODE_QUEUE_DEV_CONTAINER_DEFAULT_PROVIDER_ID
-              value: "D518"
-            - name: CODE_QUEUE_DEV_CONTAINER_WORKDIR
-              value: "/home/ubuntu"
-            - name: CODE_QUEUE_EGRESS_PROXY_ENABLED
-              value: "false"
-            - name: CODE_QUEUE_EGRESS_PROXY_URL
-              value: ""
-            - name: CODE_QUEUE_EGRESS_PROXY_NO_PROXY
-              value: "localhost,127.0.0.1,::1,host.docker.internal,d601-tcp-egress-gateway,d601-tcp-egress-gateway.unidesk,d601-tcp-egress-gateway.unidesk.svc,d601-tcp-egress-gateway.unidesk.svc.cluster.local,backend-core,oa-event-flow,database"
-            - name: HTTP_PROXY
-              value: ""
-            - name: HTTPS_PROXY
-              value: ""
-            - name: ALL_PROXY
-              value: ""
-            - name: http_proxy
-              value: ""
-            - name: https_proxy
-              value: ""
-            - name: all_proxy
-              value: ""
-            - name: NO_PROXY
-              value: "localhost,127.0.0.1,::1,host.docker.internal,d601-tcp-egress-gateway,d601-tcp-egress-gateway.unidesk,d601-tcp-egress-gateway.unidesk.svc,d601-tcp-egress-gateway.unidesk.svc.cluster.local,backend-core,oa-event-flow,database"
-            - name: no_proxy
-              value: "localhost,127.0.0.1,::1,host.docker.internal,d601-tcp-egress-gateway,d601-tcp-egress-gateway.unidesk,d601-tcp-egress-gateway.unidesk.svc,d601-tcp-egress-gateway.unidesk.svc.cluster.local,backend-core,oa-event-flow,database"
-            - name: OA_EVENT_FLOW_BASE_URL
-              value: "http://d601-tcp-egress-gateway.unidesk.svc.cluster.local:4255"
-            - name: CODE_QUEUE_NOTIFY_CLAUDEQQ_ENABLED
-              value: "false"
-            - name: CODE_QUEUE_NOTIFY_CLAUDEQQ_BASE_URL
-              value: ""
-            - name: CODE_QUEUE_NOTIFY_CLAUDEQQ_TARGET_TYPE
-              value: "private"
-            - name: CODE_QUEUE_NOTIFY_CLAUDEQQ_USER_ID
-              value: "645275593"
-            - name: CODE_QUEUE_NOTIFY_CLAUDEQQ_MAX_RESPONSE_CHARS
-              value: "12000"
-            - name: CODE_QUEUE_NOTIFY_CLAUDEQQ_TIMEOUT_MS
-              value: "15000"
-            - name: CODE_QUEUE_NOTIFY_CLAUDEQQ_SEND_ATTEMPTS
-              value: "3"
-            - name: LOG_FILE
-              value: "/var/log/unidesk/code-queue-d518.jsonl"
-            - name: UNIDESK_LOG_RETENTION_BYTES
-              value: "1GiB"
-          volumeMounts:
-            - name: docker-sock
-              mountPath: /var/run/docker.sock
-            - name: workspace
-              mountPath: /workspace
-            - name: workspace
-              mountPath: /home/ubuntu
-            - name: repo
-              mountPath: /root/unidesk
-            - name: repo
-              mountPath: /app
-            - name: codex-config
-              mountPath: /root/.codex/config.toml
-              readOnly: true
-            - name: codex-auth
-              mountPath: /root/.codex/auth.json
-              readOnly: true
-            - name: ssh-dir
-              mountPath: /root/.ssh
-              readOnly: true
-            - name: logs
-              mountPath: /var/log/unidesk
-            - name: state
-              mountPath: /var/lib/unidesk/code-queue
-          readinessProbe:
-            httpGet:
-              path: /health
-              port: http
-            periodSeconds: 5
-            timeoutSeconds: 3
-            failureThreshold: 20
-          livenessProbe:
-            httpGet:
-              path: /health
-              port: http
-            periodSeconds: 10
-            timeoutSeconds: 3
-            failureThreshold: 6
-          startupProbe:
-            httpGet:
-              path: /health
-              port: http
-            periodSeconds: 5
-            timeoutSeconds: 3
-            failureThreshold: 60
-          resources:
-            requests:
-              cpu: 250m
-              memory: 512Mi
-            limits:
-              memory: 4Gi
-      volumes:
-        - name: docker-sock
-          hostPath:
-            path: /var/run/docker.sock
-            type: Socket
-        - name: workspace
-          hostPath:
-            path: /home/ubuntu
-            type: Directory
-        - name: repo
-          hostPath:
-            path: /home/ubuntu/cq-deploy
-            type: Directory
-        - name: codex-config
-          hostPath:
-            path: /home/ubuntu/.codex/config.toml
-            type: File
-        - name: codex-auth
-          hostPath:
-            path: /home/ubuntu/.codex/auth.json
-            type: File
-        - name: ssh-dir
-          hostPath:
-            path: /home/ubuntu/.ssh
-            type: Directory
-        - name: logs
-          hostPath:
-            path: /home/ubuntu/cq-deploy/.state/code-queue/logs
-            type: DirectoryOrCreate
-        - name: state
-          hostPath:
-            path: /home/ubuntu/cq-deploy/.state/code-queue
-            type: DirectoryOrCreate
----
-apiVersion: v1
-kind: Service
-metadata:
-  name: code-queue-d518
-  namespace: unidesk
-  labels:
-    app.kubernetes.io/name: code-queue
-    app.kubernetes.io/part-of: unidesk
-    unidesk.ai/deployment-mode: k3sctl-managed
-    unidesk.ai/instance-id: D518
-spec:
-  type: ClusterIP
-  selector:
-    app.kubernetes.io/name: code-queue
-    unidesk.ai/instance-id: D518
-  ports:
-    - name: http
-      port: 4222
-      targetPort: http