feat: add devops-controlled dev ci flow
@@ -1,5 +1,7 @@
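# D601 k3s Development Environment Build Plan

> Status: The environment-branch scheme in this process plan is deprecated. The long-term authoritative rules are `docs/reference/deploy.md` and `docs/reference/ci.md`: the dev/prod desired state is written in the root-level `deploy.json` on the `master` branch and distinguished by `environments.dev` and `environments.prod`; the `deploy/dev` and `deploy/prod` branches are no longer the source of truth for environments. This file is kept as a historical plan for that phase and must not be used as the basis for new implementations.

## Goal

Build a UniDesk development environment inside the existing D601 native k3s cluster, isolated from production, so that an LLM-driven development workflow can deploy, break, rebuild and verify backend-core, frontend, Code Queue and their database dependencies without interrupting the production main server.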
+21
-4
@@ -1,6 +1,6 @@
# UniDesk CI On D601 k3s

UniDesk CI is hosted on the D601 native k3s cluster with Tekton Pipelines and Tekton Triggers. It is CI only. CD remains the existing `deploy.json` / `deploy apply` / `codex deploy <commit>` path, and no Tekton task may roll out production services.

UniDesk CI is hosted on the D601 native k3s cluster with Tekton Pipelines and Tekton Triggers. It is CI only. CD remains separate from Tekton; D601 service deployment must go through the DevOps control plane, while maintenance-channel direct D601 apply is reserved for DevOps bootstrap/repair. No Tekton task may roll out production services.

## Components

@@ -8,9 +8,10 @@ UniDesk CI is hosted on the D601 native k3s cluster with Tekton Pipelines and Te
- Tekton Triggers: `v0.34.0`.
- UniDesk CI namespace: `unidesk-ci`.
- Manifests: `src/components/microservices/k3sctl-adapter/k3s/ci/`.
- CLI entry: `bun scripts/cli.ts ci install|status|run|logs`.
- CLI entry: `bun scripts/cli.ts ci install|status|run|run-dev-e2e|logs`.
- DevOps control service: `src/components/microservices/devops`, normally installed in `unidesk-ci` and reached through the k3s service-proxy path.

The CLI reaches D601 through the existing `k3sctl-adapter` Host SSH maintenance bridge and then runs native `KUBECONFIG=/etc/rancher/k3s/k3s.yaml kubectl ...`. It does not require backend-core to be running and does not expose a new public port.

Bootstrap and recovery may reach D601 through backend-core `/api/dispatch` with the existing `host.ssh` provider capability, then run native `KUBECONFIG=/etc/rancher/k3s/k3s.yaml kubectl ...` on D601. That maintenance path is limited to DevOps bootstrap/repair and CI bootstrap checks; it must not deploy backend-core, frontend, Code Queue, Decision Center, k3sctl-adapter or other direct/managed microservices. Normal CI/CD control should move to `CLI -> backend-core -> k3sctl-adapter -> DevOps -> Kubernetes API/Tekton` after DevOps is healthy. No new public port is exposed.

## Pipeline Scope

@@ -22,7 +23,7 @@ Each commit CI run performs:
- Temporary `code-queue-ci-read` Deployment and ClusterIP Service in `unidesk-ci`.
- Code Queue read performance checks against the production PostgreSQL through `d601-tcp-egress-gateway`.

`ci install` also prewarms the D601 k3s containerd runtime with the Tekton entrypoint/workingdir helper images, `oven/bun:1-debian`, `alpine/git:2.45.2` and `unidesk-code-queue:d601`. Missing images are pulled through the node-local provider-gateway WS egress proxy and then imported into native k3s containerd with digests preserved, so PipelineRun pods do not hang on external registry pulls.

`ci install` also prewarms the D601 k3s containerd runtime with the Tekton entrypoint/workingdir helper images, `oven/bun:1-debian`, `alpine/git:2.45.2` and `unidesk-code-queue:dev`. Missing images are pulled through the node-local provider-gateway WS egress proxy and then imported into native k3s containerd with digests preserved, so PipelineRun pods do not hang on external registry pulls.

Git clone and dependency downloads inside the repo check task use `d601-provider-egress-proxy.unidesk.svc.cluster.local:18789`; the NO_PROXY list keeps the in-cluster read service, D601 TCP egress gateway and any in-cluster CI Git mirror on the cluster network.

@@ -42,6 +43,16 @@ The temporary Code Queue service uses:

This means the CI service can read existing tasks, Trace summaries, Trace steps and Trace step details from the main database, but it must not schedule, mutate, notify, backfill or become deployment truth.

## Dev Namespace E2E

`ci run-dev-e2e` is the manual CI entry for the dev desired-state smoke flow. The CLI fetches `origin/master:deploy.json`, reads `environments.dev`, records the `origin/master` commit that supplied the manifest, then normally calls DevOps through the existing microservice proxy to create a Tekton `PipelineRun`. The Pipeline verifies that the in-cluster Git fetch sees the same master commit before it reads `deploy.json`.

`ci run-dev-e2e --direct` is reserved for CI bootstrap and recovery when DevOps is not healthy yet. It creates only the CI PipelineRun through the maintenance Host SSH path, does not deploy any direct/managed microservice, and must not become the normal CI control path.

The first CI stage creates a temporary namespace named `unidesk-ci-e2e-<run-id>`, stores the selected desired manifest in a ConfigMap, starts an in-namespace smoke target, calls its `/health` endpoint through the Kubernetes Service DNS name, verifies the dev service commit IDs carried into the target, and deletes the namespace unless `--keep-namespace` is set. This stage proves the manual trigger, master desired-state pinning, namespace lifecycle, in-cluster Service DNS and e2e result path without mutating `unidesk`, `unidesk-dev`, production PostgreSQL, or any production workload.

The current dev namespace e2e is a harness and smoke gate, not a full frontend/backend/code-queue stack rollout. Full-stack temporary namespace deployment can be added behind the same `run-dev-e2e` command after image build/import and per-run database bootstrap are promoted into CI.
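
As a rough illustration of the first CI stage, the TypeScript sketch below derives the temporary namespace name and checks that the in-cluster fetch saw the same master commit as the CLI. The names `planDevE2e` and `assertSameCommit`, the run-id format and the full-SHA requirement are assumptions made for illustration; only the `unidesk-ci-e2e-<run-id>` pattern, the commit comparison and the `--keep-namespace` flag come from the text above.

```typescript
// Sketch only: helper names and validation rules are hypothetical.
interface DevE2ePlan {
  namespace: string;
  manifestCommit: string;
  keepNamespace: boolean;
}

function planDevE2e(runId: string, manifestCommit: string, keepNamespace = false): DevE2ePlan {
  const namespace = `unidesk-ci-e2e-${runId}`;
  // Kubernetes namespace names are DNS labels: lowercase alphanumerics
  // and '-', starting and ending alphanumeric, at most 63 characters.
  if (!/^[a-z0-9]([-a-z0-9]*[a-z0-9])?$/.test(namespace) || namespace.length > 63) {
    throw new Error(`invalid namespace name: ${namespace}`);
  }
  // Assumption: the recorded origin/master commit is a full 40-character SHA.
  if (!/^[0-9a-f]{40}$/.test(manifestCommit)) {
    throw new Error("manifest commit must be a full 40-character SHA");
  }
  return { namespace, manifestCommit, keepNamespace };
}

// Fails when master moved between the CLI fetch and the in-cluster fetch.
function assertSameCommit(plan: DevE2ePlan, inClusterCommit: string): void {
  if (plan.manifestCommit !== inClusterCommit) {
    throw new Error(
      `master moved: CLI saw ${plan.manifestCommit}, cluster saw ${inClusterCommit}`,
    );
  }
}

const plan = planDevE2e("run42", "0c3cdb4ee06a23361ed511a2da033d67b53d16f4");
assertSameCommit(plan, "0c3cdb4ee06a23361ed511a2da033d67b53d16f4");
console.log(plan.namespace);
```

Failing fast on a commit mismatch keeps the smoke result tied to one known desired state rather than to whatever master happens to be when the Pipeline starts.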

## Performance Gate

The initial budgets live in `unidesk-ci/unidesk-ci-budgets`:
@@ -74,6 +85,12 @@ Run CI manually for a commit:
bun scripts/cli.ts ci run --revision <commit>
```

Run the dev namespace e2e harness manually:

```bash
bun scripts/cli.ts ci run-dev-e2e --wait-ms 600000
```

Inspect a run:

```bash
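@@ -22,10 +22,11 @@ UniDesk's unified CLI entry point is `scripts/cli.ts` in the repository root; the invocation is fixed
- `microservice list/status/health/diagnostics/tunnel-self-test/proxy` manages, through the backend-core intranet API, user services mounted on compute-node Docker or the k3s control plane (the underlying command name is still microservice); `health`, `diagnostics`, `tunnel-self-test` and `proxy` go through the real backend-core -> provider-gateway or k3sctl-adapter -> node service path; `proxy` supports a controlled JSON request body and by default prints a bounded preview of oversized response bodies. See `docs/reference/microservices.md` for the rules.
- `decision upload/list/show/health` accesses the D601 k3s Decision Center through the backend-core user-service proxy, to upload meeting-record/resolution Markdown, list authoritative records, view details and run health checks; it must not connect directly to the D601 Service, NodePort or provider-gateway business HTTP.
- `decision diary import <markdown-file>` splits a work log with `# YYYY年M月D日`, `# YYYY-MM-DD` or `# YYYY/M/D` headings into one Markdown diary entry per day and writes the entries into the Decision Center PostgreSQL under the virtual path `YYYY-MM/YYYY-MM-DD.md`; `decision diary list/months/show` query by month/date, list months and show a single day's body, respectively.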
- `deploy check/plan/apply` reads the desired service repo and commit state from the root `deploy.json` by default, joins it with `config.json` and the existing manifests, and then validates or updates directly managed services and k3s-managed services through the single target-side build path; `deploy plan --env dev|prod` reads the manifest only from a fixed Git ref and prints a dry-run environment plan without using the local dirty worktree; `deploy apply --env dev --service backend-core|frontend|code-queue` can deploy the current D601 dev slice according to `origin/deploy/dev:deploy.json`, while `--env prod` apply remains disabled; see `docs/reference/deploy.md` for the rules.
- `deploy check/plan/apply` reads the desired service repo and commit state from the root `deploy.json` by default, joins it with `config.json` and the existing manifests, and then validates or updates directly managed services and k3s-managed services through the single target-side build path; `deploy plan --env dev|prod` reads the manifest only from `origin/master:deploy.json#environments.<env>` and prints a dry-run environment plan without using the local dirty worktree; maintenance-channel direct D601 apply is allowed only as `--env dev --service devops` for DevOps bootstrap/repair, dev services such as backend-core/frontend/code-queue must be deployed through the DevOps control plane, and `--env prod` apply remains disabled; see `docs/reference/deploy.md` for the rules.
- `dev-env validate [--manifest path] [--kubectl-dry-run]` validates offline the D601 `unidesk-dev` namespace, the dev PostgreSQL foundation and the dev workload manifests. By default it checks `src/components/microservices/k3sctl-adapter/k3s/dev/unidesk-dev-foundation.k8s.yaml`; `src/components/microservices/k3sctl-adapter/k3s/dev/unidesk-dev-core.k8s.yaml` or `src/components/microservices/k3sctl-adapter/k3s/dev/unidesk-dev-code-queue.k8s.yaml` can also be validated explicitly. All namespaced objects must land only in `unidesk-dev`; the foundation manifest must contain the `postgres-dev` StatefulSet/Service, dev secret/config, migration Job and DB URL guard; the core manifest must contain the `backend-core-dev`/`frontend-dev` Deployment/Service; and the Code Queue dev manifest must contain `code-queue-scheduler-dev`, `code-queue-read-dev`, `code-queue-write-dev` and the dev provider egress proxy. With `--kubectl-dry-run` it additionally runs `kubectl apply --dry-run=client --validate=false -f <manifest>` and still applies no resources.
- `dev-env prewarm-images [--image image] [--provider-id D601] [--no-pull] [--proxy-url URL] [--pull-timeout-ms N] [--dry-run]` creates an asynchronous job that, through the UniDesk SSH maintenance bridge, imports the dev foundation's dependency images from the Docker cache on D601 into native k3s containerd. The default images are `postgres:16-alpine` and `rancher/mirrored-library-busybox:1.36.1`, which keeps `postgres-dev` and the local-path helper pod from hanging on external registry pulls. The command always verifies the native k3s context pointed to by `/etc/rancher/k3s/k3s.yaml` and prints `dev_env_containerd_image_ready=...` as the success criterion; it does not apply manifests and does not modify the production `unidesk` namespace.
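- `codex deploy <commitId>` is the Code Queue compatibility deploy entry; it generates a temporary desired manifest and invokes the same target-side build, k3s import, rollout and live commit verification path as `deploy apply --service code-queue`; see `docs/reference/codex-deploy.md` for details.
- `ci install|status|run|run-dev-e2e|logs` manages Tekton CI on the D601 native k3s cluster. `run` manually creates the per-commit checks and the Code Queue read-only performance gate; `run-dev-e2e` manually reads `origin/master:deploy.json#environments.dev`, creates a temporary `unidesk-ci-e2e-*` namespace, verifies the dev desired manifest, the temporary Service DNS and the smoke e2e result, cleans up the namespace by default, and does not modify `unidesk`, `unidesk-dev` or the production database; see `docs/reference/ci.md` for the rules.
- `codex deploy <commitId>` is the old Code Queue compatibility deploy entry and is now disabled, to prevent Code Queue from being deployed to D601 directly over the maintenance channel; future Code Queue deployments must go through the DevOps control plane; see `docs/reference/codex-deploy.md` for details.
- `codex submit [prompt] [--prompt-file path|--prompt-stdin] [--queue queueId] [--provider-id id] [--cwd path] [--model model] [--reasoning-effort effort] [--execution-mode mode] [--max-attempts N] [--reference-task-id id] [--dry-run]` submits a task to the stable `code-queue` user-service path through the backend-core private proxy; the prompt must come from exactly one of the positional argument, a file or stdin, and `--dry-run` only returns the structured request without actually enqueueing. Submission confirmations and dry-runs must return the full prompt, its character count and `truncated=false`; they must not reuse the preview truncation policy of task details, otherwise long task prompts cannot be verified by a human. By default backend-core routes submission, queue CRUD, read status, history summaries and lightweight Trace reads to `code-queue-mgr` on the main server, which writes to the main PostgreSQL; the D601 scheduler only polls and executes tasks that are already stored.
- `codex task <taskId>` queries a structured execution summary by task ID through the Code Queue private proxy; by default it returns only bounded prompt/response previews, the executing Provider, the working directory, the last assistant message, a summary of recent tool calls, attempt, judge, errors, duration and trace paging hints, which suits referencing a historical session from a new queue task without flooding it with noise. By default this summary is served from PostgreSQL by the main server's `code-queue-mgr` and does not depend on the D601 `code-queue-read` Service being available.
- `codex task <taskId> --trace --tail|--from-start|--after-seq N|--before-seq N --limit N` fetches the Code Queue logical trace page by page; the response returns `nextAfterSeq`, `previousBeforeSeq`, `hasMore`, `hasBefore` and the next-page/previous-page commands. `--trace` fetches the latest page by default; add `--full` when the complete prompt or last response is needed.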
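
The date-heading rule in the `decision diary import` entry above can be sketched in TypeScript as follows. This is a minimal illustration, not the actual importer: the function name `diaryPathForHeading`, the exact regular expressions and the choice to return `null` for other headings are assumptions; only the three heading shapes and the `YYYY-MM/YYYY-MM-DD.md` virtual path come from the text.

```typescript
// Sketch only: maps a date heading line to the diary virtual path.
// Returns null when the line is not one of the three recognised shapes.
function diaryPathForHeading(line: string): string | null {
  const patterns: RegExp[] = [
    /^# (\d{4})年(\d{1,2})月(\d{1,2})日\s*$/, // # YYYY年M月D日
    /^# (\d{4})-(\d{2})-(\d{2})\s*$/,          // # YYYY-MM-DD
    /^# (\d{4})\/(\d{1,2})\/(\d{1,2})\s*$/,    // # YYYY/M/D
  ];
  for (const pattern of patterns) {
    const match = pattern.exec(line);
    if (!match) continue;
    const [, year = "", rawMonth = "", rawDay = ""] = match;
    // Zero-pad so that every heading style yields the same path.
    const month = rawMonth.padStart(2, "0");
    const day = rawDay.padStart(2, "0");
    return `${year}-${month}/${year}-${month}-${day}.md`;
  }
  return null;
}

for (const heading of ["# 2024年3月5日", "# 2024-03-05", "# 2024/3/5", "## notes"]) {
  console.log(heading, "->", diaryPathForHeading(heading));
}
```

Normalising all three styles to one zero-padded path means the same day imported from differently formatted logs lands on the same entry; whether the real importer also validates calendar dates is not stated in the source.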
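@@ -42,9 +43,9 @@ UniDesk's unified CLI entry point is `scripts/cli.ts` in the repository root; the invocation is fixed

Long-running operations use a fire-and-forget model: the CLI creates `.state/jobs/{jobId}.json`, a background process runs the real command, and stdout and stderr are written to `.state/jobs/{jobId}.stdout.log` and `.state/jobs/{jobId}.stderr.log` respectively. Callers query progress and tail output with `bun scripts/cli.ts job status <jobId>`.

`server rebuild`, like `server start` and `server stop`, must have its result confirmed through the returned job id; do not read consecutive `server rebuild` commands as "the previous rebuild has finished", because the two commands merely create asynchronous jobs in quick succession. The standard flow for rebuilding the frontend is to run `bun scripts/cli.ts server rebuild frontend`, then poll `bun scripts/cli.ts job status <jobId>` until `succeeded`, and then verify the public frontend with `server status` or `e2e run`; to rebuild the Todo Note backend use `bun scripts/cli.ts server rebuild todo-note`, then verify with `microservice health todo-note` and `microservice proxy todo-note /api/instances`; to rebuild Code Queue Manager use `bun scripts/cli.ts server rebuild code-queue-mgr`, then verify the main-server control-plane path with `microservice health code-queue-mgr`, `microservice health code-queue` and `codex submit --dry-run`; to rebuild the Project Manager backend use `bun scripts/cli.ts server rebuild project-manager`, then verify with `microservice health project-manager` and `microservice proxy project-manager /api/projects`; to rebuild the Baidu Netdisk backend use `bun scripts/cli.ts server rebuild baidu-netdisk`, then verify with `microservice health baidu-netdisk` and `microservice proxy baidu-netdisk /api/transfers`; to rebuild the OA Event Flow backend use `bun scripts/cli.ts server rebuild oa-event-flow`, then verify with `microservice health oa-event-flow` and `microservice proxy oa-event-flow /api/diagnostics`. The D601 Code Queue execution plane and the Decision Center backend are managed by the D601 k3s/k8s control plane and must deploy an already-pushed remote commit with `bun scripts/cli.ts deploy apply --service code-queue`, `bun scripts/cli.ts deploy apply --service decision-center` or the Code Queue compatibility entry `bun scripts/cli.ts codex deploy <commitId>`; the deploy job itself must prove, through the real `/health` endpoint and the k3s Deployment annotation, that an old service is not passing itself off as the new one, after which `microservice health <service>` and the corresponding private proxy API are used for manual review. A manual `docker rm` fallback must not be treated as a formal delivery step.

`server rebuild`, like `server start` and `server stop`, must have its result confirmed through the returned job id; do not read consecutive `server rebuild` commands as "the previous rebuild has finished", because the two commands merely create asynchronous jobs in quick succession. The standard flow for rebuilding the frontend is to run `bun scripts/cli.ts server rebuild frontend`, then poll `bun scripts/cli.ts job status <jobId>` until `succeeded`, and then verify the public frontend with `server status` or `e2e run`; to rebuild the Todo Note backend use `bun scripts/cli.ts server rebuild todo-note`, then verify with `microservice health todo-note` and `microservice proxy todo-note /api/instances`; to rebuild Code Queue Manager use `bun scripts/cli.ts server rebuild code-queue-mgr`, then verify the main-server control-plane path with `microservice health code-queue-mgr`, `microservice health code-queue` and `codex submit --dry-run`; to rebuild the Project Manager backend use `bun scripts/cli.ts server rebuild project-manager`, then verify with `microservice health project-manager` and `microservice proxy project-manager /api/projects`; to rebuild the Baidu Netdisk backend use `bun scripts/cli.ts server rebuild baidu-netdisk`, then verify with `microservice health baidu-netdisk` and `microservice proxy baidu-netdisk /api/transfers`; to rebuild the OA Event Flow backend use `bun scripts/cli.ts server rebuild oa-event-flow`, then verify with `microservice health oa-event-flow` and `microservice proxy oa-event-flow /api/diagnostics`. The D601 Code Queue execution plane and the Decision Center backend are managed by the D601 k3s/k8s control plane, but they must no longer be deployed to D601 directly over the maintenance channel; apart from DevOps bootstrap/repair, deployment, rollout and live commit verification of directly managed or k3s-managed D601 microservices must be performed by the DevOps control plane. A manual `docker rm` fallback must not be treated as a formal delivery step.

New deployments should prefer `deploy apply`. The old `server rebuild` and `codex deploy` are kept only as compatibility entries, and later implementations should converge on a single reconciler: export the source from the remote commit, build the image on the target node through a one-shot proxy, and after deployment prove with a live commit check that the old service is not still running.

New deployments should prefer `deploy apply`, but D601 maintenance direct apply serves only DevOps bootstrap/repair. The old `codex deploy` is disabled; future deployments of D601 services such as Code Queue, Decision Center, backend-core dev and frontend dev should converge on the DevOps control plane: export the source from the remote commit, build the image on the target node through a one-shot proxy, and after deployment prove with a live commit check that the old service is not still running.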
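
The fire-and-forget job files described at the start of this section can be derived from a job id as in the TypeScript sketch below. It is only an illustration: the helper name `jobPaths` and the rule restricting job ids to letters, digits, `_` and `-` are assumptions; the three file name patterns under `.state/jobs/` are the ones given in the text.

```typescript
// Sketch only: `jobPaths` and the id character rule are hypothetical.
interface JobPaths {
  state: string;
  stdout: string;
  stderr: string;
}

function jobPaths(jobId: string, root = ".state/jobs"): JobPaths {
  // Assumed safety rule: refuse ids that could escape the jobs directory.
  if (!/^[A-Za-z0-9_-]+$/.test(jobId)) {
    throw new Error(`unsafe job id: ${jobId}`);
  }
  return {
    state: `${root}/${jobId}.json`,
    stdout: `${root}/${jobId}.stdout.log`,
    stderr: `${root}/${jobId}.stderr.log`,
  };
}

console.log(jobPaths("job-123"));
```

Keeping the state file and both logs keyed by the same id is what lets a later `job status <jobId>` call find progress and tail output without any shared process state.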

## Output Contract

@@ -1,25 +1,22 @@
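# Code Queue Deploy

`bun scripts/cli.ts codex deploy <commitId>` is a compatibility entry. The new formal deploy entry is `bun scripts/cli.ts deploy apply --service code-queue`; the compatibility entry generates a temporary desired manifest containing only the Code Queue repo and commit, then invokes the same deploy reconciler. The command runs only in the main server workspace; it immediately returns an asynchronous job id, and the background job completes the actual deployment on D601 through backend-core's `host.ssh` dispatch.

`bun scripts/cli.ts codex deploy <commitId>` is the old compatibility entry and is now disabled. The reason is that it deploys Code Queue to D601 directly over the backend-core `host.ssh` maintenance channel, bypassing the DevOps control plane; direct maintenance-channel access to D601 is now allowed only to deploy or repair DevOps itself.

From now on, formal Code Queue deployment must be performed by the DevOps control plane: after the CLI reads `origin/master:deploy.json#environments.dev` or the production desired state, it triggers target-side build, k3s image import, rollout, stamp and live commit verification via backend-core, k3sctl-adapter and DevOps.

## Command
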
```bash
bun scripts/cli.ts deploy apply --service code-queue
bun scripts/cli.ts codex deploy <commitId>
bun scripts/cli.ts job status <jobId> --tail-bytes 30000
```

- `commitId` must be a 7-40 character hex commit SHA that has already been pushed to the remote.
- `--provider-id D601` is the default; the current deploy path supports only the D601 active instance.
- `--timeout-ms N` controls the total timeout of the background deployment; the default is `900000`.
- `--skip-build` is no longer supported; the target-side Docker build is mandatory.

The command must return a structured error that tells the caller to use the DevOps control plane instead; it must no longer create a background deploy job. `--skip-build` is no longer supported.

## Pipeline

The deploy job's steps are fixed as:

The historical deploy job was fixed to the following steps; they now serve only as the target behavior for the DevOps control plane when it implements Code Queue CD, and must not be triggered by `codex deploy` or by direct maintenance-channel access:

1. For a Code Queue deployment, first ensure that `unidesk_deploy_ssh_identities(id='github.com')` exists in PostgreSQL; this record stores the private key, public key fingerprint and github.com `known_hosts` line of the GitHub deploy SSH identity. `codex deploy` seeds this record from the main server's current `/root/.ssh/id_ed25519`, then streams the identity through the backend-core `/ws/ssh` interactive channel to `/home/ubuntu/.ssh/id_ed25519`, `id_ed25519.pub` and `known_hosts` in D601 WSL, and runs `ssh -T git@github.com` on the D601 side to verify it; the secret must not be written into the `host.ssh` task payload, deploy logs, a Docker image or a Kubernetes Secret.
1. For a Code Queue deployment, first ensure that `unidesk_deploy_ssh_identities(id='github.com')` exists in PostgreSQL; this record stores the private key, public key fingerprint and github.com `known_hosts` line of the GitHub deploy SSH identity. The DevOps control plane must not write the secret into a task payload, deploy logs, a Docker image or a Kubernetes Secret.
2. In the deploy cache on D601, run `git fetch` against the remote through the local provider-gateway WS egress proxy, and export the tracked files to a one-off export directory with `git archive <commitId>`; D601 must not connect to GitHub directly, and no ad hoc SSH SOCKS, public master proxy or backend-core/provider-ingress fallback may be created.
3. Sync the exported repo to `/home/ubuntu/cq-deploy` with `rsync --delete`, preserving `.state/`, `logs/`, `.git/`, `node_modules/` and `dist/`.
4. On D601, build `unidesk-code-queue:d601` with the local BuildKit builder of the target Docker daemon, reusing the base images, inline cache and Code Queue build-base already present on D601; provider-gateway WS egress is the only permitted build proxy channel, is injected only as environment variables and build-args for this build, and is combined with `--network host` for this build so that RUN stages can reach the D601 host loopback proxy; it must not pollute the D601 host Docker/HTTP proxy configuration, and no new SSH SOCKS, public master proxy or direct-connection fallback may be created.
@@ -31,9 +28,9 @@ bun scripts/cli.ts job status <jobId> --tail-bytes 30000
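
## Observability

`codex deploy` itself does not block waiting for the deployment to finish. The `statusCommand` and `tailCommand` in the returned JSON are the only status entry points. The background job's stderr is JSONL progress; each long step records the remote `/tmp/unidesk-deploy-*.log` and sentinel files, and on failure `job status` shows the tail of the last log.

Once the DevOps control plane implements Code Queue CD, triggering a deployment should not itself block until completion. The returned JSON must contain a run id, a status command or an equivalent query entry; background logs must be bounded and queryable, and on failure the tail of the last log must be shown.

When `job status` reaches `succeeded`, the deploy job has already completed live commit verification. For manual review, use the following command to confirm `deploy.commit`:

When a deploy run reaches `succeeded`, live commit verification must already be complete. For manual review, use the following command to confirm `deploy.commit`:

```bash
bun scripts/cli.ts microservice health code-queue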
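@@ -44,7 +41,7 @@ Manual diagnosis of D601 native k3s must explicitly use the host kubeconfig: `KUBECONFIG=

## Boundaries

Code Queue is managed by the D601 k3s/k8s control plane; `server rebuild` and a manual `docker compose up` are no longer formal deploy paths. `codex deploy` may run while Code Queue itself is executing tasks; after the service restarts, restart-recovery restores task state, and deployment must not wait for the current Code Queue task to exit.

Code Queue is managed by the D601 k3s/k8s control plane; `server rebuild`, `codex deploy`, direct maintenance-channel access to D601 and a manual `docker compose up` are no longer formal deploy paths. Code Queue deployment must remain able to run while Code Queue itself is executing tasks; after the service restarts, restart-recovery restores task state, and deployment must not wait for the current Code Queue task to exit.

## TCP Egress Gateway
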
+57
-28
@@ -4,37 +4,66 @@ UniDesk deployment is driven by a desired-state manifest. The manifest answers o

## Manifest

The root `deploy.json` is intentionally minimal:

The root `deploy.json` is the single desired-state source for both prod and dev. Environment branches such as `deploy/dev` and `deploy/prod` are deprecated because they create a second control plane for version intent.

```json
{
  "schemaVersion": 1,
  "environment": "prod",
  "services": [
    {
      "id": "code-queue",
      "repo": "https://github.com/pikasTech/unidesk",
      "commitId": "0c3cdb4ee06a23361ed511a2da033d67b53d16f4"
  "schemaVersion": 2,
  "environments": {
    "prod": {
      "services": [
        {
          "id": "code-queue",
          "repo": "https://github.com/pikasTech/unidesk",
          "commitId": "0c3cdb4ee06a23361ed511a2da033d67b53d16f4"
        }
      ]
    },
    "dev": {
      "services": [
        {
          "id": "backend-core",
          "repo": "https://github.com/pikasTech/unidesk",
          "commitId": "348c644"
        }
      ]
    }
  ]
  }
}
```

`environment` is optional only for the legacy local-file compatibility path. When present it must be exactly `dev` or `prod`. Any `--env <name>` command requires the manifest to declare the same `environment`; `--env dev` must reject `environment=prod`, and `--env prod` must reject `environment=dev`.

`schemaVersion=1` remains accepted only as a local compatibility format. Standard environment commands use `schemaVersion=2` and select `environments.dev.services` or `environments.prod.services`.

`deploy.json` must not contain provider IDs, ports, compose service names, Kubernetes namespace, health paths, environment variables, Dockerfile paths or build commands. The deploy reconciler joins each `id` with `config.json.microservices[]` and existing k3s manifests to resolve those details. A service listed in `deploy.json` but missing from `config.json` is an error. A service with no Dockerfile source artifact is reported as unsupported rather than silently skipped. `commitId` may be a unique pushed short SHA or a full SHA; every deploy command resolves it through the remote repository to a full 40-character commit before target-side build or rollout, and fails immediately if the SHA is missing or ambiguous.
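
To make the `schemaVersion=2` selection and the minimal-manifest rule concrete, here is a small TypeScript sketch. It is not the reconciler's code: the function name `selectServices`, the whitelist of exactly three keys and the 7 to 40 character hex check are assumptions chosen to illustrate the rules above, and resolving a short SHA against the remote is deliberately left out.

```typescript
// Sketch only: names and validation choices are hypothetical.
type EnvName = "dev" | "prod";

interface ServiceEntry {
  id: string;
  repo: string;
  commitId: string;
}

// The text lists fields that must NOT appear in deploy.json; allowing only
// these three keys is one assumed way to enforce that.
const ALLOWED_KEYS = new Set(["id", "repo", "commitId"]);

function selectServices(manifest: unknown, env: EnvName): ServiceEntry[] {
  if (typeof manifest !== "object" || manifest === null) {
    throw new Error("manifest must be a JSON object");
  }
  const m = manifest as {
    schemaVersion?: unknown;
    environments?: Record<string, { services?: unknown } | undefined>;
  };
  if (m.schemaVersion !== 2) {
    throw new Error("environment commands require schemaVersion=2");
  }
  const services = m.environments?.[env]?.services;
  if (!Array.isArray(services)) {
    throw new Error(`environments.${env}.services is missing`);
  }
  return services.map((raw: unknown, index: number) => {
    if (typeof raw !== "object" || raw === null) {
      throw new Error(`services[${index}] must be an object`);
    }
    const entry = raw as Record<string, unknown>;
    for (const key of Object.keys(entry)) {
      if (!ALLOWED_KEYS.has(key)) {
        throw new Error(`services[${index}] has unexpected key "${key}"`);
      }
    }
    const { id, repo, commitId } = entry;
    if (typeof id !== "string" || typeof repo !== "string" || typeof commitId !== "string") {
      throw new Error(`services[${index}] needs string id, repo and commitId`);
    }
    // Assumed shape check only: a short or full hex SHA.
    if (!/^[0-9a-f]{7,40}$/i.test(commitId)) {
      throw new Error(`services[${index}].commitId is not a hex SHA`);
    }
    return { id, repo, commitId };
  });
}

const example = {
  schemaVersion: 2,
  environments: {
    dev: {
      services: [
        { id: "backend-core", repo: "https://github.com/pikasTech/unidesk", commitId: "348c644" },
      ],
    },
  },
};
console.log(selectServices(example, "dev"));
```

Rejecting unknown keys up front keeps runtime details such as ports or namespaces out of the version manifest, so they can only come from `config.json` and the k3s manifests.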

Environment mode never reads the local working tree manifest. The mapping is fixed:

- `dev -> origin/deploy/dev`
- `prod -> origin/deploy/prod`

`deploy check --env ...` and `deploy plan --env ...` fetch the fixed ref, read `deploy.json` from that ref, validate the declared environment, and report the manifest commit/blob, service commit IDs, target namespace, database fingerprint and Provider identity without mutating runtime resources. `deploy apply --env dev` is enabled for the current isolated D601 dev slice: `backend-core`, `frontend` and `code-queue`. If no `--service` is given and the dev manifest still includes unsupported later-stage services such as `code-queue-mgr`, the command fails before changing runtime resources. `deploy apply --env prod` remains disabled until the production environment executor and authorization policy are explicitly added.

The `deploy/dev` and `deploy/prod` branches are environment desired-state branches, not source branches. They should contain only `deploy.json`; Kubernetes manifests, Dockerfiles and executor code continue to live on `master` and are selected through the commit IDs declared in the environment manifest.

Environment mode never reads the local dirty working tree manifest. `deploy check --env ...`, `deploy plan --env ...` and `deploy apply --env ...` fetch `origin/master`, read `origin/master:deploy.json`, select `environments.<env>`, and report the manifest commit/blob, service commit IDs, target namespace, database fingerprint and Provider identity. Maintenance-channel direct D601 apply is intentionally narrow: only `deploy apply --env dev --service devops` may use that path, and only for DevOps bootstrap, repair or break-glass recovery. `deploy apply --env dev --service backend-core|frontend|code-queue` and local-manifest D601 service apply are rejected before runtime mutation; those services must be deployed by the DevOps control plane after it is healthy. `deploy apply --env prod` remains disabled until the production environment executor and authorization policy are explicitly added.
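
The narrow apply policy above amounts to a guard that runs before any runtime mutation. The TypeScript sketch below shows one possible shape; `checkMaintenanceApply`, the `ApplyDecision` type and the wording of the rejection reasons are hypothetical, while the allowed combination (`--env dev --service devops`) and the rejected ones come from the text.

```typescript
// Sketch only: function, type and messages are hypothetical.
type ApplyDecision =
  | { allowed: true }
  | { allowed: false; reason: string };

function checkMaintenanceApply(env: string, service?: string): ApplyDecision {
  if (env === "prod") {
    return { allowed: false, reason: "deploy apply --env prod is disabled" };
  }
  if (env !== "dev") {
    return { allowed: false, reason: `unknown environment: ${env}` };
  }
  // Only DevOps itself may be applied over the maintenance channel.
  if (service === "devops") {
    return { allowed: true };
  }
  return {
    allowed: false,
    reason: `${service ?? "(all services)"} must be deployed by the DevOps control plane`,
  };
}

for (const service of ["devops", "backend-core", "code-queue"]) {
  console.log(service, checkMaintenanceApply("dev", service));
}
```

Returning a structured decision instead of throwing makes it straightforward for a CLI to print the reason and exit before it creates a job.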

`config.json.microservices[].repository.commitId` is retained for catalog compatibility, but `deploy.json` is the deployment version authority for the reconciler.

## DevOps Bootstrap

DevOps has an intentional first-install bootstrap path to avoid a circular dependency where the service that should deploy CI/CD must already exist before it can deploy itself.

The only supported first-install shape is a one-shot D601-side script:

```bash
tmp=$(mktemp) && curl -fsSL https://raw.githubusercontent.com/pikasTech/unidesk/master/scripts/bootstrap/devops-install.sh -o "$tmp" && sudo bash "$tmp" --commit <unidesk-commit-id> --env dev
```

The bootstrapper may use D601 local shell, native SSH or provider-gateway Host SSH as a maintenance bridge, but only for DevOps bootstrap, repair and break-glass recovery. This maintenance bridge must not deploy backend-core, frontend, Code Queue, Decision Center, k3sctl-adapter or any other direct/managed microservice. It must run source fetch, Go build, Docker build, k3s image import and Kubernetes apply on D601. The main server must not compile Go/Rust or build DevOps images for D601.

The bootstrapper is deliberately narrow and idempotent:

- Verify D601 native k3s and `/etc/rancher/k3s/k3s.yaml`.
- Clone or fetch the UniDesk repo on D601 and checkout the requested commit.
- Build `src/components/microservices/devops/Dockerfile` on D601.
- Import `unidesk-devops:dev` into native k3s containerd.
- Apply `src/components/microservices/k3sctl-adapter/k3s/devops.k8s.yaml` into `unidesk-ci`.
- Wait for `deployment/devops` rollout and `/health`.
- Write a local bootstrap receipt with repo, requested commit, resolved commit, namespace, image and health result.
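
The last step of the list above writes a bootstrap receipt. The TypeScript sketch below shows one way such a receipt could be modelled and serialized. The source only names the fields (repo, requested commit, resolved commit, namespace, image, health result); the property names, the JSON format, the `buildReceipt` helper and the consistency checks are assumptions, and the real bootstrapper is a shell script whose receipt format is not given here.

```typescript
// Sketch only: field names and format are hypothetical.
interface BootstrapReceipt {
  repo: string;
  requestedCommit: string; // may be a short SHA
  resolvedCommit: string;  // assumed to be the full 40-character SHA
  namespace: string;
  image: string;
  healthOk: boolean;
}

function buildReceipt(receipt: BootstrapReceipt): string {
  if (!/^[0-9a-f]{40}$/.test(receipt.resolvedCommit)) {
    throw new Error("resolved commit must be a full 40-character SHA");
  }
  // A resolved commit should extend the requested (possibly short) SHA.
  if (!receipt.resolvedCommit.startsWith(receipt.requestedCommit.toLowerCase())) {
    throw new Error("resolved commit does not match the requested commit");
  }
  return JSON.stringify(receipt, null, 2);
}

console.log(
  buildReceipt({
    repo: "https://github.com/pikasTech/unidesk",
    requestedCommit: "0c3cdb4",
    resolvedCommit: "0c3cdb4ee06a23361ed511a2da033d67b53d16f4",
    namespace: "unidesk-ci",
    image: "unidesk-devops:dev",
    healthOk: true,
  }),
);
```

Recording both the requested and the resolved commit lets a later repair run confirm which exact source the installed DevOps image was built from.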

After DevOps is healthy, normal CI/CD control should move to `CLI -> backend-core -> k3sctl-adapter -> DevOps -> Kubernetes API/Tekton`. Host SSH remains a DevOps repair path, not a general CI/CD control plane and not a service deployment path.

## D601 Dev Foundation

Phase 2 of the D601 dev environment creates only the isolated namespace and database foundation. The authoritative manifest is `src/components/microservices/k3sctl-adapter/k3s/dev/unidesk-dev-foundation.k8s.yaml`.
@@ -43,7 +72,7 @@ It may create resources only in `unidesk-dev`:

- `Namespace unidesk-dev`, plus quota and default limits.
- `Secret unidesk-dev-runtime-secrets` as a dev-only template for DB credentials, provider token, auth/session secret, and Code Queue model secret placeholders.
- `ConfigMap unidesk-dev-runtime-config` for dev identity, fixed deploy ref `origin/deploy/dev`, provider id `D601-dev`, Code Queue dev paths, and non-secret runtime defaults.
- `ConfigMap unidesk-dev-runtime-config` for dev identity, desired-state source `origin/master:deploy.json#environments.dev`, provider id `D601-dev`, Code Queue dev paths, and non-secret runtime defaults.
- `ConfigMap unidesk-dev-db-guard` with an executable guard script that rejects production-looking `DATABASE_URL` values.
- `StatefulSet/Service postgres-dev` with a 5Gi persistent volume claim and bounded CPU/memory requests/limits.
- `Job unidesk-dev-db-migrate`, which waits for `postgres-dev`, runs the guard, then prepares backend-core and Code Queue tables in the independent `unidesk_dev` database.
@@ -62,7 +91,7 @@ Phase 3 introduces the dev backend/frontend manifest at `src/components/microser

`backend-core-dev` must use `unidesk-dev-runtime-config` and `unidesk-dev-runtime-secrets`, connect to `postgres-dev.../unidesk_dev`, expose HTTP on 8080 and provider ingress on 8081, and write logs under `/var/log/unidesk-dev`. `frontend-dev` must set `CORE_INTERNAL_URL=http://backend-core-dev.unidesk-dev.svc.cluster.local:8080` and must not proxy to production backend-core.

The manifest keeps placeholder image tags and deploy commit values in source control. `deploy apply --env dev --service backend-core|frontend` fetches `origin/deploy/dev:deploy.json`, materializes the requested source commit on D601, copies the dev core control manifest, narrows it to the selected Service/Deployment pair, replaces placeholders with the requested commit and dev image tag, builds on D601, imports the image into native k3s containerd, applies only the `unidesk-dev` objects and stamps the Deployment. Client dry-run and static validation are the required checks before any controlled apply:

The manifest keeps placeholder image tags and deploy commit values in source control. Maintenance-channel direct D601 apply must not deploy `backend-core-dev` or `frontend-dev`; the CLI rejects `deploy apply --env dev --service backend-core|frontend` before runtime mutation. Dev core deployment must be implemented as a DevOps-controlled CD action that fetches `origin/master:deploy.json`, selects `environments.dev`, materializes the requested source commit on D601, narrows the dev core control manifest to the selected Service/Deployment pair, replaces placeholders with the requested commit and dev image tag, builds on D601, imports the image into native k3s containerd, applies only the `unidesk-dev` objects and stamps the Deployment. Client dry-run and static validation are the required checks before any controlled apply:

- `bun scripts/cli.ts dev-env validate --manifest src/components/microservices/k3sctl-adapter/k3s/dev/unidesk-dev-core.k8s.yaml`
- `KUBECONFIG=/etc/rancher/k3s/k3s.yaml kubectl apply --dry-run=client --validate=false -f src/components/microservices/k3sctl-adapter/k3s/dev/unidesk-dev-core.k8s.yaml`
@@ -75,7 +104,7 @@ Phase 5 introduces the dev Code Queue execution manifest at `src/components/micr

All dev Code Queue components must use `unidesk-dev-runtime-config` and `unidesk-dev-runtime-secrets`, connect to `postgres-dev.../unidesk_dev`, write logs and state under `/home/ubuntu/unidesk-dev-code-queue-deploy/state`, and expose HTTP on 4222 only as ClusterIP services. The scheduler uses `CODE_QUEUE_MAIN_PROVIDER_ID=D601-dev`, `CODE_QUEUE_WORKDIR=/workspace-dev`, `CODE_QUEUE_REMOTE_WORKDIR=/home/ubuntu/unidesk-dev-workspace`, disables ClaudeQQ notifications by default, and does not use the production `d601-tcp-egress-gateway` or production PostgreSQL route.

`deploy apply --env dev --service code-queue` fetches `origin/deploy/dev:deploy.json`, materializes the requested source commit on D601, uses the dev Code Queue control manifest from that D601 materialized commit, narrows it to Code Queue dev objects, replaces placeholders with the requested commit and `unidesk-code-queue:dev`, builds on D601, imports the image into native k3s containerd, applies only `unidesk-dev` objects and stamps the dev Deployments. Because Code Queue carries the agent toolchain and browser/runtime dependencies, dev builds may reuse an existing D601 `unidesk-code-queue:d601-build-base` or `unidesk-code-queue:d601` image when the dev build-base tag is absent, and the deploy executor allows a longer Code Queue build window than lightweight services. The scheduler has an explicit 5Gi memory limit and must use `Recreate` rollout strategy so an update does not temporarily require two scheduler replicas under the namespace quota. All dev Code Queue containers must set CPU limits so the namespace `LimitRange` does not inject a quota-breaking default CPU limit. Live health verification uses the Kubernetes API service proxy for the dev ClusterIP Service, not `kubectl exec` or debug binaries inside the application image. This first dev execution slice proves deployability, health and dev database isolation; wiring the dev frontend stable `code-queue` route through a dev `code-queue-mgr` is a separate later phase.

Maintenance-channel direct D601 apply must not deploy dev Code Queue; the CLI rejects `deploy apply --env dev --service code-queue` and the old `codex deploy` compatibility entry is disabled. Dev Code Queue deployment must be a DevOps-controlled CD action that fetches `origin/master:deploy.json`, selects `environments.dev`, materializes the requested source commit on D601, uses the dev Code Queue control manifest from that D601 materialized commit, narrows it to Code Queue dev objects, replaces placeholders with the requested commit and `unidesk-code-queue:dev`, builds on D601, imports the image into native k3s containerd, applies only `unidesk-dev` objects and stamps the dev Deployments. Because Code Queue carries the agent toolchain and browser/runtime dependencies, dev builds may reuse an existing D601 `unidesk-code-queue:d601-build-base` or `unidesk-code-queue:d601` image when the dev build-base tag is absent, and the deploy executor allows a longer Code Queue build window than lightweight services. The scheduler has an explicit 5Gi memory limit and must use `Recreate` rollout strategy so an update does not temporarily require two scheduler replicas under the namespace quota. All dev Code Queue containers must set CPU limits so the namespace `LimitRange` does not inject a quota-breaking default CPU limit. Live health verification uses the Kubernetes API service proxy for the dev ClusterIP Service, not `kubectl exec` or debug binaries inside the application image. This first dev execution slice proves deployability, health and dev database isolation; wiring the dev frontend stable `code-queue` route through a dev `code-queue-mgr` is a separate later phase.

## CLI

@@ -83,9 +112,9 @@ All dev Code Queue components must use `unidesk-dev-runtime-config` and `unidesk
|
||||
|
||||
`bun scripts/cli.ts deploy plan [--file deploy.json] [--service <id>]` prints the same live state plus the intended action: `noop`, `deploy` or `unsupported`.
|
||||
|
||||
`bun scripts/cli.ts deploy plan --env dev [--service <id>]` reads `origin/deploy/dev:deploy.json` and prints a dry-run environment plan without checking or mutating live runtime resources. `deploy check --env dev` uses the same dry-run environment plan. `--env prod` is available for parity as a dry-run planning path; it reads `origin/deploy/prod:deploy.json` and must not use a dirty local `deploy.json`.
|
||||
`bun scripts/cli.ts deploy plan --env dev [--service <id>]` reads `origin/master:deploy.json#environments.dev` and prints a dry-run environment plan without checking or mutating live runtime resources. `deploy check --env dev` uses the same dry-run environment plan. `--env prod` is available for parity as a dry-run planning path; it reads `origin/master:deploy.json#environments.prod` and must not use a dirty local `deploy.json`.
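As a rough sketch of how a reference such as `origin/master:deploy.json#environments.dev` might be resolved, the snippet below splits the reference and selects one environment from an already-parsed file. The `DeployFile` shape (an `environments` map holding a `services` array) is an assumption made for illustration; only the `id`, `repo` and `commitId` fields and the `environments.dev` / `environments.prod` keys come from the text. `parseDeployRef` and `selectServices` are hypothetical helper names.

```typescript
// Assumed shapes: the real deploy.json schema is not shown in this document.
interface ServiceSpec { id: string; repo: string; commitId: string; }
interface DeployFile { environments: Record<string, { services: ServiceSpec[] }>; }
interface DeployRef { gitRef: string; file: string; environment: string; }

// Split "origin/master:deploy.json#environments.dev" into git ref, file path and environment.
function parseDeployRef(ref: string): DeployRef {
  const match = /^([^:#]+):([^#]+)#environments\.([A-Za-z0-9_-]+)$/.exec(ref);
  if (!match) throw new Error(`unsupported deploy ref: ${ref}`);
  return { gitRef: match[1], file: match[2], environment: match[3] };
}

// Pick the services declared for one environment, optionally narrowed to one service id.
function selectServices(file: DeployFile, environment: string, serviceId?: string): ServiceSpec[] {
  const env = file.environments[environment];
  if (!env) throw new Error(`environment not declared: ${environment}`);
  return serviceId ? env.services.filter((s) => s.id === serviceId) : env.services;
}
```

Reading the file content from the git ref (rather than from the working tree) is what keeps a dirty local `deploy.json` out of the plan; that step is left out of the sketch.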
|
||||
|
||||
`bun scripts/cli.ts deploy apply [--file deploy.json | --env dev] [--service <id>] [--dry-run] [--force]` starts an asynchronous job. Use `bun scripts/cli.ts job status <jobId> --tail-bytes 30000` to observe progress. `--dry-run` resolves the same plan but does not build or replace runtime objects. `--force` rebuilds even when the live commit matches. Environment apply currently supports `--env dev --service backend-core`, `--env dev --service frontend` and `--env dev --service code-queue`; `--env prod` apply is rejected.
|
||||
`bun scripts/cli.ts deploy apply [--file deploy.json | --env dev] [--service <id>] [--dry-run] [--force]` starts an asynchronous job only for supported targets. Use `bun scripts/cli.ts job status <jobId> --tail-bytes 30000` to observe progress. `--dry-run` resolves the same plan but does not build or replace runtime objects. `--force` rebuilds even when the live commit matches. Environment apply currently supports only `--env dev --service devops` on the D601 maintenance direct path; `--env prod` apply is rejected, and D601 non-DevOps service apply is rejected before any runtime mutation.
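The environment-apply restriction described above can be illustrated with a small guard that runs before any job is started. This is a sketch under stated assumptions: `checkEnvApply` and `ApplyDecision` are hypothetical names, and the real CLI may structure its rejection differently.

```typescript
type ApplyDecision = { allowed: true } | { allowed: false; reason: string };

// Decide whether an environment apply may run on the D601 maintenance direct path.
// Only `--env dev --service devops` is accepted; everything else is rejected
// before any runtime mutation.
function checkEnvApply(env: "dev" | "prod", service: string | undefined): ApplyDecision {
  if (env === "prod") {
    return { allowed: false, reason: "--env prod apply is rejected" };
  }
  if (service === "devops") {
    return { allowed: true };
  }
  return {
    allowed: false,
    reason: `D601 service '${service ?? "<all>"}' must be deployed through the DevOps control plane`,
  };
}
```

Returning a reason instead of throwing keeps the rejection easy to print as JSON, which matches the rule that all deploy commands output JSON.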
|
||||
|
||||
All deploy commands output JSON. Long operations must use `.state/jobs/` and bounded log tails; no deploy path may succeed with missing progress output.
|
||||
|
||||
@@ -124,17 +153,17 @@ The reconciler selects the executor from `config.json`:
|
||||
|
||||
- `deployment.mode=unidesk-direct` on `main-server`: build the image on the main server, then use the fixed UniDesk Compose project and `up -d --no-build --no-deps --force-recreate <service>`.
|
||||
- `deployment.mode=internal-sidecar` on `main-server`: use the same main-server target-side source export, Docker build, image label stamping, fixed Compose project replacement and live commit verification as direct Compose services. This class is for private sidecars such as `code-queue-mgr`; it is still versioned by `deploy.json.commitId`, not by the operator's current worktree.
|
||||
- `deployment.mode=unidesk-direct` on a provider: dispatch `host.ssh` to that provider, build on the provider, then use the service's provider-local compose file and project. The executor resolves the actual Compose project, image name, build context, Dockerfile and target from the running container labels and `docker compose config`; it must not guess an image tag that the service will not actually run.
|
||||
- `deployment.mode=unidesk-direct` on a provider: this executor is disabled for D601 service deployment except for the explicit DevOps bootstrap/repair path. The historical behavior dispatched `host.ssh` to the provider, built on the provider, then used the service's provider-local compose file and project; that shape must move behind DevOps for D601 services so the maintenance bridge cannot become a second deployment control plane.
|
||||
- Control bridges that UniDesk needs in order to inspect or repair an orchestrator must stay in this direct class. In particular, `k3sctl-adapter` is a UniDesk-managed bridge to native k3s and must remain outside k3s; Docker packaging on Docker Desktop/WSL must create an explicit host-local bridge, currently an adapter-container SSH local tunnel, to reach `/etc/rancher/k3s/k3s.yaml` and WSL `127.0.0.1:6443`.
|
||||
- `deployment.mode=k3sctl-managed`: dispatch to the active control target, build on that target, verify or install native k3s on the host OS/WSL distro, import the image into native k3s/containerd, apply the existing Kubernetes manifest, stamp the Deployment and wait for rollout. The executor must use the native kubeconfig and containerd socket, for example `/etc/rancher/k3s/k3s.yaml` and `/run/k3s/containerd/containerd.sock`; running k3s itself in Docker is forbidden for both control-plane and worker nodes. A `rancher/k3s` image or legacy container may only be used as a temporary artifact source during migration, and any active containerized k3s control plane must be stopped before verification succeeds. The executor must preload a valid `rancher/mirrored-pause:3.6` sandbox image into native k3s containerd through the provider-gateway one-shot egress path, verify its entrypoint is `/pause`, and reject fake or sleep-based replacement images. Code Queue's k3s migration executor must also stop/remove the legacy direct Docker `code-queue-backend` after k3s rollout, so there is never a second scheduler running beside the native k3s scheduler.
|
||||
- `deployment.mode=k3sctl-managed`: the target behavior is to build on the active control target, verify native k3s on the host OS/WSL distro, import the image into native k3s/containerd, apply the existing Kubernetes manifest, stamp the Deployment and wait for rollout. On D601, maintenance-channel direct execution of this behavior is reserved for DevOps itself; other k3s managed services must be reconciled by DevOps after bootstrap. The executor must use the native kubeconfig and containerd socket, for example `/etc/rancher/k3s/k3s.yaml` and `/run/k3s/containerd/containerd.sock`; running k3s itself in Docker is forbidden for both control-plane and worker nodes. A `rancher/k3s` image or legacy container may only be used as a temporary artifact source during migration, and any active containerized k3s control plane must be stopped before verification succeeds. The executor must preload a valid `rancher/mirrored-pause:3.6` sandbox image into native k3s containerd through the provider-gateway one-shot egress path, verify its entrypoint is `/pause`, and reject fake or sleep-based replacement images. Code Queue's k3s migration executor must also stop/remove the legacy direct Docker `code-queue-backend` after k3s rollout, so there is never a second scheduler running beside the native k3s scheduler.
|
||||
|
||||
Existing service-specific commands such as Code Queue deploy should converge onto this reconciler path instead of keeping a parallel implementation.
|
||||
Existing service-specific commands such as Code Queue deploy are disabled as direct D601 deploy paths. Their build/import/rollout semantics should converge into DevOps-controlled CD instead of keeping a parallel implementation.
|
||||
|
||||
Decision Center is a standard `k3sctl-managed` service in this model. `deploy apply --service decision-center` must build `src/components/microservices/decision-center/Dockerfile` on D601, import `unidesk-decision-center:d601` into native k3s containerd, apply `src/components/microservices/k3sctl-adapter/k3s/decision-center.k8s.yaml`, stamp the Deployment, and verify health through `/api/microservices/decision-center/health`. It must not add a main-server Compose service, NodePort, hostPort, or provider-gateway direct HTTP backend for Decision Center.
|
||||
Decision Center is a standard `k3sctl-managed` service in this model, but D601 maintenance-channel direct apply must not deploy it. DevOps-controlled CD for Decision Center should build `src/components/microservices/decision-center/Dockerfile` on D601, import `unidesk-decision-center:d601` into native k3s containerd, apply `src/components/microservices/k3sctl-adapter/k3s/decision-center.k8s.yaml`, stamp the Deployment, and verify health through `/api/microservices/decision-center/health`. It must not add a main-server Compose service, NodePort, hostPort, or provider-gateway direct HTTP backend for Decision Center.
|
||||
|
||||
## CI Separation
|
||||
|
||||
Continuous integration is intentionally separate from this deploy reconciler. D601 k3s hosts Tekton CI resources described in `docs/reference/ci.md`, but those PipelineRuns only clone, check and run read-only performance gates. They must not call `deploy apply`, `codex deploy`, `kubectl rollout restart` for production services, or mutate `deploy.json`.
|
||||
Continuous integration is intentionally separate from this deploy reconciler. D601 k3s hosts Tekton CI resources described in `docs/reference/ci.md`, but those PipelineRuns only clone, check, run read-only performance gates, or create temporary CI-owned namespaces for dev manifest smoke e2e. They must not call `deploy apply`, `codex deploy`, `kubectl rollout restart` for production services, mutate `deploy.json`, or write production namespaces.
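One way to keep this boundary visible is a conservative lint over CI step scripts. The sketch below is purely illustrative and is not part of the Tekton manifests described here: `findForbiddenCommands` is a hypothetical helper, and it flags every `kubectl rollout restart` for review even though the rule only forbids it for production services.

```typescript
interface Finding { rule: string; line: number; text: string; }

// Patterns derived from the commands named in the CI separation rule.
const FORBIDDEN: Array<{ rule: string; pattern: RegExp }> = [
  { rule: "deploy apply", pattern: /\bdeploy\s+apply\b/ },
  { rule: "codex deploy", pattern: /\bcodex\s+deploy\b/ },
  { rule: "kubectl rollout restart", pattern: /\bkubectl\b.*\brollout\s+restart\b/ },
];

// Scan a step script line by line and report each line that matches a forbidden pattern.
function findForbiddenCommands(script: string): Finding[] {
  const findings: Finding[] = [];
  script.split("\n").forEach((text, index) => {
    for (const { rule, pattern } of FORBIDDEN) {
      if (pattern.test(text)) {
        findings.push({ rule, line: index + 1, text: text.trim() });
      }
    }
  });
  return findings;
}
```

A text scan like this cannot prove that a script is safe; it only catches the obvious cases early, so namespace-scoped RBAC on the CI service account remains the actual enforcement point.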
|
||||
|
||||
The Code Queue performance gate may create a temporary `code-queue-ci-read` service and read the main PostgreSQL through the existing `d601-tcp-egress-gateway`. Because it runs with `CODE_QUEUE_SERVICE_ROLE=read`, scheduler/backfill/notification disabled and EmptyDir state, it is not deployment truth and does not need a temporary database for the current read-only checks.
|
||||
|
||||
|
||||
@@ -27,7 +27,7 @@ CLI 会优先使用 `docker compose` v2 plugin;当 v2 plugin 不存在时才
|
||||
|
||||
Compose v2 安装后仍然必须遵守 UniDesk 的服务控制入口:全栈生命周期用 `server start` / `server stop`,单服务重建用 `server rebuild <service>`。不要因为 v2 可用就直接在生产栈上手工执行未纳入 CLI 的 `up --build`、`down -v` 或跨项目清理命令;所有会影响容器的动作都应保持 job 可观测、Compose project 固定、database named volume 保留。主 server Compose 命令必须从 `providerGateway.upgrade.hostProjectRoot` 指定的 canonical UniDesk 根目录运行,临时 worktree、Code Queue 导出目录或实验分支不得复用生产 `-p unidesk` 和固定 `container_name` 去替换生产容器。
|
||||
|
||||
版本化用户服务部署优先使用 `bun scripts/cli.ts deploy apply`。`deploy.json` 只声明服务 `id`、`repo` 和 `commitId`;目标节点、Dockerfile、Compose、Kubernetes manifest、健康检查和代理路径继续来自 `config.json` 与现有 manifest。主 server 直管微服务和内部 sidecar,例如 `code-queue-mgr`,也必须支持这一路径:`deploy apply --service code-queue-mgr` 从 `deploy.json` 指定 commit 导出源码、构建镜像、替换固定 Compose service 并验证运行中镜像/健康信息的 commit。部署必须遵循 target-side build:服务部署到哪台 target,就在哪台 target 从 remote commit 导出源码、一次性代理构建镜像并部署;不得把中心构建镜像作为默认分发路径,也不得用 `docker commit` 或脏 worktree 作为部署输入。完整规则见 `docs/reference/deploy.md`。
|
||||
版本化用户服务部署优先使用 `bun scripts/cli.ts deploy apply` 或 DevOps 控制面,但 D601 维护通道直连 apply 只允许部署或修复 DevOps 本身。`deploy.json` 只声明服务 `id`、`repo` 和 `commitId`;目标节点、Dockerfile、Compose、Kubernetes manifest、健康检查和代理路径继续来自 `config.json` 与现有 manifest。主 server 直管微服务和内部 sidecar,例如 `code-queue-mgr`,也必须支持这一路径:`deploy apply --service code-queue-mgr` 从 `deploy.json` 指定 commit 导出源码、构建镜像、替换固定 Compose service 并验证运行中镜像/健康信息的 commit。部署必须遵循 target-side build:服务部署到哪台 target,就在哪台 target 从 remote commit 导出源码、一次性代理构建镜像并部署;不得把中心构建镜像作为默认分发路径,也不得用 `docker commit` 或脏 worktree 作为部署输入。完整规则见 `docs/reference/deploy.md`。
|
||||
|
||||
## Main Server Swap
|
||||
|
||||
@@ -43,7 +43,7 @@ swap 管理不能被强塞进所有热路径。`server start/status` 可以暴
|
||||
|
||||
## Single Service Rebuild
|
||||
|
||||
前端、backend-core、本机 provider-gateway 或主 server 承载的 Todo Note/Code Queue Manager/Project Manager/Baidu Netdisk/OA Event Flow 用户服务需要非版本化本地重建时,统一使用 `bun scripts/cli.ts server rebuild <service>`,其中 `<service>` 只能是 `backend-core`、`frontend`、`provider-gateway`、`todo-note`、`code-queue-mgr`、`project-manager`、`baidu-netdisk` 或 `oa-event-flow`。需要按 commit 上线或恢复到 desired-state 时必须改用 `bun scripts/cli.ts deploy apply --service <id>`;直管微服务也不能把脏工作树或手工重建作为部署真相。D601 Code Queue 执行面、File Browser、FindJob、Pipeline、MET Nonlinear 和 ClaudeQQ 部署在计算节点,不属于主 server Compose 可重建服务;其中 D601 Code Queue 执行面的正式入口是 `bun scripts/cli.ts codex deploy <commitId>`。该命令先执行目标服务镜像构建,构建成功后才通过 `up -d --no-deps --force-recreate <service>` 替换目标容器,避免构建失败导致运行中的服务被提前停掉。
|
||||
前端、backend-core、本机 provider-gateway 或主 server 承载的 Todo Note/Code Queue Manager/Project Manager/Baidu Netdisk/OA Event Flow 用户服务需要非版本化本地重建时,统一使用 `bun scripts/cli.ts server rebuild <service>`,其中 `<service>` 只能是 `backend-core`、`frontend`、`provider-gateway`、`todo-note`、`code-queue-mgr`、`project-manager`、`baidu-netdisk` 或 `oa-event-flow`。需要按 commit 上线或恢复到 desired-state 时必须改用 `bun scripts/cli.ts deploy apply --service <id>`;直管微服务也不能把脏工作树或手工重建作为部署真相。D601 Code Queue 执行面、File Browser、FindJob、Pipeline、MET Nonlinear 和 ClaudeQQ 部署在计算节点,不属于主 server Compose 可重建服务;其中 D601 Code Queue 执行面不得再通过 `codex deploy` 或维护通道直连 D601 部署,必须经 DevOps 控制面执行 build-first、rollout 和 live commit 验证。
|
||||
|
||||
frontend 改动必须明确上线到公网:修改 `src/components/frontend/src/`、`src/components/frontend/public/style.css`、frontend 使用的共享 TSX/TS 模块或 WebUI 导航后,必须在同一变更集中执行 `bun scripts/cli.ts server rebuild frontend`,并等待 job 成功。公网 WebUI 的 `/app.js` 是 `unidesk-frontend` 容器启动时从镜像内源码转译生成的运行时 bundle;只改工作区文件、只跑 `bun run check`、只跑 `Bun.build` 或只刷新浏览器都不会替换已经运行的容器。
|
||||
|
||||
@@ -53,7 +53,7 @@ frontend 的 Docker 上线顺序为:先运行必要的本地校验,例如 `b
|
||||
|
||||
紧急灾备或数据迁移期间如需手工启动单个 Compose service,也必须保持与 CLI 相同的隔离语义:使用固定 `--env-file .state/docker-compose.env` 和 `up -d --no-deps <service>`,只启动目标容器;如果需要刷新 backend-core 的服务目录或环境变量,应把 `backend-core` 作为显式目标单独重建/替换,不能依赖 `up` 的依赖解析顺手重建 database、backend-core 或其他服务。
|
||||
|
||||
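The formal flow must not rely on a manual `docker rm` as a fallback; if the job, Docker client or daemon is interrupted before `up` after the old container was removed by hand, user-service proxying fails immediately. Both `server rebuild <service>` and `codex deploy <commitId>` must be observable jobs: build-first, controlled replacement, post-up validation, and preservation of named volumes or the `.state` runtime directory. Long-running compute-node services such as Code Queue must rely on the service's own restart-recovery to resume tasks even when they are rebuilt; "avoiding rebuilds" must not be used to mask recovery defects.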
|
||||
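The formal flow must not rely on a manual `docker rm` as a fallback; if the job, Docker client or daemon is interrupted before `up` after the old container was removed by hand, user-service proxying fails immediately. `server rebuild <service>`, `deploy apply` and DevOps deployment runs must all be observable flows: build-first, controlled replacement, post-up validation, and preservation of named volumes or the `.state` runtime directory. Long-running compute-node services such as Code Queue must rely on the service's own restart-recovery to resume tasks even when they are rebuilt; "avoiding rebuilds" must not be used to mask recovery defects.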
|
||||
|
||||
## User Service Restart Recovery
|
||||
|
||||
|
||||
@@ -165,9 +165,9 @@ D601 上必须显式使用原生 k3s kubeconfig:`KUBECONFIG=/etc/rancher/k3s/k
|
||||
|
||||
The first-version manifest for `backend-core-dev` and `frontend-dev` is fixed at `src/components/microservices/k3sctl-adapter/k3s/dev/unidesk-dev-core.k8s.yaml`. This manifest may only create the `backend-core-dev` and `frontend-dev` Deployments/Services inside `unidesk-dev`; it must not modify the production main-server Compose stack, the production `unidesk` namespace, or the production backend/frontend.
|
||||
|
||||
`backend-core-dev` must inject dev-only configuration from `unidesk-dev-runtime-config` and `unidesk-dev-runtime-secrets`, using `postgres-dev.../unidesk_dev`, the dev Provider token, the dev log path and `UNIDESK_DEPLOY_REF=origin/deploy/dev`. `frontend-dev` must point `CORE_INTERNAL_URL` at `backend-core-dev.unidesk-dev.svc.cluster.local:8080`; under the dev identity the page shows a DEV marker, and `/health` returns the dev namespace, database, service id, deploy ref and commit metadata. When the dev identity is not set, as in production, the backend-core and frontend health payloads keep their production-compatible shape.
|
||||
`backend-core-dev` must inject dev-only configuration from `unidesk-dev-runtime-config` and `unidesk-dev-runtime-secrets`, using `postgres-dev.../unidesk_dev`, the dev Provider token, the dev log path and `UNIDESK_DEPLOY_REF=origin/master:deploy.json#environments.dev`. `frontend-dev` must point `CORE_INTERNAL_URL` at `backend-core-dev.unidesk-dev.svc.cluster.local:8080`; under the dev identity the page shows a DEV marker, and `/health` returns the dev namespace, database, service id, deploy ref and commit metadata. When the dev identity is not set, as in production, the backend-core and frontend health payloads keep their production-compatible shape.
|
||||
|
||||
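`unidesk-dev-core.k8s.yaml` currently uses a placeholder image/commit; a real rollout requires a later `deploy apply --env dev` executor to replace the commit from `origin/deploy/dev:deploy.json` and build the image. Current acceptance only performs static validation and a Kubernetes client dry-run; the placeholder manifest must not be treated as deployed.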
|
||||
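`unidesk-dev-core.k8s.yaml` currently uses a placeholder image/commit; a real rollout requires the `deploy apply --env dev` executor to replace the commit from `origin/master:deploy.json#environments.dev` and build the image. Current acceptance only performs static validation and a Kubernetes client dry-run; the placeholder manifest must not be treated as deployed.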
|
||||
|
||||
### Code Queue k3s-Managed
|
||||
|
||||
@@ -188,8 +188,8 @@ D601 上必须显式使用原生 k3s kubeconfig:`KUBECONFIG=/etc/rancher/k3s/k
|
||||
- MiniMax/OpenCode 并发:`minimax-m2.7` 通过 OpenCode JSON 事件端口运行;每个 Code Queue task 必须使用独立的 OpenCode XDG data/config/cache/state 目录,禁止多队列并发任务共享同一个 OpenCode SQLite/WAL 状态目录,否则并发 smoke 会触发 `PRAGMA journal_mode = WAL` 之类的数据库锁或初始化错误。用于验证 k3s/k8s 链路的 MiniMax smoke 以“至少 4 个任务、分布到 2 个 queue、至少 2 个终态成功”为链路验收线;剩余失败如果是 OpenCode 最终回复捕获、业务任务判定或模型限流,应作为 Code Queue 执行可靠性问题单独排查,不能反推 k3s 代理链路失败。
|
||||
- 默认出网代理:D601 active Code Queue Pod 必须默认把 `HTTP_PROXY`、`HTTPS_PROXY` 和 `ALL_PROXY` 注入给 Codex/OpenCode、`git`、`curl`、`npm` 等任务子进程;当前唯一上游是 D601 provider-gateway egress HTTP CONNECT 代理,并通过 Kubernetes `Service d601-provider-egress-proxy` 暴露给 `unidesk` namespace 内的 Pod。该 Service 通过 selector 指向 D601 上的 hostNetwork 桥接 Pod,桥接 Pod 在集群端监听 service port `18789`、在宿主侧只连接 `127.0.0.1:18789` 的 provider-gateway egress endpoint;不得再用手工 EndpointSlice、provider-gateway Docker bridge IP 或固定 `172.*` 地址作为长期拓扑。Pod 内代理 URL 使用 `http://d601-provider-egress-proxy.unidesk.svc.cluster.local:18789`,provider-gateway 宿主端口仍只允许绑定 `127.0.0.1`,不得开放公网;桥接 Pod 或 provider-gateway 重建后必须用 Code Queue `/health.egressProxy.connected=true` 验证。这里的 provider-gateway 只承担出网代理,不承担 Code Queue 业务 HTTP 代理;业务访问仍只能走 Kubernetes API service proxy。k3s/k8s 原生 egress gateway、service mesh 或 CNI egress policy 只作为后续网络层增强方向,当前交付态不引入第二套出网控制面。远程开发/执行容器不得只依赖这些环境变量,必须在容器网络层用 TUN 默认路由和 OUTPUT 防火墙强制外网流量只能经 master TUN 出口。
|
||||
- 出网代理无 fallback 纪律:Code Queue 的运行时配置只允许一个默认出网路径,即 provider-gateway egress proxy;不得在代码中同时保留 Code Queue 自建 WebSocket proxy、临时 shell proxy、D601 本地直连公网、主 server direct HTTP proxy 等隐式分支。任何新增网络 fallback 都必须先进入本参考文档并配套 `/health` 可见状态,否则视为残留旧路径。
|
||||
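- Release discipline: Code Queue frontend or backend improvements must be formally released within the same task and verified against the public frontend or the live API; they must not stop at source code, build artifacts or "release later". When changing Code Queue itself, do not wait for the current Code Queue task to finish, for the queue to go idle, or for `0 running` before restarting; the formal backend deploy entry for the D601 active instance is `bun scripts/cli.ts codex deploy <commitId>`, which performs build-first image replacement, k3s image import, manifest apply, rollout and health verification from the pushed remote commit, and uses the k3s adapter, the Code Queue live API or the public frontend to prove that tasks and queues remain readable and can continue.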
|
||||
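- Desired-state deployment: the new general entry is `bun scripts/cli.ts deploy apply --service code-queue`, which reads the repo and commit from `deploy.json`, then builds on D601, imports into k3s, rolls out and verifies the live commit according to the target-side build rules in `docs/reference/deploy.md`. `codex deploy <commitId>` is a compatibility entry; later implementations should reuse the same reconciler and must not maintain a second set of deployment semantics.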
|
||||
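- Release discipline: Code Queue frontend or backend improvements must be formally released within the same task and verified against the public frontend or the live API; they must not stop at source code, build artifacts or "release later". When changing Code Queue itself, do not wait for the current Code Queue task to finish, for the queue to go idle, or for `0 running` before restarting; backend deployment of the D601 active instance must go through the DevOps control plane, which performs build-first image replacement, k3s image import, manifest apply, rollout and health verification, and uses the k3s adapter, the Code Queue live API or the public frontend to prove that tasks and queues remain readable and can continue.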
|
||||
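- Desired-state deployment: Code Queue's version is still declared by the repo and commit in `deploy.json`, but the maintenance-channel direct path to D601 may only deploy DevOps itself; `deploy apply --service code-queue` and `codex deploy <commitId>` can no longer be used to deploy Code Queue. When the DevOps control plane implements Code Queue CD, it should reuse the target-side build rules in `docs/reference/deploy.md` to build on D601, import into k3s, roll out and verify the live commit, and must not maintain a second set of deployment semantics.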
|
||||
- 更名与灾备恢复:旧版 Codex 队列服务名只允许作为兼容诊断和一次性迁移来源;`code-queue-backend` 容器自身 `/health` 正常但 `microservice health code-queue` 返回 provider 直连错误时,优先判定为 backend-core 仍加载旧 `MICROSERVICES_JSON` 或 adapter manifest 未刷新,必须刷新 `.state/docker-compose.env`、重建/替换 `backend-core` 与 `k3sctl-adapter`,随后用 `microservice list` 验证 `code-queue` 的 `runtime.orchestrator=k3sctl`、`backend.proxyMode=k3sctl-adapter-http` 和无业务容器直连摘要。正式 k3s 部署成功后,旧 direct Docker `code-queue-backend` 必须停止并移除,不能与 `code-queue-scheduler` 同时运行;否则会形成双 scheduler、双健康来源和错误的恢复判断。
|
||||
- Codex 认证:容器必须从 D601 的 `/home/ubuntu/.codex/config.toml` 同步 Codex provider 配置到 D601 `.state/code-queue/codex-home/config.toml`,并只读挂载 `/home/ubuntu/.codex/auth.json` 到容器 `/root/.codex/auth.json` 后同步到 `.state/code-queue/codex-home/auth.json`,让 `codex app-server` 使用与 host 一致的 provider 登录态;同时通过 D601 `.state/code-queue-d601.env` 或 k8s `code-queue-env` secret 透传 `OPENAI_API_KEY`、`CRS_OAI_KEY` 等 provider 所需变量。这些 provider 环境变量和 auth 文件不得写入仓库,必须由 D601 运行时文件或 k8s secret 注入,确保容器重建和重启后不会丢失认证。新增 provider 的 `env_key` 时必须增加同类运行时透传和 Compose/k8s 持久化,禁止把 Codex 或 MiniMax 密钥写入仓库文件。Code Queue 容器必须只读挂载 D601 WSL host 的 SSH 目录到 `/root/.ssh`(默认 `/home/ubuntu/.ssh`),让容器内 `git push`、`ssh -T git@github.com` 与 WSL host 使用同一套 GitHub SSH key/known_hosts;不得把私钥复制进镜像或仓库。
|
||||
- Develop-ready 镜像:Code Queue 镜像必须在启动前预装 UniDesk/Pipeline 调试所需工具,至少包含 `codex`、`bun`、`node`、`npm`/`npx`、`git`、`rg`、`curl`、`python3`/`pip3`、`docker`、`docker compose`、`docker-compose`、`jq`、`ssh`、`rsync`、`make`、`gcc`/`g++`、`iptables`、`tar`、`gzip` 和 `unzip`;不得依赖 Codex 任务运行时再 `apt-get install` 这些基础环境。
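The default egress proxy bullet in the list above requires `HTTP_PROXY`, `HTTPS_PROXY` and `ALL_PROXY` to be injected into task child processes with a single upstream. A minimal sketch of that injection follows; `withEgressProxy` is a hypothetical helper name, and the sketch only builds the environment object, it does not spawn anything.

```typescript
// Proxy URL quoted in the reference text; resolvable only from Pods inside the cluster.
const EGRESS_PROXY_URL = "http://d601-provider-egress-proxy.unidesk.svc.cluster.local:18789";

type Env = Record<string, string | undefined>;

// Return a copy of the base environment with the three proxy variables forced
// to the single allowed upstream, overriding any value inherited from the parent.
function withEgressProxy(baseEnv: Env, proxyUrl: string = EGRESS_PROXY_URL): Env {
  return {
    ...baseEnv,
    HTTP_PROXY: proxyUrl,
    HTTPS_PROXY: proxyUrl,
    ALL_PROXY: proxyUrl,
  };
}
```

Overriding inherited values instead of keeping them is intentional here, since the no-fallback rule allows only one default egress path.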
|
||||
@@ -200,7 +200,7 @@ D601 上必须显式使用原生 k3s kubeconfig:`KUBECONFIG=/etc/rancher/k3s/k
|
||||
- Codex 控制:服务内部启动 `codex app-server --listen stdio://`,用 JSON-RPC 调用 `thread/start`、`turn/start`、`turn/steer` 和 `turn/interrupt`,并监听 `turn/completed`、assistant delta、reasoning delta、command output delta、file diff delta 等通知生成前端可轮询的 transcript。
|
||||
- 用户输入持久化:任务初始 prompt 以 `basePrompt/displayPrompt` 作为结构化来源,运行中追加的 `turn/steer` prompt 必须写入 `promptHistory`;transcript 构建时从这些结构化字段合成 `Submitted prompt` 和 `Steer prompt`,不能只依赖有 600 条上限的 raw output,否则长任务输出增长后会丢失关键人工指令。
|
||||
- 队列语义:`POST /api/tasks` 或 `/api/tasks/batch` 入队,服务始终只运行一个 Codex turn;当前任务真正终止后才推进下一个任务。`GET /api/tasks` 与 `GET /api/tasks/{id}` 返回队列、attempt、judge 和输出;`GET /api/tasks/{id}/summary` 返回按任务 ID 查询的结构化摘要,包括初始 prompt、最后 assistant message、工具调用摘要、attempt、judge、错误和耗时;CLI 入口是 `bun scripts/cli.ts codex task <taskId>`。`GET|POST /api/tasks/{id}/judge?attempt=N` 与 CLI `bun scripts/cli.ts codex judge <taskId> --attempt N` 用于单步复现指定 attempt 的 judge,必须复用真实队列 worker 的上下文构建、prompt 压缩、MiniMax 调用、JSON 去噪/repair 和 fallback 路径;`dryRun=1`/`--dry-run` 只输出 prompt/payload 和重建诊断,不调用 MiniMax。`POST /api/tasks/{id}/steer` 向运行中 turn 推入 prompt;`POST /api/tasks/{id}/interrupt` 或 `DELETE /api/tasks/{id}` 打断/取消;`POST /api/tasks/{id}/retry` 手动重试。队列 worker 必须隔离单个 task 的异常,不能因为某个 app-server、数据库 claim、judge 异常、judge 超时或 judge 判定 `fail` 让后续 queued 任务停止;`fail` 只把当前任务标为 failed,随后必须继续扫描并推进下一个 queued/retry_wait 任务。数据库 claim 必须有硬超时且失败时释放 active run slot;judge 必须有独立 watchdog,超时后走 fallback judge 并继续推进。当存在 queued/retry_wait 且 worker 空闲时,watchdog 必须自动重新调度。
|
||||
- 稳定性与重启恢复:Code Queue 的第一目标是长期稳定可用;部署修复或运维排障时不得因为担心容器重启会打断任务而拒绝重启、重建或替换 active Pod。容器重启、服务进程重启和镜像替换后,队列、`promptHistory`、running/judging/retry_wait 任务和 active session 元数据必须从 PostgreSQL 恢复,并在已有 `codexThreadId` 可用时用 `thread/resume` 和 continuation prompt 无缝继续当前任务;如果原 app-server turn 已丢失,也必须把当前任务恢复到可 retry/continue 的状态,不能错误推进下一个任务或永久卡住。D601 侧重建必须走 `bun scripts/cli.ts codex deploy <commitId>`;禁止先手工 `docker rm` 或只手工 `docker compose up` 再依赖后续命令补救,因为中断窗口会让 Pod/容器消失并触发 frontend/core 用户服务代理失败。重启后出现 active task 丢失、手动 steer/interrupt 记录丢失、running 任务卡死、误判完成、跳过当前任务、容器消失或阻塞队列,均属于 Code Queue 的 P0 核心缺陷,必须先修复并补充 restart-recovery 验收,不能把“避免重启”作为交付策略。
|
||||
- 稳定性与重启恢复:Code Queue 的第一目标是长期稳定可用;部署修复或运维排障时不得因为担心容器重启会打断任务而拒绝重启、重建或替换 active Pod。容器重启、服务进程重启和镜像替换后,队列、`promptHistory`、running/judging/retry_wait 任务和 active session 元数据必须从 PostgreSQL 恢复,并在已有 `codexThreadId` 可用时用 `thread/resume` 和 continuation prompt 无缝继续当前任务;如果原 app-server turn 已丢失,也必须把当前任务恢复到可 retry/continue 的状态,不能错误推进下一个任务或永久卡住。D601 侧重建必须走 DevOps 控制面;禁止先手工 `docker rm`、只手工 `docker compose up` 或用维护通道直连 D601 部署 Code Queue 再依赖后续命令补救,因为中断窗口会让 Pod/容器消失并触发 frontend/core 用户服务代理失败。重启后出现 active task 丢失、手动 steer/interrupt 记录丢失、running 任务卡死、误判完成、跳过当前任务、容器消失或阻塞队列,均属于 Code Queue 的 P0 核心缺陷,必须先修复并补充 restart-recovery 验收,不能把“避免重启”作为交付策略。
|
||||
- 调度与 active run slot:Code Queue 必须把“queue processor 正在等待/退避/轮询”和“实际占用 Codex/OpenCode 子进程运行槽”分开建模;`CODE_QUEUE_MAX_ACTIVE_QUEUES` 只限制真实 active run slot,不能把 retry backoff、等待内存下降或等待前序任务的 `processingQueues` 计入 active slot,否则设置全局 active slot 上限时,一个空等队列会把其他 runnable queue 永久饿死。多个 queue 同时等待 active slot 时必须显式维护 FIFO waiter 队列,避免某个长 retry/backoff 队列刚释放 slot 就立刻重抢,导致更早进入等待的 `retry_wait` 任务长期饥饿;`/health` 必须同时暴露真实 `activeQueueIds`、`activeRunSlotCount`、等待中的 `processingQueueIds` 和 active slot waiters,排障时以 active run slot 与 waiter 顺序判断是否真的有任务在跑、谁应下一个启动。restart-recovery 后的 `retry_wait` 任务若缺失 `codexThreadId`/OpenCode session id,不得无限拒绝 retry;必须用紧凑 recovery prompt 和原始任务摘要重新开一个 agent thread/session,让任务继续推进并在 Trace 中留下 recovery 证据。任何修改 scheduler、retry backoff、queue move、manual retry、shutdown recovery 或内存等待逻辑时,都必须保留“空等 processor 不占 active run slot”、“等待者 FIFO 不饥饿”和“缺失 thread/session 可恢复”的自测或 live 验证。
|
||||
- 内存优化过程与防回归:Code Queue 已迁移到 D601,但内存治理仍必须按“PostgreSQL 权威源优先、进程热状态最小化、容器硬上限兜底”的顺序设计。长期可复用的优化路径是:先确认任务、queue、readAt、promptHistory、active session 和通知 outbox 均可从 PostgreSQL 恢复;再把历史任务列表、详情、统计、Trace/output 和 `/health` 的只读查询改为 PostgreSQL 直读或聚合查询;随后只把 `queued`、`running`、`judging`、`retry_wait` 等调度必需任务载入 Bun 堆,并在 PostgreSQL 查询侧裁剪 hot `output`/`events`;最后用 dirty-only flush、append-only 输出归档、Codex SQLite 小批量导出、`bun --smol`、`mem_limit=600m`、`memswap_limit=1536m`、`NODE_OPTIONS=--max-old-space-size=768` 和 cgroup memory watchdog 作为运行时防线。PostgreSQL 到进程的单次读取足够快,不能为了减少 SQL 查询把全部历史 `task_json`、Trace、output 或统计摘要常驻内存;任何新增缓存都必须有默认较小的环境变量上限、明确淘汰策略、可从 PostgreSQL 或 append-only 归档重建,且不得影响重启恢复。新增或修改 `/api/tasks`、overview、stats、summary、transcript、output、trace、health、flush、scheduler 和通知路径时,禁止在常规请求中调用会物化全量历史任务 JSON 的代码,禁止启动后无条件重写全量历史 task JSON,禁止用未设上限的 `Map`/数组保存历史 output/event/Trace,`CODE_QUEUE_MAX_ACTIVE_QUEUES=0` 表示不按 queue 数量设置全局排队上限;如显式设置为正数,必须同时说明内存预算并补充内存压测验收。memory watchdog 必须以 cgroup working set 为主要判断,且在 swap 仍有余量时不得提前杀掉唯一 active run;否则 TypeScript/Playwright 这类短时高内存验证会被错误中断并让 retry 队列反复震荡。
|
||||
- 列表/详情延迟优化原则:Code Queue 控制面交互的长期目标是常规历史规模下首屏、`GET /api/tasks/overview`、`POST /api/tasks/<id>/read` 和分页加载均在 1s 内完成;性能面板出现十几秒级 `core_proxy` 或 Code Queue 用户服务代理慢操作时,必须优先按后端查询形态和前后端通信策略定位,不能把问题归因于 React 渲染后只改 UI。后端优化顺序是:先为 queue、status、updated/created 时间、readAt/terminal unread 和常用筛选条件补齐 PostgreSQL 索引;再用 SQL `COUNT`、`GROUP BY`、条件聚合和分页 ID 查询生成 queue/status/stats/unread 摘要;随后按 ID 轻量加载当前页、selected、active 和 unread priority task,禁止为了列表或已读操作解析完整 Trace、output archive、Codex transcript 或物化全量历史 `task_json`。`read`/`read-all` 这类 mutation 必须是 SQL-only 更新并返回最小 patch/queue 计数,不能触发 overview 全量重算或重载所有任务;启动 warm 只能预热小体积聚合和索引路径,不得把历史任务作为常驻缓存。允许 frontend/backend 代理使用秒级、严格有界、mutation 自动失效的 overview micro-cache 来吸收重复刷新,但 cache 只能作为抖动保护,不能替代数据库索引、聚合查询和分页披露,也不能让 stale readAt/queue/status 状态跨设备可见。
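The active run slot bullet in the list above asks for two things: waiting processors must not count as active, and queues waiting for a slot must be served in FIFO order. The sketch below shows one way to model that with callbacks. `ActiveRunSlots` is a hypothetical class, not the scheduler's implementation; a limit of `0` is treated as "no global limit", following the `CODE_QUEUE_MAX_ACTIVE_QUEUES=0` semantics stated in the memory bullet.

```typescript
// Tracks real run slots separately from waiters; waiters never occupy a slot.
class ActiveRunSlots {
  private readonly active = new Set<string>();
  private readonly waiters: Array<{ queueId: string; start: () => void }> = [];
  private readonly maxActive: number;

  constructor(maxActive: number) {
    this.maxActive = maxActive;
  }

  // Start immediately only if a slot is free and nobody is already waiting;
  // otherwise join the back of the FIFO waiter list.
  request(queueId: string, start: () => void): void {
    if (this.waiters.length === 0 && this.hasFreeSlot()) {
      this.active.add(queueId);
      start();
    } else {
      this.waiters.push({ queueId, start });
    }
  }

  // Free the slot held by queueId and hand it to the earliest waiter.
  release(queueId: string): void {
    if (!this.active.delete(queueId)) return;
    while (this.waiters.length > 0 && this.hasFreeSlot()) {
      const next = this.waiters.shift()!;
      this.active.add(next.queueId);
      next.start();
    }
  }

  // Shape loosely mirrors the /health fields named in the text.
  snapshot() {
    return {
      activeQueueIds: [...this.active],
      activeRunSlotCount: this.active.size,
      waiterQueueIds: this.waiters.map((w) => w.queueId),
    };
  }

  private hasFreeSlot(): boolean {
    return this.maxActive <= 0 || this.active.size < this.maxActive;
  }
}
```

Because a queue that releases its slot and immediately requests again is appended behind existing waiters, a long retry/backoff queue cannot re-grab the slot ahead of an earlier `retry_wait` task.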
|
||||
|
||||
@@ -18,7 +18,7 @@
|
||||
- command.ts (Bounded command execution helpers)
|
||||
- output.ts (JSON output helpers)
|
||||
- e2e.ts (Public frontend/provider ingress, internal core/database, and Playwright frontend E2E checks)
|
||||
- ci.ts (D601 k3s Tekton CI install/status/manual-run/logs helpers; CI only, no CD)
|
||||
- ci.ts (D601 k3s Tekton CI install/status/manual-run/dev-namespace-e2e/logs helpers; CI only, no CD)
|
||||
- logs/ (Generated service logs; ignored by git)
|
||||
- .state/ (Generated job state and compose env; ignored by git)
|
||||
- docs/
|
||||
@@ -33,7 +33,7 @@
|
||||
- provider-gateway.md (Provider connection and host SSH maintenance bridge)
|
||||
- observability.md (Logs and status visibility)
|
||||
- e2e.md (Delivery gate, Playwright frontend E2E, and database persistence checks)
|
||||
- ci.md (D601 k3s Tekton CI, read-only production database performance gate, and trigger boundary)
|
||||
- ci.md (D601 k3s Tekton CI, read-only production database performance gate, master deploy.json dev namespace e2e harness, and trigger boundary)
|
||||
- src/ (TypeScript component monorepo)
|
||||
- package.json (Component workspace metadata)
|
||||
- bun.lock (Component dependency lockfile)
|
||||
|
||||