From 5bb59e591b7739c5baac77efd253fbc224bdb75c Mon Sep 17 00:00:00 2001 From: Codex Date: Sun, 17 May 2026 12:37:10 +0000 Subject: [PATCH] docs: translate d601 dev environment plan --- docs/plan/d601-k3s-dev-environment.md | 428 +++++++++++++------------- 1 file changed, 214 insertions(+), 214 deletions(-) diff --git a/docs/plan/d601-k3s-dev-environment.md b/docs/plan/d601-k3s-dev-environment.md index 61d6ccbb..de35fdd1 100644 --- a/docs/plan/d601-k3s-dev-environment.md +++ b/docs/plan/d601-k3s-dev-environment.md @@ -1,57 +1,57 @@ -# D601 k3s Development Environment Plan +# D601 k3s 开发环境建设计划 -## Goal +## 目标 -Build an isolated UniDesk development environment inside the existing D601 native k3s cluster so LLM-driven development can deploy, break, rebuild, and validate backend-core, frontend, Code Queue, and their database dependencies without interrupting the production main server. +在现有 D601 原生 k3s 集群内建设一套与生产隔离的 UniDesk 开发环境,让以 LLM 为主力的开发流程可以部署、破坏、重建和验证 backend-core、frontend、Code Queue 及其数据库依赖,而不打断生产主 server。 -The first version must support deployment by GitHub commit id through environment deploy manifests. The desired long-term control point is GitHub-hosted `deploy.json`: deploying an environment reads the `deploy.json` stored on the matching GitHub environment branch and applies the commit ids declared there. +第一版必须支持通过 GitHub commit id 部署。长期控制点是 GitHub 托管的 `deploy.json`:部署某个环境时,自动读取对应 GitHub 环境分支里的 `deploy.json`,并应用其中声明的 commit id。 -Initial environment branches: +初始环境分支: -- `deploy/dev`: desired state for the D601 k3s development environment. -- `deploy/prod`: desired state for production. Branch protection can be added later; the first implementation must still keep prod deployment commands and credentials separate from dev. +- `deploy/dev`:D601 k3s 开发环境的期望状态。 +- `deploy/prod`:生产环境的期望状态。分支保护可以后续补上;第一版仍必须把 prod 部署命令和凭据与 dev 隔离。 -## Non-Goals +## 非目标 -- Do not create a second physical k3s control plane in the first version. 
Use the existing D601 native k3s cluster with namespace-level isolation. -- Do not move production main server backend-core/frontend into k3s in the first version. -- Do not let the dev environment share production PostgreSQL tables, provider identity, provider token, Code Queue task state, or deployment worktree paths. -- Do not make `deploy/dev` or `deploy/prod` aliases for normal source branches. They are environment desired-state branches. +- 第一版不创建第二套物理 k3s 控制平面。先复用现有 D601 原生 k3s 集群,通过 namespace 做隔离。 +- 第一版不把生产主 server 的 backend-core/frontend 迁入 k3s。 +- dev 环境不得共享生产 PostgreSQL 表、Provider 身份、Provider token、Code Queue 任务状态或部署 worktree 路径。 +- `deploy/dev` 和 `deploy/prod` 不是普通源码开发分支的别名,只是环境期望状态分支。 -## Target Dev Topology +## 目标开发拓扑 -The first dev environment runs in namespace `unidesk-dev` on D601: +第一版 dev 环境运行在 D601 的 `unidesk-dev` namespace: -- `postgres-dev`: independent PostgreSQL StatefulSet or equivalent persistent database for dev. -- `backend-core-dev`: backend-core built from the commit id declared in `deploy/dev:deploy.json`. -- `frontend-dev`: frontend built from the commit id declared in `deploy/dev:deploy.json`, proxying only to `backend-core-dev`. -- `code-queue-mgr-dev`: lightweight Code Queue control plane using the dev database. -- `code-queue-read-dev`, `code-queue-write-dev`, `code-queue-scheduler-dev`: Code Queue k3s execution components using dev database, dev logs, dev state paths, and dev Code Queue settings. -- Optional first-access path: SSH port-forward or a private D601-hosted ingress. Public exposure is not required for phase 1. 
+- `postgres-dev`:dev 独立 PostgreSQL StatefulSet,或等价的持久化数据库。 +- `backend-core-dev`:从 `deploy/dev:deploy.json` 声明的 commit id 构建。 +- `frontend-dev`:从 `deploy/dev:deploy.json` 声明的 commit id 构建,并且只代理到 `backend-core-dev`。 +- `code-queue-mgr-dev`:轻量 Code Queue 控制面,使用 dev 数据库。 +- `code-queue-read-dev`、`code-queue-write-dev`、`code-queue-scheduler-dev`:Code Queue k3s 执行组件,使用 dev 数据库、dev 日志、dev state 路径和 dev Code Queue 配置。 +- 第一阶段访问方式可选:SSH port-forward 或 D601 私有 ingress。阶段 1 不要求公网暴露。 -All dev services must report environment identity in `/health`: +所有 dev 服务的 `/health` 必须输出环境身份: - `environment=dev` -- namespace -- database name +- namespace 名称 +- 数据库名称 - service id -- GitHub repo and commit id -- deployment ref, expected to be `origin/deploy/dev` +- GitHub repo 和 commit id +- 部署 ref,预期为 `origin/deploy/dev` -## Core Isolation Rules +## 核心隔离规则 -1. Dev services must use `unidesk-dev` namespace only. -2. Dev services must use a dev PostgreSQL instance or database. They must not connect to production PostgreSQL. -3. Dev provider identity must be separate, for example `D601-dev`; it must not reuse production `D601` provider id or provider token. -4. Dev Code Queue tasks, queues, attempts, notifications, and trace state must not write production tables unless table names are explicitly namespaced and verified safe. The preferred first version is a separate dev database. -5. Dev manifests must not mount production deployment roots such as `/root/unidesk` on the main server or production D601 deployment paths unless the mount is read-only and explicitly needed for diagnostics. -6. Dev Code Queue must use dev work directories, dev log directories, and dev state directories. -7. Production deploy must not read a local dirty `deploy.json`; production deploy must read the production desired state from the configured GitHub environment ref. -8. LLM/Code Queue development tasks should only receive dev deploy credentials by default. +1. dev 服务只能使用 `unidesk-dev` namespace。 +2. 
dev 服务必须使用 dev PostgreSQL 实例或 dev 数据库,不能连接生产 PostgreSQL。 +3. dev Provider 身份必须独立,例如 `D601-dev`,不能复用生产 `D601` 的 provider id 或 provider token。 +4. dev Code Queue 的 task、queue、attempt、notification 和 trace state 不能写入生产表;除非表名已经显式 namespace 化并经过安全验证。第一版优先使用独立 dev 数据库。 +5. dev manifest 不得挂载生产部署根目录,例如主 server 的 `/root/unidesk` 或 D601 生产部署路径;除非只读挂载且确实用于诊断。 +6. dev Code Queue 必须使用 dev workdir、dev log dir 和 dev state dir。 +7. 生产部署不能读取本地 dirty `deploy.json`;必须从配置好的 GitHub 生产环境 ref 读取生产期望状态。 +8. LLM/Code Queue 开发任务默认只获得 dev 部署凭据。 -## Deploy Manifest Model +## 部署清单模型 -Use one schema for environment manifests: +环境 manifest 使用同一套 schema: ```json { @@ -82,264 +82,264 @@ Use one schema for environment manifests: } ``` -Environment-to-ref mapping must be fixed in code or canonical config: +环境到 ref 的映射必须固定在代码或权威配置中: -- `dev` maps to `origin/deploy/dev`. -- `prod` maps to `origin/deploy/prod`. +- `dev` 映射到 `origin/deploy/dev`。 +- `prod` 映射到 `origin/deploy/prod`。 -The deploy command should accept an environment, not an arbitrary branch for production. A debug or admin-only command may inspect arbitrary refs, but normal prod deployment must use the fixed mapping. +部署命令应接收环境名,而不是让生产部署任意指定分支。debug 或 admin-only 命令可以查看任意 ref,但普通 prod 部署必须走固定映射。 -## Phase 0: Design And Guardrails +## 阶段 0:设计与护栏 -Purpose: make the target behavior explicit before adding a second runtime. +目的:在加入第二套运行环境前,把目标行为说清楚。 -Implementation items: +实现项: -- Define the environment manifest schema and validation rules. -- Add `environment` to deploy manifests and reject mismatches. -- Define fixed environment mappings: `dev -> deploy/dev`, `prod -> deploy/prod`. -- Document target namespace, database, provider identity, and service ids for dev. 
-- Add CLI dry-run planning output that prints: - - selected environment +- 定义环境 manifest schema 和校验规则。 +- 给 deploy manifest 增加 `environment` 字段,并拒绝环境不匹配的 manifest。 +- 定义固定环境映射:`dev -> deploy/dev`,`prod -> deploy/prod`。 +- 记录 dev 的目标 namespace、数据库、Provider 身份和 service id。 +- 增加 CLI dry-run plan 输出: + - 选中的环境 - GitHub ref - - resolved manifest commit - - services and commit ids - - target namespace - - target database fingerprint - - target provider identity + - 解析到的 manifest commit + - services 和 commit id + - 目标 namespace + - 目标数据库指纹 + - 目标 Provider 身份 -Acceptance criteria: +验收标准: -- `deploy plan --env dev` can read and validate a dev manifest without mutating the cluster. -- `deploy plan --env prod` can read and validate a prod manifest without using the local worktree `deploy.json`. -- A manifest with `environment=prod` must be rejected for `--env dev`, and the reverse must also be rejected. +- `deploy plan --env dev` 可以读取并校验 dev manifest,且不修改集群。 +- `deploy plan --env prod` 可以读取并校验 prod manifest,且不使用本地 worktree 的 `deploy.json`。 +- `environment=prod` 的 manifest 必须被 `--env dev` 拒绝,反向也必须拒绝。 -## Phase 1: GitHub Environment Branch Deploy Source +## 阶段 1:GitHub 环境分支作为部署源 -Purpose: make GitHub desired-state refs the deploy source of truth. +目的:让 GitHub 里的环境期望状态 ref 成为部署真相源。 -Implementation items: +实现项: -- Create or initialize `deploy/dev` with a valid `deploy.json`. -- Create or initialize `deploy/prod` with a valid `deploy.json`. -- Add CLI support to fetch an environment ref and read `deploy.json` from that ref. -- Keep the existing local `deploy.json` path as a compatibility mode only for explicit local/admin workflows. -- Ensure commit ids listed by the manifest exist in their declared repos. -- Ensure dev/prod deploy does not depend on a dirty local working tree.
+- 创建或初始化 `deploy/dev`,包含有效的 `deploy.json`。 +- 创建或初始化 `deploy/prod`,包含有效的 `deploy.json`。 +- 增加 CLI 支持:fetch 环境 ref,并从该 ref 读取 `deploy.json`。 +- 现有本地 `deploy.json` 只保留为显式本地/管理员流程的兼容模式。 +- 校验 manifest 中列出的 commit id 存在于声明的 repo。 +- 确保 dev/prod 部署不依赖本地 dirty working tree。 -Acceptance criteria: +验收标准: -- `deploy plan --env dev` reads `origin/deploy/dev:deploy.json`. -- `deploy plan --env prod` reads `origin/deploy/prod:deploy.json`. -- Changing local `deploy.json` does not affect `--env dev` or `--env prod`. -- The plan output includes the Git ref and manifest blob/commit used. +- `deploy plan --env dev` 读取 `origin/deploy/dev:deploy.json`。 +- `deploy plan --env prod` 读取 `origin/deploy/prod:deploy.json`。 +- 修改本地 `deploy.json` 不影响 `--env dev` 或 `--env prod`。 +- plan 输出包含实际使用的 Git ref 和 manifest blob/commit。 -## Phase 2: D601 Dev Namespace And Database +## 阶段 2:D601 开发 Namespace 与数据库 -Purpose: create the minimum isolated substrate for dev backend and Code Queue state. +目的:创建 dev backend 和 Code Queue state 的最小隔离底座。 -Implementation items: +实现项: -- Add a k8s manifest for namespace `unidesk-dev`. -- Add dev PostgreSQL StatefulSet/Service/PVC or an equivalent persistent DB. -- Add dev DB init and migration flow for backend-core and Code Queue tables. -- Add dev secrets/config: - - database credentials +- 增加 `unidesk-dev` namespace 的 k8s manifest。 +- 增加 dev PostgreSQL StatefulSet/Service/PVC,或等价持久化 DB。 +- 增加 dev DB init 和 migration 流程,用于 backend-core 和 Code Queue 表。 +- 增加 dev secrets/config: + - 数据库凭据 - provider token - auth/session secret - - Code Queue model secrets if needed -- Add resource requests/limits so dev DB cannot starve D601 production k3s workloads. + - 必要时的 Code Queue model secrets +- 增加资源 requests/limits,避免 dev DB 挤占 D601 生产 k3s workload。 -Technical decisions: +技术决策: -- Prefer a separate dev PostgreSQL instance over sharing production PostgreSQL with a different database name. It gives the clearest failure boundary. 
-- If a shared PostgreSQL server is temporarily used, the CLI and services must hard-check database name and connection target before startup. +- 优先使用独立 dev PostgreSQL 实例,而不是共享生产 PostgreSQL 的不同数据库名。独立实例的故障边界最清晰。 +- 如果临时使用共享 PostgreSQL server,CLI 和服务启动时必须硬校验数据库名和连接目标。 -Acceptance criteria: +验收标准: -- `kubectl -n unidesk-dev get pods,svc,pvc` shows the dev DB ready. -- Dev DB survives Pod restart. -- Dev services cannot accidentally connect to the production database URL without failing startup validation. +- `kubectl -n unidesk-dev get pods,svc,pvc` 显示 dev DB ready。 +- dev DB 在 Pod 重启后数据仍存在。 +- dev 服务误连生产数据库 URL 时,必须启动失败。 -## Phase 3: backend-core-dev And frontend-dev +## 阶段 3:backend-core-dev 与 frontend-dev -Purpose: make a usable UniDesk dev control surface independent from production main server Compose. +目的:做出一套可用的 UniDesk dev 控制面,不依赖生产主 server Compose。 -Implementation items: +实现项: -- Add k8s manifests for `backend-core-dev` and `frontend-dev`. -- Build images from the commit ids declared in `deploy/dev:deploy.json`. -- Inject dev-only config into backend-core: +- 增加 `backend-core-dev` 和 `frontend-dev` 的 k8s manifest。 +- 从 `deploy/dev:deploy.json` 声明的 commit id 构建镜像。 +- 向 backend-core 注入 dev-only config: - `UNIDESK_ENV=dev` - dev `MICROSERVICES_JSON` - dev database URL - dev provider token - dev log paths -- Inject frontend config so it proxies to `backend-core-dev`, not production backend-core. -- Add service health and readiness probes. -- Expose dev frontend through port-forward or a private dev ingress. +- 注入 frontend 配置,使其代理到 `backend-core-dev`,不能代理生产 backend-core。 +- 增加 service health 和 readiness probe。 +- 通过 port-forward 或私有 dev ingress 暴露 dev frontend。 -Technical decisions: +技术决策: -- First version can omit public exposure. Port-forward is acceptable while validating isolation. -- Dev frontend must have a visible DEV environment marker to avoid operator confusion. 
+- 第一版可以不做公网暴露。验证隔离时使用 port-forward 即可。 +- dev frontend 必须有明显 DEV 环境标记,避免操作员混淆。 -Acceptance criteria: +验收标准: -- Dev backend-core `/health` returns ok and includes `environment=dev`. -- Dev frontend `/health` returns ok and proxies only to dev backend-core. -- Production `bun scripts/cli.ts server status` remains healthy while dev backend/frontend are redeployed. -- Rebuilding dev backend/frontend does not touch main server Docker Compose containers. +- dev backend-core `/health` 返回 ok,并包含 `environment=dev`。 +- dev frontend `/health` 返回 ok,并且只代理到 dev backend-core。 +- dev backend/frontend 重部署期间,生产 `bun scripts/cli.ts server status` 仍健康。 +- 重建 dev backend/frontend 不触碰主 server Docker Compose 容器。 -## Phase 4: code-queue-mgr-dev +## 阶段 4:code-queue-mgr-dev -Purpose: provide the dev queue management and submission path without writing production Code Queue tables. +目的:提供 dev 队列管理和提交路径,同时不写生产 Code Queue 表。 -Implementation items: +实现项: -- Add k8s manifest for `code-queue-mgr-dev`. -- Configure it to use the dev database only. -- Configure dev backend-core service catalog so stable dev `code-queue` control/read paths route to `code-queue-mgr-dev`. -- Ensure `code-queue-mgr-dev` can submit, list, summarize, and update dev queue state. -- Add health output proving: - - role is master-control-plane or dev-control-plane - - database is dev - - schema is ready +- 增加 `code-queue-mgr-dev` k8s manifest。 +- 配置它只使用 dev 数据库。 +- 配置 dev backend-core service catalog,使 dev 稳定 `code-queue` 控制/读取路径路由到 `code-queue-mgr-dev`。 +- 确保 `code-queue-mgr-dev` 可以提交、列出、汇总和更新 dev queue state。 +- health 输出必须证明: + - role 是 `master-control-plane` 或 `dev-control-plane` + - database 是 dev + - schema ready - no runner dependencies -Acceptance criteria: +验收标准: -- Dev UI/CLI can submit a dry-run or queued task to the dev DB. -- Production Code Queue task list is unchanged by dev submissions. -- Dev `code-queue-mgr-dev` memory footprint remains within the lightweight control-plane budget. 
+- dev UI/CLI 可以向 dev DB 提交 dry-run 或 queued task。 +- dev 提交不会改变生产 Code Queue task list。 +- `code-queue-mgr-dev` 内存占用保持在轻量控制面预算内。 -## Phase 5: code-queue-dev Execution Components +## 阶段 5:code-queue-dev 执行组件 -Purpose: run dev Code Queue execution inside `unidesk-dev` without interfering with production Code Queue. +目的:在 `unidesk-dev` 内运行 dev Code Queue 执行面,不干扰生产 Code Queue。 -Implementation items: +实现项: -- Add dev variants of Code Queue manifests: +- 增加 Code Queue manifest 的 dev 变体: - `code-queue-read-dev` - `code-queue-write-dev` - `code-queue-scheduler-dev` -- Configure all dev components to use dev database, dev logs, and dev state paths. -- Use dev service names and labels so production k3s adapter does not confuse dev and prod services. -- Decide whether first version supports real Codex execution or smoke-only execution. -- If real execution is enabled: - - isolate workdir paths - - isolate Codex/OpenCode XDG/state paths - - isolate notifications - - cap concurrency and memory - - avoid writing production OA Event Flow unless explicitly configured for dev +- 所有 dev 组件使用 dev 数据库、dev 日志和 dev state 路径。 +- 使用 dev service name 和 label,避免生产 k3s adapter 混淆 dev/prod 服务。 +- 决定第一版支持真实 Codex 执行,还是只支持 smoke-only 执行。 +- 如果启用真实执行: + - 隔离 workdir 路径 + - 隔离 Codex/OpenCode XDG/state 路径 + - 隔离通知 + - 限制并发和内存 + - 默认不写生产 OA Event Flow,除非显式配置为 dev -Technical decisions: +技术决策: -- First version should default to smoke/dry-run execution unless real task execution is needed immediately. -- If real task execution is enabled, use a dev-specific queue prefix or dev database and disable production ClaudeQQ notifications by default. +- 第一版默认应使用 smoke/dry-run 执行,除非立刻需要真实任务执行。 +- 如果启用真实任务执行,使用 dev 专用 queue prefix 或 dev 数据库,并默认禁用生产 ClaudeQQ 通知。 -Acceptance criteria: +验收标准: -- Dev Code Queue `/health` returns ok and includes `environment=dev`. -- Dev scheduler can pick up a dev queued task and move it through a terminal state. 
-- Restarting dev scheduler does not affect production running tasks. -- Production `code-queue` health remains healthy during dev Code Queue rollout. +- dev Code Queue `/health` 返回 ok,并包含 `environment=dev`。 +- dev scheduler 可以拾取 dev queued task,并推进到终态。 +- 重启 dev scheduler 不影响生产 running task。 +- dev Code Queue rollout 期间,生产 `code-queue` health 仍健康。 -## Phase 6: Dev Deploy Apply +## 阶段 6:Dev 部署执行 -Purpose: make `deploy/dev:deploy.json` drive the dev environment end to end. +目的:让 `deploy/dev:deploy.json` 端到端驱动 dev 环境。 -Implementation items: +实现项: -- Add `deploy apply --env dev`. -- For each service in the dev manifest: - - fetch declared repo and commit - - build image on D601 or through the established target-side build path - - tag image with environment and commit - - apply the dev k8s manifest - - wait for rollout - - verify live commit from `/health` or Deployment annotation -- Ensure deployment records include environment, ref, service id, commit id, image tag, namespace, and rollout status. -- Add `deploy status --env dev` or equivalent drift check. +- 增加 `deploy apply --env dev`。 +- 对 dev manifest 中的每个服务: + - fetch 声明的 repo 和 commit + - 在 D601 上构建镜像,或复用既有 target-side build 路径 + - 用环境和 commit 给镜像打 tag + - apply dev k8s manifest + - 等待 rollout + - 从 `/health` 或 Deployment annotation 验证 live commit +- 部署记录包含 environment、ref、service id、commit id、image tag、namespace 和 rollout 状态。 +- 增加 `deploy status --env dev` 或等价 drift check。 -Acceptance criteria: +验收标准: -- Updating `deploy/dev:deploy.json` to a new commit and running `deploy apply --env dev` updates dev backend-core/frontend/code-queue components. -- Live `/health` commit matches the manifest commit. -- No production Deployment, Service, Secret, PVC, DB table, or Docker Compose container is mutated by dev deploy. 
+- 更新 `deploy/dev:deploy.json` 到新 commit 后,运行 `deploy apply --env dev` 会更新 dev backend-core/frontend/code-queue 组件。 +- live `/health` commit 与 manifest commit 一致。 +- dev 部署不会修改任何生产 Deployment、Service、Secret、PVC、DB table 或 Docker Compose 容器。 -## Phase 7: Prod Deploy Ref Compatibility +## 阶段 7:生产部署 Ref 兼容 -Purpose: let production read desired state from `deploy/prod` while keeping production runtime unchanged. +目的:让生产部署从 `deploy/prod` 读取期望状态,同时保持生产运行方式不变。 -Implementation items: +实现项: -- Add `deploy plan --env prod` and `deploy apply --env prod` using `origin/deploy/prod:deploy.json`. -- Keep production target executors as they are initially: - - main server Compose for production backend-core/frontend and direct sidecars - - D601 k3s for production Code Queue execution -- Enforce production command guardrails: - - canonical root only - - production credentials only on main server - - manifest must say `environment=prod` - - target namespace and provider identity must match production -- Branch protection for `deploy/prod` is recommended but can be added after the first version. +- 增加 `deploy plan --env prod` 和 `deploy apply --env prod`,读取 `origin/deploy/prod:deploy.json`。 +- 初期保持现有生产 executor: + - 生产 backend-core/frontend 和 direct sidecar 继续使用主 server Compose。 + - 生产 Code Queue 执行面继续使用 D601 k3s。 +- 强化生产命令护栏: + - 只能在权威根目录执行 + - 生产凭据只存在主 server + - manifest 必须声明 `environment=prod` + - 目标 namespace 和 Provider 身份必须匹配生产 +- 建议后续给 `deploy/prod` 增加 branch protection;第一版可暂缓。 -Acceptance criteria: +验收标准: -- Production deploy no longer depends on local `deploy.json`. -- Production deploy reports the exact Git ref and manifest commit used. -- Production deploy still validates live commit after rollout. +- 生产部署不再依赖本地 `deploy.json`。 +- 生产部署报告实际使用的 Git ref 和 manifest commit。 +- 生产部署 rollout 后仍校验 live commit。 -## Phase 8: Operator And LLM Safety +## 阶段 8:操作员与 LLM 安全 -Purpose: reduce environment confusion for LLM agents and humans. 
+目的:降低 LLM agent 和人工操作员混淆环境的概率。 -Implementation items: +实现项: -- Add clear CLI output for every deploy: +- 每次 deploy 的 CLI 输出都清晰显示: - environment - ref - namespace - - DB fingerprint + - DB 指纹 - provider id - - services and commits -- Add explicit DEV marker in dev frontend. -- Add hard startup checks: - - dev service refuses production DB - - dev service refuses production provider id/token - - prod service refuses dev namespace/DB -- Ensure LLM task containers receive dev deploy credentials by default and do not receive prod credentials. -- Add smoke checks that intentionally try unsafe combinations and verify they fail. + - services 和 commits +- dev frontend 增加明确 DEV 标记。 +- 增加硬启动检查: + - dev service 拒绝生产 DB + - dev service 拒绝生产 provider id/token + - prod service 拒绝 dev namespace/DB +- 确保 LLM task container 默认只拿到 dev deploy 凭据,拿不到 prod 凭据。 +- 增加 smoke check:故意尝试不安全组合,并验证它们失败。 -Acceptance criteria: +验收标准: -- Running a dev service with production DB config fails before listening. -- Running prod deploy from a non-canonical context fails. -- LLM/Code Queue default environment can deploy dev but cannot deploy prod without the separate production credential path. +- dev 服务使用生产 DB 配置启动时,在监听端口前失败。 +- 从非权威上下文运行 prod deploy 时失败。 +- LLM/Code Queue 默认环境可以部署 dev,但没有独立生产凭据路径时不能部署 prod。 -## Risks And Mitigations +## 风险与缓解 -- Risk: namespace isolation does not isolate node-level CPU, memory, Docker socket, hostPath, or containerd load. - - Mitigation: resource requests/limits, separate dev workdirs, no production path mounts, and bounded Code Queue concurrency. -- Risk: dev Code Queue accidentally writes production task tables. - - Mitigation: separate dev DB, startup DB fingerprint checks, and health output showing DB identity. -- Risk: dev frontend appears to be prod or proxies to prod backend-core. - - Mitigation: visible DEV marker, `CORE_INTERNAL_URL` hardwired to dev service, and proxy target health checks. 
-- Risk: deploy command accidentally reads local manifest instead of GitHub environment ref. - - Mitigation: `--env` mode must read remote ref only and report the ref/blob used. -- Risk: D601 k3s control plane failure affects both dev and production k3s workloads. - - Mitigation: accept this in phase 1; consider a separate physical/node-level dev cluster only after namespace isolation proves insufficient. -- Risk: branch `deploy/prod` is initially unprotected. - - Mitigation: even before branch protection, production deploy should still require canonical main server credentials and should report the ref used for audit. +- 风险:namespace 不能隔离节点级 CPU、内存、Docker socket、hostPath 或 containerd 压力。 + - 缓解:资源 requests/limits、独立 dev workdir、不挂载生产路径,并限制 Code Queue 并发。 +- 风险:dev Code Queue 误写生产 task 表。 + - 缓解:独立 dev DB、启动时 DB 指纹检查、health 输出 DB 身份。 +- 风险:dev frontend 看起来像 prod,或代理到 prod backend-core。 + - 缓解:可见 DEV 标记、`CORE_INTERNAL_URL` 固定到 dev service、proxy target health check。 +- 风险:deploy 命令误读本地 manifest,而不是 GitHub 环境 ref。 + - 缓解:`--env` 模式必须只读取 remote ref,并报告实际使用的 ref/blob。 +- 风险:D601 k3s 控制面故障同时影响 dev 和生产 k3s workload。 + - 缓解:阶段 1 接受该风险;只有 namespace 隔离被证明不足后,再考虑独立物理节点或节点级 dev 集群。 +- 风险:`deploy/prod` 初期未开启 branch protection。 + - 缓解:即使没有 branch protection,生产部署仍必须要求权威主 server 凭据,并报告用于审计的 ref。 -## Suggested Implementation Order +## 建议实现顺序 -1. Phase 0 and Phase 1: establish GitHub environment branch desired-state and dry-run planning. -2. Phase 2 and Phase 3: create dev namespace, dev DB, backend-core-dev, and frontend-dev. -3. Phase 4 and Phase 5: add dev Code Queue control and execution components. -4. Phase 6: make `deploy apply --env dev` deploy the full first dev stack by commit id. -5. Phase 7: migrate production deploy to `deploy/prod`. -6. Phase 8: harden operator and LLM safety checks. +1. 阶段 0 和阶段 1:建立 GitHub 环境分支期望状态和 dry-run planning。 +2. 阶段 2 和阶段 3:创建 dev namespace、dev DB、backend-core-dev 和 frontend-dev。 +3. 阶段 4 和阶段 5:增加 dev Code Queue 控制面和执行组件。 +4. 阶段 6:让 `deploy apply --env dev` 按 commit id 部署完整第一版 dev stack。 +5. 阶段 7:把生产部署迁移到 `deploy/prod`。 +6. 阶段 8:强化操作员和 LLM 安全检查。 -The first milestone is complete when `deploy apply --env dev` can deploy backend-core, frontend, code-queue-mgr, and Code Queue read/write/scheduler into `unidesk-dev` from commit ids declared in `origin/deploy/dev:deploy.json`, and repeated dev redeploys do not change production main server status or production Code Queue state. +第一个里程碑完成条件:`deploy apply --env dev` 可以根据 `origin/deploy/dev:deploy.json` 声明的 commit id,把 backend-core、frontend、code-queue-mgr 以及 Code Queue read/write/scheduler 部署进 `unidesk-dev`;反复 dev redeploy 不改变生产主 server status,也不改变生产 Code Queue state。
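
Reviewer note (not part of the patch): the plan requires a fixed environment-to-ref mapping (`dev` to `origin/deploy/dev`, `prod` to `origin/deploy/prod`) and rejection of any manifest whose `environment` field does not match `--env`. A minimal sketch of those two guardrails follows, assuming a TypeScript CLI like the `bun scripts/cli.ts` entry point the plan mentions. The names `EnvName`, `ENV_REFS`, `resolveEnvRef`, and `checkManifestEnv` are hypothetical and do not come from the repository.

```typescript
// Hypothetical sketch; none of these identifiers are taken from the UniDesk repo.
type EnvName = "dev" | "prod";

// Fixed mapping, as the plan requires: callers pass an environment, never a branch.
const ENV_REFS: Readonly<Record<EnvName, string>> = {
  dev: "origin/deploy/dev",
  prod: "origin/deploy/prod",
};

// Type guard so arbitrary CLI input is narrowed before it is used as a key.
function isEnvName(value: string): value is EnvName {
  return value === "dev" || value === "prod";
}

// Resolve the Git ref for an environment; unknown environments are rejected.
function resolveEnvRef(env: string): string {
  if (!isEnvName(env)) {
    throw new Error(`unknown environment: ${env}`);
  }
  return ENV_REFS[env];
}

// Only the field needed for this check; the real manifest schema has more fields.
interface DeployManifest {
  environment: string;
}

// Reject a manifest whose declared environment differs from the requested one.
function checkManifestEnv(requested: EnvName, manifest: DeployManifest): void {
  if (manifest.environment !== requested) {
    throw new Error(
      `manifest environment "${manifest.environment}" does not match --env ${requested}`,
    );
  }
}
```

Keeping the mapping in one constant means a normal prod deploy has no code path that accepts an arbitrary ref; an admin-only inspection command would need its own, separate entry point.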