feat: extend k3s artifact consumers

2026-05-20 02:58:41 +00:00
parent 15a1a8b21a
commit 595de3d320
28 changed files with 1003 additions and 76 deletions
@@ -40,9 +40,9 @@ The root `deploy.json` is the single desired-state source for both prod and dev.

 The optional non-service execution declaration under `environments.dev` is intentionally not specified here. The only currently allowed declaration is `ci`, and its authoritative `repo`, `scriptPath`, `timeoutMs`, short launcher, host fetch boundary and no-CD rules are defined only in `docs/reference/dev-ci-runner.md`.

-Environment mode never reads the local dirty working tree manifest. `deploy check --env ...`, `deploy plan --env ...` and `deploy apply --env ...` fetch `origin/master`, read `origin/master:deploy.json`, select `environments.<env>`, and report the manifest commit/blob, service commit IDs, target namespace, database fingerprint and Provider identity. `deploy apply --env dev` is currently enabled for persistent D601 dev `backend-core` target-side rollout and for reviewed artifact consumers `frontend`, `baidu-netdisk`, `decision-center`, `project-manager`, `oa-event-flow`, `code-queue-mgr`, `todo-note`, `findjob`, `pipeline` and `met-nonlinear`. `deploy apply --env prod` exposes reviewed registry artifact consumers (`backend-core`, `frontend`, `baidu-netdisk`, `decision-center`, `project-manager`, `oa-event-flow`, `todo-note`, `findjob`, `pipeline` and `met-nonlinear`), while `code-queue-mgr` remains supervisor-gated and `k3sctl-adapter` is plan/dry-run only. Production backend-core artifact CD is a separate executor because its build target is D601 CI while its runtime target is the master server. The default user-service delivery policy, including CI build, registry publication, dev validation, production CD and manual acceptance, is documented in `docs/reference/user-service-delivery.md`.
+Environment mode never reads the local dirty working tree manifest. `deploy check --env ...`, `deploy plan --env ...` and `deploy apply --env ...` fetch `origin/master`, read `origin/master:deploy.json`, select `environments.<env>`, and report the manifest commit/blob, service commit IDs, target namespace, database fingerprint and Provider identity. `deploy apply --env dev` is currently enabled for persistent D601 dev `backend-core` target-side rollout and for reviewed artifact consumers `frontend`, `baidu-netdisk`, `decision-center`, `mdtodo`, `claudeqq`, dev-only `code-queue`, `project-manager`, `oa-event-flow`, `code-queue-mgr`, `todo-note`, `findjob`, `pipeline` and `met-nonlinear`. `deploy apply --env prod` exposes reviewed registry artifact consumers (`backend-core`, `frontend`, `baidu-netdisk`, `decision-center`, `mdtodo`, `claudeqq`, `project-manager`, `oa-event-flow`, `todo-note`, `findjob`, `pipeline` and `met-nonlinear`), while `code-queue` must report unsupported, `code-queue-mgr` remains supervisor-gated and `k3sctl-adapter` is plan/dry-run only. Production backend-core artifact CD is a separate executor because its build target is D601 CI while its runtime target is the master server. The default user-service delivery policy, including CI build, registry publication, dev validation, production CD and manual acceptance, is documented in `docs/reference/user-service-delivery.md`.

-For services with reviewed production artifact consumers, local-manifest `deploy apply --file ...` is not a production fallback. The CLI blocks `backend-core`, `frontend`, `baidu-netdisk` and `decision-center` before source materialization or Docker build and directs operators to `deploy apply --env prod --service <id> --commit <full-sha>`. This prevents a dirty worktree, local manifest or target-side source build from bypassing the pull-only artifact CD guardrails. The broader precheck and legacy-path classification live in `docs/reference/cicd-standardization.md`.
+For services with reviewed production artifact consumers, local-manifest `deploy apply --file ...` is not a production fallback. The CLI blocks `backend-core`, `frontend`, `baidu-netdisk`, `decision-center`, `mdtodo`, `claudeqq` and other reviewed pull-only consumers before source materialization or Docker build and directs operators to `deploy apply --env prod --service <id> --commit <full-sha>`. This prevents a dirty worktree, local manifest or target-side source build from bypassing the pull-only artifact CD guardrails. The broader precheck and legacy-path classification live in `docs/reference/cicd-standardization.md`.

 The current implementation has not yet enabled separate stable and integration dev lanes. Future lane names such as `dev-v1` and `dev-master`, or an equivalent nested schema, must be added as explicit `deploy.json` and CLI semantics before use. A deploy command must print the manifest ref it used and must not infer `release/v1` from a local branch, a dirty file, or an undocumented environment alias.

@@ -85,7 +85,7 @@ Phase 3 introduces the dev backend/frontend manifest at `src/components/microser

 `backend-core-dev` must use `unidesk-dev-runtime-config` and `unidesk-dev-runtime-secrets`, connect to `postgres-dev.../unidesk_dev`, expose HTTP on 8080 and provider ingress on 8081, and write logs under `/var/log/unidesk-dev`. `frontend-dev` must set `CORE_INTERNAL_URL=http://backend-core-dev.unidesk-dev.svc.cluster.local:8080` and must not proxy to production backend-core.

-The manifest keeps placeholder image tags and deploy commit values in source control. The controlled `deploy apply --env dev --service backend-core` path fetches `origin/master:deploy.json`, materializes the requested source commit on D601, narrows the dev core control manifest to the selected Service/Deployment pair, replaces placeholders with the requested commit and dev image tag, builds on D601, imports the image into native k3s containerd, applies only the `unidesk-dev` objects and stamps the Deployment. `deploy apply --env dev --service frontend`, `decision-center`, `project-manager`, `oa-event-flow`, `code-queue-mgr`, `todo-note`, `findjob`, `pipeline` and `met-nonlinear` consume the existing D601 registry artifact instead of building source on the target. The direct Docker/Compose services currently use D601 registry pull + label/config dry-run + health contract as dev validation; they are not separate parallel dev instances yet. `code-queue-mgr` live prod apply remains supervisor-gated. Client dry-run and static validation remain useful checks before controlled apply:
+The manifest keeps placeholder image tags and deploy commit values in source control. The controlled `deploy apply --env dev --service backend-core` path fetches `origin/master:deploy.json`, materializes the requested source commit on D601, narrows the dev core control manifest to the selected Service/Deployment pair, replaces placeholders with the requested commit and dev image tag, builds on D601, imports the image into native k3s containerd, applies only the `unidesk-dev` objects and stamps the Deployment. `deploy apply --env dev --service frontend` uses the same selected dev manifest objects but consumes the existing D601 registry artifact `127.0.0.1:5000/unidesk/frontend:<commit>` instead of building frontend source on the target. Decision Center, MDTODO and ClaudeQQ use the same dev namespace but follow the D601 registry artifact consumer path instead of a source build: each verifies the commit-pinned image in D601 registry, imports it into native k3s containerd, applies its dev manifest, stamps the Deployment and verifies live commit/requestedCommit through the Kubernetes API service proxy. `project-manager`, `oa-event-flow`, `code-queue-mgr`, `todo-note`, `findjob`, `pipeline` and `met-nonlinear` consume existing D601 registry artifacts for direct Docker/Compose validation rather than separate parallel k3s dev instances; `code-queue-mgr` live prod apply remains supervisor-gated. Client dry-run and static validation remain useful checks before controlled apply:

 - `bun scripts/cli.ts dev-env validate --manifest src/components/microservices/k3sctl-adapter/k3s/dev/unidesk-dev-core.k8s.yaml`
 - `KUBECONFIG=/etc/rancher/k3s/k3s.yaml kubectl apply --dry-run=client --validate=false -f src/components/microservices/k3sctl-adapter/k3s/dev/unidesk-dev-core.k8s.yaml`
@@ -98,7 +98,7 @@ Phase 5 introduces the dev Code Queue execution manifest at `src/components/micr

 All dev Code Queue components must use `unidesk-dev-runtime-config` and `unidesk-dev-runtime-secrets`, connect to `postgres-dev.../unidesk_dev`, write logs and state under `/home/ubuntu/unidesk-dev-code-queue-deploy/state`, and expose HTTP on 4222 only as ClusterIP services. The scheduler uses `CODE_QUEUE_MAIN_PROVIDER_ID=D601-dev`, `CODE_QUEUE_WORKDIR=/workspace-dev`, `CODE_QUEUE_REMOTE_WORKDIR=/home/ubuntu/unidesk-dev-workspace`, disables ClaudeQQ notifications by default, and does not use the production `d601-tcp-egress-gateway` or production PostgreSQL route.

-Maintenance-channel direct D601 apply must not deploy dev Code Queue; the CLI rejects `deploy apply --env dev --service code-queue` and the old `codex deploy` compatibility entry is disabled. Dev Code Queue deployment must be implemented later as a controlled D601 target-side action that fetches `origin/master:deploy.json`, selects `environments.dev`, materializes the requested source commit on D601, uses the dev Code Queue control manifest from that D601 materialized commit, narrows it to Code Queue dev objects, replaces placeholders with the requested commit and `unidesk-code-queue:dev`, builds on D601, imports the image into native k3s containerd, applies only `unidesk-dev` objects and stamps the dev Deployments. Because Code Queue carries the agent toolchain and browser/runtime dependencies, dev builds may reuse an existing D601 `unidesk-code-queue:d601-build-base` or `unidesk-code-queue:d601` image when the dev build-base tag is absent, and the deploy executor allows a longer Code Queue build window than lightweight services. The scheduler has an explicit 5Gi memory limit and must use `Recreate` rollout strategy so an update does not temporarily require two scheduler replicas under the namespace quota. All dev Code Queue containers must set CPU limits so the namespace `LimitRange` does not inject a quota-breaking default CPU limit. Live health verification uses the Kubernetes API service proxy for the dev ClusterIP Service, not `kubectl exec` or debug binaries inside the application image. This first dev execution slice proves deployability, health and dev database isolation; wiring the dev frontend stable `code-queue` route through a dev `code-queue-mgr` is a separate later phase.
+Maintenance-channel direct D601 apply must not deploy dev Code Queue and the old `codex deploy` compatibility entry remains disabled. Dev Code Queue deployment is allowed only as the D601 registry artifact consumer for `deploy apply --env dev --service code-queue` or the equivalent `artifact-registry deploy-service --env dev --service code-queue`: it verifies the existing `127.0.0.1:5000/unidesk/code-queue:<commit>` artifact, imports it into native k3s containerd, applies only the `unidesk-dev` Code Queue manifest, stamps `code-queue-scheduler-dev`, `code-queue-read-dev`, `code-queue-write-dev` and `d601-dev-provider-egress-proxy`, and verifies the scheduler Service through the Kubernetes API service proxy. `deploy apply --env prod --service code-queue` and `artifact-registry deploy-service --env prod --service code-queue` must return explicit unsupported output and must not mutate production Code Queue manifests, Deployments or rollouts. The scheduler has an explicit 5Gi memory limit and must use `Recreate` rollout strategy so an update does not temporarily require two scheduler replicas under the namespace quota. All dev Code Queue containers must set CPU limits so the namespace `LimitRange` does not inject a quota-breaking default CPU limit. Live health verification uses the Kubernetes API service proxy for the dev ClusterIP Service, not `kubectl exec` or debug binaries inside the application image. This dev execution slice proves artifact deployability, health and dev database isolation; wiring the dev frontend stable `code-queue` route through a dev `code-queue-mgr` is a separate later phase.

 ## CLI

@@ -108,7 +108,7 @@ Maintenance-channel direct D601 apply must not deploy dev Code Queue; the CLI re

 `bun scripts/cli.ts deploy plan --env dev [--service <id>]` reads `origin/master:deploy.json#environments.dev` and prints a dry-run environment plan without checking or mutating live runtime resources. `deploy check --env dev` uses the same dry-run environment plan. `--env prod` is available for parity as a dry-run planning path; it reads `origin/master:deploy.json#environments.prod` and must not use a dirty local `deploy.json`.

-`bun scripts/cli.ts deploy apply [--file deploy.json | --env dev|prod] [--service <id>] [--commit <full-sha>] [--dry-run] [--force]` starts an asynchronous job only for supported targets. Use `bun scripts/cli.ts job status <jobId> --tail-bytes 30000` to observe progress. `--dry-run` resolves the same plan but does not build or replace runtime objects. `--force` rebuilds even when the live commit matches. Environment apply is not the dev e2e trigger; use `bun scripts/cli.ts ci run-dev-e2e` for the Git-controlled temporary namespace smoke flow. `--env dev` apply is enabled for persistent D601 `backend-core` target-side rollout and for `frontend`/`baidu-netdisk`/`decision-center`/`project-manager`/`oa-event-flow`/`code-queue-mgr`/`todo-note`/`findjob`/`pipeline`/`met-nonlinear` artifact consumers. `--env prod` apply exposes the D601 registry artifact consumer for `backend-core`, `frontend`, `baidu-netdisk`, `decision-center`, `project-manager`, `oa-event-flow`, `todo-note`, `findjob`, `pipeline` and `met-nonlinear`; `code-queue-mgr` prod live apply is supervisor-gated and `k3sctl-adapter` is plan/dry-run only. Unsupported prod services return a structured `unsupported` payload instead of silently falling back to a maintenance-channel source build.
+`bun scripts/cli.ts deploy apply [--file deploy.json | --env dev|prod] [--service <id>] [--commit <full-sha>] [--dry-run] [--force]` starts an asynchronous job only for supported targets. Use `bun scripts/cli.ts job status <jobId> --tail-bytes 30000` to observe progress. `--dry-run` resolves the same plan but does not build or replace runtime objects. `--force` rebuilds even when the live commit matches. Environment apply is not the dev e2e trigger; use `bun scripts/cli.ts ci run-dev-e2e` for the Git-controlled temporary namespace smoke flow. `--env dev` apply is enabled for persistent D601 `backend-core` target-side rollout and for `frontend`/`baidu-netdisk`/`decision-center`/`mdtodo`/`claudeqq`/dev-only `code-queue`/`project-manager`/`oa-event-flow`/`code-queue-mgr`/`todo-note`/`findjob`/`pipeline`/`met-nonlinear` artifact consumers. `--env prod` apply exposes the D601 registry artifact consumer for `backend-core`, `frontend`, `baidu-netdisk`, `decision-center`, `mdtodo`, `claudeqq`, `project-manager`, `oa-event-flow`, `todo-note`, `findjob`, `pipeline` and `met-nonlinear`; `code-queue-mgr` prod live apply is supervisor-gated and `k3sctl-adapter` is plan/dry-run only. Unsupported prod services, especially `code-queue`, return a structured `unsupported` payload instead of silently falling back to a maintenance-channel source build.

 All deploy commands output JSON. Long operations must use `.state/jobs/` and bounded log tails; no deploy path may succeed with missing progress output.

@@ -141,6 +141,8 @@ The exception is narrow:
 - `findjob` and `pipeline` are D601 direct Docker/Compose artifact consumers: CD runs on D601 through the existing provider-gateway/SSH maintenance bridge, verifies `127.0.0.1:5000/unidesk/<service>:<commit>` labels, writes deploy env/labels, and recreates only the target Compose service with `--no-build --no-deps --force-recreate`.
 - `met-nonlinear` has a D601 direct dry-run/plan contract, but live artifact deploy is blocked until the long-running `met-nonlinear-ts` image contract is separated from the ML image Dockerfile contract or otherwise proves the running container image label matches the requested commit.
 - `k3sctl-adapter` exposes only artifact consumer plan/dry-run here because it is an infrastructure control bridge; real prod deployment requires supervisor confirmation outside the standard user-service CD path.
+- `mdtodo` and `claudeqq` are k3s-managed artifact consumers: CI publishes `127.0.0.1:5000/unidesk/<service-id>:<commit>`, dev CD lands in `unidesk-dev`, prod CD lands in `unidesk`, and both paths verify Deployment metadata plus health through the Kubernetes API service proxy.
+- `code-queue` is a dev-only artifact consumer: CI may publish `127.0.0.1:5000/unidesk/code-queue:<commit>`, and dev CD may update only `unidesk-dev` Code Queue execution objects. Production artifact deploy, production rollout and production manifest mutation for `code-queue` are unsupported.
 - This exception must not be generalized to other services unless their resource profile and runtime boundary are documented with the same CI-producer/CD-consumer split.

 The registry contract is defined in `docs/reference/artifact-registry.md`; the CI producer rules are defined in `docs/reference/ci.md`.
@@ -181,7 +183,7 @@ The reconciler selects the executor from `config.json`:
 - `deployment.mode=internal-sidecar` on `main-server`: use the same main-server target-side source export, Docker build, image label stamping, fixed Compose project replacement and live commit verification as direct Compose services. This class is for private sidecars such as `code-queue-mgr`; it is still versioned by `deploy.json.commitId`, not by the operator's current worktree, and prod live apply remains supervisor-gated.
 - `deployment.mode=unidesk-direct` on a provider: this executor is disabled for D601 service deployment. The historical behavior dispatched `host.ssh` to the provider, built on the provider, then used the service's provider-local compose file and project; that shape must not remain a second deployment control plane.
 - Control bridges that UniDesk needs in order to inspect or repair an orchestrator must stay in this direct class. In particular, `k3sctl-adapter` is a UniDesk-managed bridge to native k3s and must remain outside k3s; Docker packaging on Docker Desktop/WSL must create an explicit host-local bridge, currently an adapter-container SSH local tunnel, to reach `/etc/rancher/k3s/k3s.yaml` and WSL `127.0.0.1:6443`.
- `deployment.mode=k3sctl-managed`: the target behavior is to build on the active control target unless the service has a reviewed artifact-consumer exception, verify native k3s on the host OS/WSL distro, import the image into native k3s/containerd, apply the existing Kubernetes manifest, stamp the Deployment and wait for rollout. On D601, persistent dev apply is currently allowed for `backend-core` target-side build plus `frontend` and `decision-center` artifact consumption in `unidesk-dev`; normal production services still cannot use a maintenance-channel direct rollout. The executor must use the native kubeconfig and containerd socket, for example `/etc/rancher/k3s/k3s.yaml` and `/run/k3s/containerd/containerd.sock`; running k3s itself in Docker is forbidden for both control-plane and worker nodes. A `rancher/k3s` image or legacy container may only be used as a temporary artifact source during migration, and any active containerized k3s control plane must be stopped before verification succeeds. The executor must preload a valid `rancher/mirrored-pause:3.6` sandbox image into native k3s containerd through the provider-gateway one-shot egress path, verify its entrypoint is `/pause`, and reject fake or sleep-based replacement images. Code Queue's k3s migration executor must also stop/remove the legacy direct Docker `code-queue-backend` after k3s rollout, so there is never a second scheduler running beside the native k3s scheduler.
+- `deployment.mode=k3sctl-managed`: the target behavior is to build on the active control target unless the service has a reviewed artifact-consumer exception, verify native k3s on the host OS/WSL distro, import the image into native k3s/containerd, apply the existing Kubernetes manifest, stamp the Deployment and wait for rollout. On D601, persistent dev apply is currently allowed for `backend-core` target-side build plus `frontend`, `decision-center`, `mdtodo`, `claudeqq` and dev-only `code-queue` artifact consumption in `unidesk-dev`; production artifact consumers are limited to reviewed services and exclude Code Queue. Normal production services still cannot use a maintenance-channel direct rollout. The executor must use the native kubeconfig and containerd socket, for example `/etc/rancher/k3s/k3s.yaml` and `/run/k3s/containerd/containerd.sock`; running k3s itself in Docker is forbidden for both control-plane and worker nodes. A `rancher/k3s` image or legacy container may only be used as a temporary artifact source during migration, and any active containerized k3s control plane must be stopped before verification succeeds. The executor must preload a valid `rancher/mirrored-pause:3.6` sandbox image into native k3s containerd through the provider-gateway one-shot egress path, verify its entrypoint is `/pause`, and reject fake or sleep-based replacement images. k3s-managed deploys must use ClusterIP Services and Kubernetes API service proxy health checks; they must not add NodePort, hostPort, public business ports or provider-gateway direct business backends.

 D601 Docker local images are not the source of truth for k3s runtime availability. For Code Queue, the deploy gate must verify `unidesk-code-queue:d601` exists in native k3s containerd after import with `ctr --address /run/k3s/containerd/containerd.sock -n k8s.io images ls`, and it must fail before rollout if the tag is missing. The same gate must verify every production Code Queue Deployment that uses the image (`code-queue`, `code-queue-read`, `code-queue-write`, `d601-provider-egress-proxy`, `d601-tcp-egress-gateway`) still references exactly `unidesk-code-queue:d601`; otherwise kubelet may attempt an external registry pull and leave base gateways in `ImagePullBackOff`.

@@ -193,6 +195,10 @@ Baidu Netdisk is the main-server `unidesk-direct` sample for artifact CD. Contro

 Decision Center is a standard `k3sctl-managed` service in this model, but D601 maintenance-channel direct apply must not deploy it. Controlled CD for Decision Center uses the D601 registry artifact consumer in both dev and prod: it verifies `unidesk/decision-center:<commit>` exists in the registry, imports `unidesk-decision-center:<commit>` into native k3s containerd, applies the appropriate Decision Center manifest, stamps the Deployment, and verifies health through `/api/microservices/decision-center/health` while proving the live and requested commit match. It must not add a main-server Compose service, NodePort, hostPort, or provider-gateway direct HTTP backend for Decision Center.

+MDTODO and ClaudeQQ are standard `k3sctl-managed` artifact consumers in the same model. Dev rollout lands in `unidesk-dev` using their dev manifests; production rollout lands in `unidesk` using the production manifests. Both services must pass dev validation before production rollout, must expose deploy metadata in health when practical, and must verify through the Kubernetes API service proxy instead of NodePort, hostPort or provider-gateway direct HTTP.
+
+Code Queue is explicitly narrower. Only `--env dev --service code-queue` is a supported artifact consumer target, and it may mutate only `unidesk-dev` Code Queue execution objects. Production Code Queue artifact deploy, production rollout and production manifest mutation are unsupported and must fail visibly.
+
 ## CI Separation

 Continuous integration is intentionally separate from this deploy reconciler. D601 k3s hosts Tekton CI resources described in `docs/reference/ci.md`; PipelineRuns may clone, check, run read-only performance gates, create temporary CI-owned namespaces for dev manifest smoke e2e, or publish commit-pinned backend-core/user-service image artifacts to the D601 artifact registry. They must not call `deploy apply`, `codex deploy`, `kubectl rollout restart` for production services, mutate `deploy.json`, or write production namespaces.