# Desired Deploy Reconciler

UniDesk deployment is driven by a desired-state manifest. The manifest answers only one question: which service should run which repository commit. Runtime topology, ports, providers, compose files, Kubernetes manifests, health paths and proxy policy remain in `config.json` and the existing service manifests.

Persistent D601 dev environment rules, including the public dev frontend port, `deploy apply --env dev` service scope and Rust backend-core build boundary, are owned by `docs/reference/dev-environment.md`. Release-line governance, CI/CD runtime pinning and the `release/v1` transition policy are owned by `docs/reference/release-governance.md` and [GitHub issue #6](https://github.com/pikasTech/unidesk/issues/6). This document owns the generic desired-state reconciler and target-side build contract.
## Manifest

The root `deploy.json` is the single desired-state source for both prod and dev. Environment branches such as `deploy/dev` and `deploy/prod` are deprecated because they create a second control plane for version intent.

```json
{
  "schemaVersion": 2,
  "environments": {
    "prod": {
      "services": [
        {
          "id": "code-queue",
          "repo": "https://github.com/pikasTech/unidesk",
          "commitId": "0c3cdb4ee06a23361ed511a2da033d67b53d16f4"
        }
      ]
    },
    "dev": {
      "services": [
        {
          "id": "backend-core",
          "repo": "https://github.com/pikasTech/unidesk",
          "commitId": "348c644"
        }
      ]
    }
  }
}
```
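The JSON above can be read with a small typed selector. The sketch below is a minimal illustration, not the reconciler's actual code: the type and function names (`DeployManifestV2`, `selectEnvironmentServices`) are hypothetical, and it models only the rule that environment commands read a `schemaVersion: 2` manifest and select `environments.<env>.services`.

```typescript
// Hypothetical TypeScript view of the deploy.json shape shown above.
type EnvName = "dev" | "prod";

interface ServiceEntry {
  id: string;       // joined with config.json.microservices[] by the reconciler
  repo: string;     // remote repository URL
  commitId: string; // short or full SHA; resolved to 40 hex chars before use
}

interface DeployManifestV2 {
  schemaVersion: 2;
  environments: Partial<Record<EnvName, { services: ServiceEntry[] }>>;
}

// Selects environments.<env>.services from a parsed manifest.
// Environment mode accepts only schemaVersion 2 (schemaVersion 1 is local-only).
function selectEnvironmentServices(raw: unknown, env: EnvName): ServiceEntry[] {
  if (typeof raw !== "object" || raw === null) {
    throw new Error("deploy.json must be a JSON object");
  }
  const manifest = raw as Partial<DeployManifestV2>;
  if (manifest.schemaVersion !== 2) {
    throw new Error(`environment mode requires schemaVersion=2, got ${String(manifest.schemaVersion)}`);
  }
  const section = manifest.environments?.[env];
  if (!section || !Array.isArray(section.services)) {
    throw new Error(`deploy.json has no environments.${env}.services`);
  }
  return section.services;
}

// Usage with the dev entry from the example manifest.
const example = JSON.parse(
  '{"schemaVersion":2,"environments":{"dev":{"services":[{"id":"backend-core","repo":"https://github.com/pikasTech/unidesk","commitId":"348c644"}]}}}',
);
for (const svc of selectEnvironmentServices(example, "dev")) {
  console.log(`${svc.id} -> ${svc.commitId}`); // backend-core -> 348c644
}
```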
`schemaVersion=1` remains accepted only as a local compatibility format. Standard environment commands use `schemaVersion=2` and select `environments.dev.services` or `environments.prod.services`.

`deploy.json` service entries must not contain provider IDs, ports, compose service names, Kubernetes namespace, health paths, environment variables, Dockerfile paths or build commands. The deploy reconciler joins each service `id` with `config.json.microservices[]` and existing k3s manifests to resolve those details. A service listed in `deploy.json` but missing from `config.json` is an error. A service with no Dockerfile source artifact is reported as unsupported rather than silently skipped. `commitId` may be a unique pushed short SHA or a full SHA; every deploy command resolves it through the remote repository to a full 40-character commit before target-side build or rollout, and fails immediately if the SHA is missing or ambiguous.

The optional non-service execution declaration under `environments.dev` is intentionally not specified here. The only currently allowed declaration is `ci`, and its authoritative `repo`, `scriptPath`, `timeoutMs`, short launcher, host fetch boundary and no-CD rules are defined only in `docs/reference/dev-ci-runner.md`.

Environment mode never reads the local dirty working tree manifest. `deploy check --env ...`, `deploy plan --env ...` and `deploy apply --env ...` fetch `origin/master`, read `origin/master:deploy.json`, select `environments.<env>`, and report the manifest commit/blob, service commit IDs, target namespace, database fingerprint and Provider identity. `deploy apply --env dev` is currently enabled for reviewed artifact consumers `backend-core`, `frontend`, `baidu-netdisk`, `decision-center`, `mdtodo`, `claudeqq`, dev-only `code-queue`, `project-manager`, `oa-event-flow`, `code-queue-mgr`, `todo-note`, `findjob`, `pipeline` and `met-nonlinear`. `deploy apply --env prod` exposes reviewed registry artifact consumers (`backend-core`, `frontend`, `baidu-netdisk`, `decision-center`, `mdtodo`, `claudeqq`, `project-manager`, `oa-event-flow`, `todo-note`, `findjob`, `pipeline` and `met-nonlinear`), while `code-queue` must report unsupported, `code-queue-mgr` remains supervisor-gated and `k3sctl-adapter` is plan/dry-run only. Backend-core artifact CD is a pull-only consumer in both dev and prod; the build target is D601 CI, while the dev runtime target is D601 native k3s and the prod runtime target is the master server Compose stack. The default user-service delivery policy, including CI build, registry publication, dev validation, production CD and manual acceptance, is documented in `docs/reference/user-service-delivery.md`.

`--commit <full-sha>` is allowed only with `--env dev|prod --service <id>` for reviewed artifact consumers. It overrides the selected service commit for that one artifact consumer while still using the Git-backed environment manifest for target, namespace, repo, deploy ref and guardrails. It is the supported temporary shape for release-line frontend/backend-core validation and rollback when the artifact was produced from a pushed `release/v1` commit but `origin/master:deploy.json#environments.<env>.services.<id>.commitId` has not been repinned. It must not be used for local-manifest mode, multi-service apply, or target-side source-build services.

For services with reviewed production artifact consumers, local-manifest `deploy apply --file ...` is not a production fallback. The CLI blocks `backend-core`, `frontend`, `baidu-netdisk`, `decision-center`, `mdtodo`, `claudeqq` and other reviewed pull-only consumers before source materialization or Docker build and directs operators to `deploy apply --env prod --service <id> --commit <full-sha>`. This prevents a dirty worktree, local manifest or target-side source build from bypassing the pull-only artifact CD guardrails. The broader precheck and legacy-path classification live in `docs/reference/cicd-standardization.md`.

The current implementation has not yet enabled separate stable and integration dev lanes. Future lane names such as `dev-v1` and `dev-master`, or an equivalent nested schema, must be added as explicit `deploy.json` and CLI semantics before use. A deploy command must print the manifest ref it used and must not infer `release/v1` from a local branch, a dirty file, or an undocumented environment alias. For frontend-only release-line validation, the current bridge is an explicit commit override on the reviewed artifact consumer: first publish the artifact with `ci publish-user-service --service frontend --commit <release-v1-full-sha>`, then run `deploy apply --env dev --service frontend --commit <release-v1-full-sha>`; production uses the same shape with `--env prod` only after dev evidence is accepted.

CI/CD server and control-plane services are normal deployable services for versioning purposes: production runtime must be pinned by `deploy.json` to a known commit. A CLI built from `master` may orchestrate the pinned server only through backward-compatible APIs and server-reported capabilities; it must not bypass server-side deploy policy when the pinned server does not support a requested operation.

The only D601 direct-service exception in local manifest mode is `k3sctl-adapter`, because it is the UniDesk-managed control bridge outside the k3s fault domain and owns the Kubernetes service catalog used by the dev public frontend path. Its artifact consumer path is plan/dry-run only and never performs real prod deployment without supervisor confirmation. D601 Code Queue, Decision Center, MDTODO, ClaudeQQ and future k3s-managed workloads remain blocked from maintenance-channel direct deploy.

`config.json.microservices[].repository.commitId` is retained for catalog compatibility, but `deploy.json` is the deployment version authority for the reconciler.
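The entry rules above (no runtime fields in `deploy.json`, every `id` present in `config.json.microservices[]`, and short-or-full SHA resolution that fails on missing or ambiguous commits) can be sketched as two pure functions. This is an illustration under stated assumptions: the forbidden key spellings are hypothetical because the document names the categories, not the JSON keys, and `remoteShas` stands in for however the CLI enumerates commits on the remote repository.

```typescript
// Hypothetical key names for runtime details that must stay in config.json / k3s manifests.
const FORBIDDEN_ENTRY_KEYS = [
  "providerId", "port", "ports", "composeService", "namespace",
  "healthPath", "env", "dockerfile", "buildCommand",
];

interface ServiceEntry { id: string; repo: string; commitId: string; [key: string]: unknown }

// Validates one deploy.json entry against catalog ids from config.json.microservices[].
function validateEntry(entry: ServiceEntry, catalogIds: ReadonlySet<string>): string[] {
  const errors: string[] = [];
  for (const key of FORBIDDEN_ENTRY_KEYS) {
    if (key in entry) errors.push(`${entry.id}: runtime field "${key}" is not allowed in deploy.json`);
  }
  if (!catalogIds.has(entry.id)) {
    errors.push(`${entry.id}: listed in deploy.json but missing from config.json.microservices[]`);
  }
  return errors;
}

// Resolves a short or full SHA against full SHAs known to exist on the remote.
// Fails on missing or ambiguous prefixes instead of guessing.
function resolveCommitId(commitId: string, remoteShas: readonly string[]): string {
  const wanted = commitId.toLowerCase();
  if (!/^[0-9a-f]{4,40}$/.test(wanted)) {
    throw new Error(`invalid commitId "${commitId}"`);
  }
  const matches = remoteShas.filter((sha) => sha.toLowerCase().startsWith(wanted));
  if (matches.length === 0) throw new Error(`commit ${commitId} not found on remote`);
  if (matches.length > 1) throw new Error(`commit ${commitId} is ambiguous (${matches.length} matches)`);
  return matches[0];
}
```

Returning a list of errors from `validateEntry`, rather than throwing on the first one, lets a `deploy check` style command report every bad entry in a single JSON result.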
## Dev CI Runner

Dev desired-state smoke verification is not a deploy executor. Use `bun scripts/cli.ts ci run-dev-e2e` for the Git-controlled temporary namespace runner described in `docs/reference/dev-ci-runner.md`; that command must not roll out persistent D601 services.

Persistent dev `backend-core` and `frontend` rollout is separate from that smoke runner and is described in `docs/reference/dev-environment.md`.
## D601 Dev Foundation

Phase 2 of the D601 dev environment creates only the isolated namespace and database foundation. The authoritative manifest is `src/components/microservices/k3sctl-adapter/k3s/dev/unidesk-dev-foundation.k8s.yaml`.

It may create resources only in `unidesk-dev`:

- `Namespace unidesk-dev`, plus quota and default limits.
- `Secret unidesk-dev-runtime-secrets` as a dev-only template for DB credentials, provider token, auth/session secret, and Code Queue model secret placeholders. The frontend auth/session values are placeholders in this manifest; the controlled dev frontend deploy path syncs them from main-server `config.json.auth` so dev and production use the same login identity and session signer.
- `ConfigMap unidesk-dev-runtime-config` for dev identity, desired-state source `origin/master:deploy.json#environments.dev`, provider id `D601-dev`, Code Queue dev paths, and non-secret runtime defaults. `SESSION_TTL_SECONDS` follows the same main-server auth config when `frontend` is deployed.
- `ConfigMap unidesk-dev-db-guard` with an executable guard script that rejects production-looking `DATABASE_URL` values.
- `StatefulSet/Service postgres-dev` with a 5Gi persistent volume claim and bounded CPU/memory requests/limits.
- `Job unidesk-dev-db-migrate`, which waits for `postgres-dev`, runs the guard, then prepares backend-core and Code Queue tables in the independent `unidesk_dev` database.

The manifest must not create, update, or delete production namespace resources, production DB objects, production PVCs, production Deployments/Services/Secrets, or main server Docker Compose services. Static validation is available through `bun scripts/cli.ts dev-env validate`; Kubernetes client dry-run is `bun scripts/cli.ts dev-env validate --kubectl-dry-run`. If applying manually during Phase 2, the only allowed apply target is this manifest and the post-check must prove production resources are unchanged, for example by comparing `kubectl -n unidesk get deploy,sts,svc,secret,pvc -o name` before and after.

Before applying the foundation on a fresh D601 native k3s runtime, run `bun scripts/cli.ts dev-env prewarm-images` and wait for the returned job to succeed. This imports the foundation images `postgres:16-alpine` and `rancher/mirrored-library-busybox:1.36.1` from Docker into `/run/k3s/containerd/containerd.sock`; k3s/containerd must not depend on live Docker Hub pulls during rollout. If this step is skipped, `postgres-dev` or the local-path helper pod can remain `ImagePullBackOff`, leaving the PVC pending even though the manifest is valid.

Phase 2 guardrails are deliberately limited to the dev manifest and CLI validator. Runtime startup guards for dev backend-core, Code Queue and Code Queue Manager must be reviewed and shipped as a separate change before dev workloads are exposed beyond dry-run or controlled apply.

On D601, dev/prod k3s verification must use the native k3s kubeconfig explicitly: `KUBECONFIG=/etc/rancher/k3s/k3s.yaml`. The default `kubectl` context may point at Docker Desktop and is not an acceptable target for UniDesk k3s deploy validation.
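The post-check suggested for a manual Phase 2 apply, comparing `kubectl -n unidesk get deploy,sts,svc,secret,pvc -o name` before and after, can be sketched as a set difference over the two captured outputs. The helper names are hypothetical; how the outputs are captured (for example with `KUBECONFIG=/etc/rancher/k3s/k3s.yaml` set) is left to the operator.

```typescript
// Each non-empty line of `kubectl ... -o name` output is one object, e.g. "deployment.apps/backend-core".
function parseNames(output: string): Set<string> {
  return new Set(
    output.split("\n").map((line) => line.trim()).filter((line) => line.length > 0),
  );
}

interface NameDiff { added: string[]; removed: string[] }

function diffProductionNames(before: string, after: string): NameDiff {
  const b = parseNames(before);
  const a = parseNames(after);
  return {
    added: Array.from(a).filter((name) => !b.has(name)).sort(),
    removed: Array.from(b).filter((name) => !a.has(name)).sort(),
  };
}

// Throws if the apply created or deleted anything in the production namespace.
function assertProductionUnchanged(before: string, after: string): void {
  const { added, removed } = diffProductionNames(before, after);
  if (added.length > 0 || removed.length > 0) {
    throw new Error(`production namespace changed: added=[${added.join(", ")}] removed=[${removed.join(", ")}]`);
  }
}
```

Comparing names only catches created or deleted objects; detecting in-place updates would need a richer snapshot, such as each object's `metadata.resourceVersion`.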
## D601 Dev Core

Phase 3 introduces the dev backend/frontend manifest at `src/components/microservices/k3sctl-adapter/k3s/dev/unidesk-dev-core.k8s.yaml`. It may create only `backend-core-dev` and `frontend-dev` Deployment/Service objects in `unidesk-dev`; the persistent rollout contract and Rust build boundary are owned by `docs/reference/dev-environment.md`.

`backend-core-dev` must use `unidesk-dev-runtime-config` and `unidesk-dev-runtime-secrets`, connect to `postgres-dev.../unidesk_dev`, expose HTTP on 8080 and provider ingress on 8081, and write logs under `/var/log/unidesk-dev`. `frontend-dev` must set `CORE_INTERNAL_URL=http://backend-core-dev.unidesk-dev.svc.cluster.local:8080` and must not proxy to production backend-core.

The manifest keeps placeholder image tags and deploy commit values in source control. The controlled `deploy apply --env dev --service backend-core` path consumes the existing D601 registry artifact `127.0.0.1:5000/unidesk/backend-core:<commit>` produced by `ci publish-backend-core`; it does not compile Rust or build a Docker image during CD. Backend-core and frontend use the same selected dev core manifest objects: CD verifies the commit-pinned registry image and labels, imports the artifact into native k3s containerd, applies only the selected `unidesk-dev` objects, stamps the Deployment, and verifies live commit/requestedCommit through the Kubernetes API service proxy. Decision Center, MDTODO and ClaudeQQ use the same dev namespace and D601 registry artifact consumer path. `project-manager`, `oa-event-flow`, `code-queue-mgr`, `todo-note`, `findjob`, `pipeline` and `met-nonlinear` consume existing D601 registry artifacts for direct Docker/Compose validation rather than separate parallel k3s dev instances; `code-queue-mgr` live prod apply remains supervisor-gated. Client dry-run and static validation remain useful checks before controlled apply:

- `bun scripts/cli.ts dev-env validate --manifest src/components/microservices/k3sctl-adapter/k3s/dev/unidesk-dev-core.k8s.yaml`
- `KUBECONFIG=/etc/rancher/k3s/k3s.yaml kubectl apply --dry-run=client --validate=false -f src/components/microservices/k3sctl-adapter/k3s/dev/unidesk-dev-core.k8s.yaml`

`backend-core` and `frontend` keep their production health payload shape by default. They add `environment`, `namespace`, `databaseName`, `serviceId`, `deployRef` and deploy commit metadata only when `UNIDESK_ENV=dev` or `UNIDESK_NAMESPACE=unidesk-dev` is set. The frontend shell shows a visible DEV ribbon only under the same dev identity.
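The dev-only health metadata rule can be sketched as a gate on the two identity variables. This is a minimal sketch with hypothetical names (`RuntimeEnv`, `isDevIdentity`, `buildHealth`); it adds only the fields named above and leaves out deploy commit metadata, whose exact field layout is not specified in this section.

```typescript
// Hypothetical inputs; the real services read these values from their process environment.
interface RuntimeEnv {
  UNIDESK_ENV?: string;
  UNIDESK_NAMESPACE?: string;
  databaseName?: string;
  serviceId?: string;
  deployRef?: string;
}

type HealthPayload = Record<string, unknown>;

// Dev identity: either variable is enough, matching the rule above.
function isDevIdentity(env: RuntimeEnv): boolean {
  return env.UNIDESK_ENV === "dev" || env.UNIDESK_NAMESPACE === "unidesk-dev";
}

// Production keeps the base payload untouched; dev appends identity fields.
function buildHealth(base: HealthPayload, env: RuntimeEnv): HealthPayload {
  if (!isDevIdentity(env)) return base;
  return {
    ...base,
    environment: "dev",
    namespace: env.UNIDESK_NAMESPACE ?? "unidesk-dev",
    databaseName: env.databaseName,
    serviceId: env.serviceId,
    deployRef: env.deployRef,
  };
}
```

The same `isDevIdentity` check is a natural gate for the frontend DEV ribbon, since both behaviors are tied to the same dev identity.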
## D601 Dev Code Queue

Phase 5 introduces the dev Code Queue execution manifest at `src/components/microservices/k3sctl-adapter/k3s/dev/unidesk-dev-code-queue.k8s.yaml`. It may create only Code Queue dev execution objects in `unidesk-dev`: `code-queue-scheduler-dev`, `code-queue-read-dev`, `code-queue-write-dev` and the supporting `d601-dev-provider-egress-proxy`.

All dev Code Queue components must use `unidesk-dev-runtime-config` and `unidesk-dev-runtime-secrets`, connect to `postgres-dev.../unidesk_dev`, write logs and state under `/home/ubuntu/unidesk-dev-code-queue-deploy/state`, and expose HTTP on 4222 only as ClusterIP services. The scheduler uses `CODE_QUEUE_MAIN_PROVIDER_ID=D601-dev`, `CODE_QUEUE_WORKDIR=/workspace-dev`, `CODE_QUEUE_REMOTE_WORKDIR=/home/ubuntu/unidesk-dev-workspace`, disables ClaudeQQ notifications by default, and does not use the production `d601-tcp-egress-gateway` or production PostgreSQL route.

Maintenance-channel direct D601 apply must not deploy dev Code Queue and the old `codex deploy` compatibility entry remains disabled. Dev Code Queue deployment is allowed only as the D601 registry artifact consumer for `deploy apply --env dev --service code-queue` or the equivalent `artifact-registry deploy-service --env dev --service code-queue`: it verifies the existing `127.0.0.1:5000/unidesk/code-queue:<commit>` artifact, imports it into native k3s containerd, applies only the `unidesk-dev` Code Queue manifest, stamps `code-queue-scheduler-dev`, `code-queue-read-dev`, `code-queue-write-dev` and `d601-dev-provider-egress-proxy`, and verifies the scheduler Service through the Kubernetes API service proxy. `deploy apply --env prod --service code-queue` and `artifact-registry deploy-service --env prod --service code-queue` must return explicit unsupported output and must not mutate production Code Queue manifests, Deployments or rollouts. The scheduler has an explicit 5Gi memory limit and must use `Recreate` rollout strategy so an update does not temporarily require two scheduler replicas under the namespace quota. All dev Code Queue containers must set CPU limits so the namespace `LimitRange` does not inject a quota-breaking default CPU limit. Live health verification uses the Kubernetes API service proxy for the dev ClusterIP Service, not `kubectl exec` or debug binaries inside the application image. This dev execution slice proves artifact deployability, health and dev database isolation; wiring the dev frontend stable `code-queue` route through a dev `code-queue-mgr` is a separate later phase.

Production `code-queue-mgr` is a separate main-server Compose sidecar artifact consumer. `deploy apply --env prod --service code-queue-mgr --dry-run` may plan only the `code-queue-mgr` Compose service/container and must surface that D601 Code Queue scheduler/runner, queued tasks, interrupts and cancellations are excluded targets. Non-dry-run production apply for this sidecar remains supervisor-gated even when the artifact exists.
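Several Code Queue manifest constraints above are mechanically checkable: every object in `unidesk-dev`, ClusterIP-only Services, explicit CPU limits on every container, and a `Recreate` scheduler with the 5Gi memory limit. The sketch below lints a simplified, hypothetical object model rather than parsed YAML; a real check would first load `unidesk-dev-code-queue.k8s.yaml` with a YAML parser.

```typescript
// Simplified, hypothetical view of the objects in unidesk-dev-code-queue.k8s.yaml.
interface Container { name: string; resources?: { limits?: { cpu?: string; memory?: string } } }
interface Deployment {
  kind: "Deployment";
  metadata: { name: string; namespace: string };
  spec: { strategy?: { type?: string }; template: { spec: { containers: Container[] } } };
}
interface Service { kind: "Service"; metadata: { name: string; namespace: string }; spec: { type?: string } }
type K8sObject = Deployment | Service;

function lintDevCodeQueue(objects: K8sObject[]): string[] {
  const problems: string[] = [];
  for (const obj of objects) {
    const id = `${obj.kind}/${obj.metadata.name}`;
    if (obj.metadata.namespace !== "unidesk-dev") problems.push(`${id}: must live in unidesk-dev`);
    if (obj.kind === "Service") {
      // Kubernetes defaults an omitted Service type to ClusterIP.
      const type = obj.spec.type ?? "ClusterIP";
      if (type !== "ClusterIP") problems.push(`${id}: must be ClusterIP, got ${type}`);
      continue;
    }
    const containers = obj.spec.template.spec.containers;
    for (const c of containers) {
      // Without an explicit CPU limit the namespace LimitRange would inject a default.
      if (!c.resources?.limits?.cpu) problems.push(`${id}: container ${c.name} needs an explicit CPU limit`);
    }
    if (obj.metadata.name === "code-queue-scheduler-dev") {
      if (obj.spec.strategy?.type !== "Recreate") problems.push(`${id}: scheduler must use Recreate`);
      const memoryLimits = containers.map((c) => c.resources?.limits?.memory);
      if (!memoryLimits.includes("5Gi")) problems.push(`${id}: scheduler needs the explicit 5Gi memory limit`);
    }
  }
  return problems;
}
```

Returning every problem at once fits the JSON output style of the deploy commands better than failing on the first violation.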
## CLI

`bun scripts/cli.ts deploy check [--file deploy.json] [--service <id>]` checks the live runtime against the desired repo and commit without changing the system.

`bun scripts/cli.ts deploy plan [--file deploy.json] [--service <id>]` prints the same live state plus the intended action: `noop`, `deploy` or `unsupported`.

`bun scripts/cli.ts deploy plan --env dev [--service <id>]` reads `origin/master:deploy.json#environments.dev` and prints a dry-run environment plan without checking or mutating live runtime resources. `deploy check --env dev` uses the same dry-run environment plan. `--env prod` is available for parity as a dry-run planning path; it reads `origin/master:deploy.json#environments.prod` and must not use a dirty local `deploy.json`.

Environment plan output must be sufficient to review the artifact matrix without running a live apply. Each service item includes `deploymentPath`, `artifactConsumer.consumerKind`, `artifactConsumer.registryImage`, `artifactConsumer.registry`, `artifactConsumer.source`, `artifactConsumer.build`, `artifactConsumer.noRuntimeSourceBuild`, `artifactConsumer.dryRunOnly`, `target`, `validation` and `liveApply` where relevant. `consumerKind=d601-direct-compose` means the reviewed consumer touches only the D601 Docker/Compose service and private health path; `consumerKind=d601-k3s-managed` means the reviewed consumer imports the artifact into native k3s/containerd and verifies through the Kubernetes API service proxy; `consumerKind=main-server-compose` means the reviewed consumer streams or loads the D601 artifact into the main-server Compose service; `consumerKind=d601-dev-target-side-build` is retained only as a legacy classification and should not appear for backend-core. Artifact consumer plan items must explicitly report `noRuntimeSourceBuild=true`, expose registry/source/build boundaries including digest provenance, and list forbidden build/public exposure actions. Services with runtime secret gates, currently `baidu-netdisk`, must also expose a redacted `artifactConsumer.runtimeSecrets` contract with `secretSource`, `requiredSecretsPresent`, `missingSecretKeys` and `recommendedAction`; this contract may report key names, booleans and lengths only, never secret values. Blocked or gated services must keep structured `dryRunOnly` / `blockedReason` output, for example `met-nonlinear` `runtime-verification-blocked` and `k3sctl-adapter` supervisor-only production apply.

For `--env dev --service code-queue`, the environment plan must also expose a `boundary` block that separates the CI producer from the dev CD consumer. CI is allowed to publish only `127.0.0.1:5000/unidesk/code-queue:<commit>` plus digest/label evidence. DEV CD may consume that artifact only for `unidesk-dev` Code Queue scheduler/read/write/provider-egress-proxy objects after an operator reviews the dry-run. For `--env prod --service code-queue`, the service item must remain `deploymentPath=unsupported`, `artifactConsumer.consumerKind=unsupported`, `target.deployCommandShape=none` and `liveApply.allowed=false`; it must not expose production k3s as an executable target. The prod boundary must state that production Code Queue CD needs a future supervisor-approved design and that this runner cannot self-deploy, mutate the production namespace, restart scheduler/runner, or interrupt/cancel tasks.

`bun scripts/cli.ts deploy apply [--file deploy.json | --env dev|prod] [--service <id>] [--commit <full-sha>] [--dry-run] [--force]` starts an asynchronous job only for supported targets. Use `bun scripts/cli.ts job status <jobId> --tail-bytes 30000` to observe progress. `--dry-run` resolves the same plan but does not build or replace runtime objects. `--force` redeploys even when the live commit matches. Environment apply is not the dev e2e trigger; use `bun scripts/cli.ts ci run-dev-e2e` for the Git-controlled temporary namespace smoke flow. `--env dev` apply is enabled for `backend-core`/`frontend`/`baidu-netdisk`/`decision-center`/`mdtodo`/`claudeqq`/dev-only `code-queue`/`project-manager`/`oa-event-flow`/`code-queue-mgr`/`todo-note`/`findjob`/`pipeline`/`met-nonlinear` artifact consumers. `--env prod` apply exposes the D601 registry artifact consumer for `backend-core`, `frontend`, `baidu-netdisk`, `decision-center`, `mdtodo`, `claudeqq`, `project-manager`, `oa-event-flow`, `todo-note`, `findjob`, `pipeline` and `met-nonlinear`; `code-queue-mgr` prod live apply is supervisor-gated and `k3sctl-adapter` is plan/dry-run only. `--commit` may override one selected reviewed artifact consumer in either dev or prod, for example `deploy apply --env dev --service backend-core --commit <full-sha>` or `deploy apply --env dev --service frontend --commit <release-v1-full-sha>`, and the image must already exist as `127.0.0.1:5000/unidesk/<service-id>:<commit>`. Unsupported prod services, especially `code-queue`, return a structured `unsupported` payload instead of silently falling back to a maintenance-channel source build.

All deploy commands output JSON. Long operations must use `.state/jobs/` and bounded log tails; no deploy path may succeed with missing progress output.
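The plan actions and the `--commit` restrictions can be sketched as a small decision function. This is an assumption-laden illustration with hypothetical names: the service sets mirror the apply lists in this section, and per-service gates that the real plan reports as structured `dryRunOnly` / `blockedReason` output (supervisor-gated `code-queue-mgr` in prod, plan-only `k3sctl-adapter`, the `baidu-netdisk` secret gate, `met-nonlinear` runtime verification) are either collapsed to `unsupported` or not modeled.

```typescript
type Env = "dev" | "prod";
type PlanAction = "noop" | "deploy" | "unsupported";

// Mirrors the `--env dev` / `--env prod` apply lists in this section.
const DEV_APPLY: ReadonlySet<string> = new Set([
  "backend-core", "frontend", "baidu-netdisk", "decision-center", "mdtodo", "claudeqq",
  "code-queue", "project-manager", "oa-event-flow", "code-queue-mgr", "todo-note",
  "findjob", "pipeline", "met-nonlinear",
]);
// Prod excludes code-queue (unsupported), code-queue-mgr (supervisor-gated) and k3sctl-adapter (plan-only).
const PROD_APPLY: ReadonlySet<string> = new Set([
  "backend-core", "frontend", "baidu-netdisk", "decision-center", "mdtodo", "claudeqq",
  "project-manager", "oa-event-flow", "todo-note", "findjob", "pipeline", "met-nonlinear",
]);

interface ApplyArgs { env?: Env; file?: string; service?: string; commit?: string }

function reviewedFor(env: Env): ReadonlySet<string> {
  return env === "dev" ? DEV_APPLY : PROD_APPLY;
}

// --commit: one reviewed artifact consumer, Git-backed environment mode, full SHA only.
function checkCommitOverride(args: ApplyArgs): void {
  const { env, file, service, commit } = args;
  if (commit === undefined) return;
  if (env === undefined || service === undefined || file !== undefined) {
    throw new Error("--commit requires --env dev|prod --service <id> and cannot be used with --file");
  }
  if (!/^[0-9a-f]{40}$/.test(commit)) throw new Error("--commit must be a full 40-character SHA");
  if (!reviewedFor(env).has(service)) throw new Error(`--commit is not allowed for ${service} in ${env}`);
}

// Gated services are simplified to "unsupported"; the real plan keeps dryRunOnly/blockedReason detail.
function planAction(env: Env, service: string, liveCommit: string | null, desiredCommit: string, force = false): PlanAction {
  if (!reviewedFor(env).has(service)) return "unsupported";
  if (liveCommit === desiredCommit && !force) return "noop";
  return "deploy";
}
```

Checking `--commit` before computing any plan keeps a rejected override from ever reaching source materialization or artifact import.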
## Target-Side Build

Target-side build is the standard deployment mode. The controller may run on the main server, but source materialization, compile/build, Docker image creation and deployment normally happen on the target node that will run the service.

- Main server services are fetched, built and deployed on the main server.
- D601 services are fetched, built and deployed on D601.
- D518 services are fetched, built and deployed on D518.
- k3s-managed services are built on the active control target and then imported into that target's Kubernetes container runtime.

The reconciler distributes only repository URL, commit ID, Dockerfile path, build context and the existing deployment manifest/compose declaration. It must not distribute large Docker images between hosts as the default path, and it must not accept `docker commit` images, dirty worktrees or hand-mutated runtime containers as deployment truth.

Target-side source-build services fetch the remote repository, resolve the requested commit to a full 40-character SHA and export tracked files with `git archive`. Build contexts are created from that archive, not from the operator's current working tree. Artifact consumers such as dev/prod backend-core do not materialize source during CD; they use the Git-backed environment manifest only for commit intent, target selection and deploy ref metadata. The master server side may only do lightweight CLI orchestration, environment ref reading and remote command dispatch.
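The source-materialization rule above (fetch, resolve to a full SHA, then export with `git archive`) can be sketched as argv builders for real `git` subcommands. The helper names are hypothetical, execution and proxy scoping are left to the caller, and a fresh bare clone is one possible way to guarantee that the operator's working tree is never read.

```typescript
const FULL_SHA = /^[0-9a-f]{40}$/;

// 1. Fresh bare clone of the remote: nothing is read from the operator's working tree.
function cloneArgs(repo: string, mirrorDir: string): string[] {
  return ["git", "clone", "--bare", repo, mirrorDir];
}

// 2. Resolve a short or full SHA. `rev-parse --verify` exits non-zero when the
//    commit is missing or a short SHA is ambiguous; on success stdout is the full SHA.
function resolveArgs(mirrorDir: string, commitId: string): string[] {
  return ["git", "-C", mirrorDir, "rev-parse", "--verify", `${commitId}^{commit}`];
}

// 3. Export the committed tree as the build-context source. Only a full SHA is accepted.
function archiveArgs(mirrorDir: string, fullSha: string, archivePath: string): string[] {
  if (!FULL_SHA.test(fullSha)) {
    throw new Error(`git archive requires a full 40-character SHA, got "${fullSha}"`);
  }
  return ["git", "-C", mirrorDir, "archive", "--format=tar", `--output=${archivePath}`, fullSha];
}
```

A bare clone contains only commits reachable from the remote's branches and tags, so a commit that was never pushed fails at step 2 instead of being built from local state.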
## Artifact Consumer Exception

Backend-core and reviewed user-service samples are explicit exceptions to standard target-side build. The runtime target can be the master server Compose stack or D601 native k3s dev namespace, but the build target is D601 CI; CD then consumes only commit-pinned images from the D601 artifact registry.

The exception is narrow:

- CI on D601 builds `src/components/backend-core/Dockerfile` from a pushed commit, stamps image labels and publishes `127.0.0.1:5000/unidesk/backend-core:<commit>` to the D601 artifact registry.
- Dev CD on D601 imports that existing image into native k3s containerd, sets `backend-core-dev` to the commit image, stamps deploy metadata, and verifies `/health.deploy.commit` and `deploy.requestedCommit`.
- Prod CD on the master server pulls that existing image through the controlled artifact-registry relay, retags it for the Compose service, recreates only `backend-core` with `--no-build --no-deps --force-recreate`, and verifies the running commit.
- CD must not run Rust compilation, Docker build, Compose build or `server rebuild backend-core`.
- The legacy `artifact-registry deploy-backend-core` compatibility entry is deprecated and disabled as a standard entrypoint; use `deploy apply --env prod --service backend-core --commit <full-sha>` so the common artifact-consumer guardrails execute first.
- The pushed Git commit remains the version source of truth. The image registry is a content cache and transfer boundary, not a replacement for `deploy.json` or Git.
- `baidu-netdisk` is the first main-server direct user-service sample for the same split: CI publishes `127.0.0.1:5000/unidesk/baidu-netdisk:<commit>` from `src/components/microservices/baidu-netdisk/Dockerfile`; dev validation and prod CD both pull that artifact, retag `baidu-netdisk`, recreate only `baidu-netdisk` with `--no-build --no-deps --force-recreate`, and verify image labels plus `/health.deploy.commit`. The current prod lane is aligned when the artifact, running image and health commit match; dev apply remains gated until the canonical Compose env secret source reports the three Baidu keys present through the redacted `runtimeSecrets` contract.
- `frontend` is the UniDesk UI artifact sample: CI publishes `127.0.0.1:5000/unidesk/frontend:<commit>` from `src/components/frontend/Dockerfile`; dev CD imports that artifact into native k3s `frontend-dev`, prod CD retags it as `unidesk-frontend` for the master-server Compose service, and both paths verify image labels plus `/health.deploy.commit`.
- `findjob` and `pipeline` are D601 direct Docker/Compose artifact consumers: CD runs on D601 through the existing provider-gateway/SSH maintenance bridge, verifies `127.0.0.1:5000/unidesk/<service>:<commit>` labels, writes deploy env/labels, and recreates only the target Compose service with `--no-build --no-deps --force-recreate`.
- `met-nonlinear` has a D601 direct dry-run/plan contract, but live artifact deploy is blocked until the long-running `met-nonlinear-ts` image contract is separated from the ML image Dockerfile contract or otherwise proves the running container image label matches the requested commit.
- `k3sctl-adapter` exposes only artifact consumer plan/dry-run here because it is an infrastructure control bridge; real prod deployment requires supervisor confirmation outside the standard user-service CD path.
- `mdtodo` and `claudeqq` are k3s-managed artifact consumers: CI publishes `127.0.0.1:5000/unidesk/<service-id>:<commit>`, dev CD lands in `unidesk-dev`, prod CD lands in `unidesk`, and both paths verify Deployment metadata plus health through the Kubernetes API service proxy.
- `code-queue` is a dev-only artifact consumer: CI may publish `127.0.0.1:5000/unidesk/code-queue:<commit>`, and dev CD may update only `unidesk-dev` Code Queue execution objects. Production artifact deploy, production rollout and production manifest mutation for `code-queue` are unsupported.
- This exception must not be generalized to other services unless their resource profile and runtime boundary are documented with the same CI-producer/CD-consumer split.

The registry contract is defined in `docs/reference/artifact-registry.md`; the CI producer rules are defined in `docs/reference/ci.md`.
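The prod CD sequence for a main-server Compose artifact consumer (pull, verify labels, retag, recreate only the target service) can be sketched as a list of Docker commands plus a no-build guardrail. Assumptions are flagged: the revision label key is assumed to be the standard OCI `org.opencontainers.image.revision`, the helper names are hypothetical, and the registry relay address, Compose project and service names used in the test are placeholders because this document does not spell them out.

```typescript
interface ComposeConsumer {
  serviceId: string;      // artifact name, e.g. "frontend"
  composeService: string; // Compose service recreated on the master server
  composeImage: string;   // local tag the Compose file expects, e.g. "unidesk-frontend"
  composeProject: string; // fixed UniDesk Compose project name
}

const FULL_SHA = /^[0-9a-f]{40}$/;
// Assumption: the commit is recorded in the standard OCI revision label.
const REVISION_LABEL = "org.opencontainers.image.revision";

function registryImage(registry: string, serviceId: string, commit: string): string {
  if (!FULL_SHA.test(commit)) throw new Error(`artifact tags use full SHAs, got "${commit}"`);
  return `${registry}/unidesk/${serviceId}:${commit}`;
}

// Pull-only CD: every step consumes the existing artifact.
function prodCdCommands(c: ComposeConsumer, registry: string, commit: string): string[][] {
  const image = registryImage(registry, c.serviceId, commit);
  return [
    ["docker", "pull", image],
    // The caller compares this output with `commit` before continuing.
    ["docker", "image", "inspect", "--format", `{{ index .Config.Labels "${REVISION_LABEL}" }}`, image],
    ["docker", "tag", image, c.composeImage],
    ["docker", "compose", "-p", c.composeProject, "up", "-d", "--no-build", "--no-deps", "--force-recreate", c.composeService],
  ];
}

// Guardrail: reject any plan containing a build subcommand (flags such as --no-build are fine).
function assertNoBuild(commands: string[][]): void {
  for (const step of commands) {
    if (step.some((arg) => arg === "build" || arg === "buildx")) {
      throw new Error(`CD must not build: ${step.join(" ")}`);
    }
  }
}
```

Keeping the command list as data makes the "CD must not build" rule checkable before anything runs, rather than relying on reviewers to spot a stray `compose build`.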
## Upstream Image Exception

`filebrowser` and `filebrowser-d601` are not source-built UniDesk services and must not be modeled as Dockerfile producers. Their minimal catalog expression is `CI.json.artifacts[]` entries with `kind=upstream-image` plus `config.json.microservices[].repository.artifactSource`:

- upstream image: `docker.io/filebrowser/filebrowser:v2.63.3`;
- upstream source revision: `ca5e249e3c0c94159c2136a0cd431a424eb18472`;
- digest pin: required before rollout;
- mirror strategy: mirror only after digest verification to `127.0.0.1:5000/upstream/filebrowser/filebrowser`;
- CD: pull-only by digest or mirror digest, then verify image identity, OCI labels and private proxy health.

If the upstream registry is unavailable during precheck, the service remains `pending-network-verification`; a locally cached image id is supporting evidence only and not a manifest digest pin.
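The digest-pin and mirror rules for `filebrowser` can be sketched as a small state check. The function names are hypothetical; the sketch encodes only the ordering in the list above (pin the upstream manifest digest first, mirror afterwards) and the `pending-network-verification` fallback.

```typescript
const MANIFEST_DIGEST = /^sha256:[0-9a-f]{64}$/;
const UPSTREAM_REPO = "docker.io/filebrowser/filebrowser";
const MIRROR_REPO = "127.0.0.1:5000/upstream/filebrowser/filebrowser";

type PinStatus =
  | { status: "pinned"; upstreamRef: string }
  | { status: "pending-network-verification"; reason: string };

// upstreamDigest: manifest digest read from the upstream registry, or null if it was unreachable.
function pinUpstream(upstreamDigest: string | null): PinStatus {
  if (upstreamDigest === null) {
    return {
      status: "pending-network-verification",
      reason: "upstream registry unavailable; a cached local image id is not a manifest digest pin",
    };
  }
  if (!MANIFEST_DIGEST.test(upstreamDigest)) {
    throw new Error(`not a sha256 manifest digest: ${upstreamDigest}`);
  }
  return { status: "pinned", upstreamRef: `${UPSTREAM_REPO}@${upstreamDigest}` };
}

// Mirroring is allowed only after the upstream digest is pinned. The mirror digest is
// tracked separately because a copy that drops platforms can produce a different digest.
function mirrorRef(pin: PinStatus, mirrorDigest: string): string {
  if (pin.status !== "pinned") throw new Error("mirror only after upstream digest verification");
  if (!MANIFEST_DIGEST.test(mirrorDigest)) throw new Error(`not a sha256 manifest digest: ${mirrorDigest}`);
  return `${MIRROR_REPO}@${mirrorDigest}`;
}
```

Modeling the offline case as a status value rather than an error keeps precheck output structured, matching the requirement that the service stays `pending-network-verification` instead of failing opaquely.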
## One-Shot Build Proxy
|
||
|
||
Target-side source fetches and Docker builds that need external network access use a one-shot proxy scope through provider-gateway WS egress. Provider targets connect only to their node-local provider-gateway egress endpoint, normally `http://127.0.0.1:18789`; provider-gateway carries the TCP stream over the already-authenticated provider WebSocket to the main server, and the main server opens the final outbound TCP connection. This is the only allowed proxy channel for provider-side deploy source fetches and builds. The deploy path must not mutate host-global proxy settings:
|
||
|
||
- Do not edit `/etc/docker/daemon.json`.
|
||
- Do not edit shell profiles or global Docker CLI config.
|
||
- Do not leave long-lived host `HTTP_PROXY`, `HTTPS_PROXY` or `ALL_PROXY`.
|
||
- Do not silently fall back to target local direct internet.
|
||
- Do not create a separate SSH SOCKS proxy, public master proxy port, or direct backend-core/provider-ingress connection for deploy egress.
|
||
|
||
The standard implementation first probes GitHub through the node-local egress proxy, then runs target-side `git clone`/`git fetch` and the Docker build in that scoped environment. It also uses the target Docker daemon's local BuildKit builder so target-side base image and layer caches are reused. Proxy variables are scoped to the current deploy step and passed as matching `--build-arg` values for Dockerfile `RUN` steps; they are not written to daemon or shell configuration. Provider targets also use `docker buildx build --network host` so `127.0.0.1:<proxy-port>` inside `RUN` resolves to the target host's loopback provider-gateway egress proxy. Each deploy must log the proxy channel and probe result, for example `target_source_proxy=provider-gateway-ws-egress:http://127.0.0.1:18789`, `target_build_proxy=provider-gateway-ws-egress:http://127.0.0.1:18789` and `target_build_proxy_probe=ok`.
|
||
|
||
Build cache is part of the deployment contract, not an optimization left to Docker defaults. The deploy reconciler must pass inline BuildKit cache metadata (`--cache-to type=inline`) and import the current target image as cache source when it exists (`--cache-from <image>`). Dockerfiles that intentionally expose a warm build-base argument, such as Code Queue's `CODE_QUEUE_BASE_IMAGE`, may use the target-local `<image>-build-base` image to avoid re-running large apt/npm/Playwright setup layers; this is still target-local build cache and must be logged as `target_build_base_image=<image>-build-base`. If a service later needs an isolated `docker-container` builder or a local cache directory backend, it may use one only as a service-specific fallback and must still log proxy resolution, proxy probe result, cache source, cache destination and builder cleanup. The default path must not discard target-local image cache by creating a fresh builder for every deploy.
|
||
|
||
Main server targets may build without a proxy unless a service explicitly requires one. Provider targets must not bypass provider-gateway WS egress for GitHub, Debian apt, npm, Playwright, model downloads or any other external build dependency.
## Deployment Executors
The reconciler selects the executor from `config.json`:
- `deployment.mode=unidesk-direct` on `main-server`: the legacy/local manifest executor builds the image on the main server, then uses the fixed UniDesk Compose project and `up -d --no-build --no-deps --force-recreate <service>`. Reviewed artifact-consumer services such as `frontend`, `baidu-netdisk`, `project-manager` and `oa-event-flow` use the D601 registry pull-only path for `--env dev` and `--env prod` instead.
- `deployment.mode=internal-sidecar` on `main-server`: use the same main-server target-side source export, Docker build, image label stamping, fixed Compose project replacement and live commit verification as direct Compose services. This class is for private sidecars such as `code-queue-mgr`; it is still versioned by `deploy.json.commitId`, not by the operator's current worktree, and prod live apply remains supervisor-gated.
- `deployment.mode=unidesk-direct` on a provider: this executor is disabled for D601 service deployment. The historical behavior dispatched `host.ssh` to the provider, built on the provider, then used the service's provider-local compose file and project; that shape must not remain a second deployment control plane.
- Control bridges that UniDesk needs in order to inspect or repair an orchestrator must stay in this direct class. In particular, `k3sctl-adapter` is a UniDesk-managed bridge to native k3s and must remain outside k3s; Docker packaging on Docker Desktop/WSL must create an explicit host-local bridge, currently an adapter-container SSH local tunnel, to reach `/etc/rancher/k3s/k3s.yaml` and WSL `127.0.0.1:6443`.
- `deployment.mode=k3sctl-managed`: the target behavior is to build on the active control target unless the service has a reviewed artifact-consumer exception, verify native k3s on the host OS/WSL distro, import the image into native k3s/containerd, apply the existing Kubernetes manifest, stamp the Deployment and wait for rollout. On D601, persistent dev apply uses artifact consumption for `backend-core`, `frontend`, `decision-center`, `mdtodo`, `claudeqq` and dev-only `code-queue` in `unidesk-dev`; production artifact consumers are limited to reviewed services and exclude Code Queue. Normal production services still cannot use a maintenance-channel direct rollout. The executor must use the native kubeconfig and containerd socket, for example `/etc/rancher/k3s/k3s.yaml` and `/run/k3s/containerd/containerd.sock`; running k3s itself in Docker is forbidden for both control-plane and worker nodes. A `rancher/k3s` image or legacy container may only be used as a temporary artifact source during migration, and any active containerized k3s control plane must be stopped before verification succeeds. The executor must preload a valid `rancher/mirrored-pause:3.6` sandbox image into native k3s containerd through the provider-gateway one-shot egress path, verify its entrypoint is `/pause`, and reject fake or sleep-based replacement images. k3s-managed deploys must use ClusterIP Services and Kubernetes API service proxy health checks; they must not add NodePort, hostPort, public business ports or provider-gateway direct business backends.

D601 Docker local images are not the source of truth for k3s runtime availability. For Code Queue, the deploy gate must verify `unidesk-code-queue:d601` exists in native k3s containerd after import with `ctr --address /run/k3s/containerd/containerd.sock -n k8s.io images ls`, and it must fail before rollout if the tag is missing. The same gate must verify every production Code Queue Deployment that uses the image (`code-queue`, `code-queue-read`, `code-queue-write`, `d601-provider-egress-proxy`, `d601-tcp-egress-gateway`) still references exactly `unidesk-code-queue:d601`; otherwise kubelet may attempt an external registry pull and leave base gateways in `ImagePullBackOff`.

Code Queue health and diagnostics must cover its k3s dependencies, not only scheduler HTTP health. `bun scripts/cli.ts microservice diagnostics code-queue` and the `/health` aggregation must mark the service degraded/failing when `d601-provider-egress-proxy` or `d601-tcp-egress-gateway` Deployment availability or Endpoint readiness is missing, when the scheduler reports `storage.lastError` or PostgreSQL route failure through `d601-tcp-egress-gateway.unidesk.svc.cluster.local:15432`, or when stale active/retry_wait reconcile reports recoverable active tasks without a local run.

Existing service-specific commands such as Code Queue deploy are disabled as direct D601 deploy paths. Their build/import/rollout semantics should converge later into one controlled target-side deployment path instead of keeping parallel implementations.

Baidu Netdisk is the main-server `unidesk-direct` sample for artifact CD and a dependency of the PGDATA-to-Baidu-Netdisk backup path. Controlled dev validation and prod CD use the D601 registry artifact consumer: it verifies `unidesk/baidu-netdisk:<commit>` exists in the registry, streams the image to the main server through provider-gateway Host SSH, retags `baidu-netdisk` and `baidu-netdisk:<commit>`, stamps `UNIDESK_BAIDU_NETDISK_DEPLOY_*` in the canonical Compose env file, recreates only Compose service `baidu-netdisk`, and verifies container health, image labels, service id, `/health.deploy.commit`, and `/health.auth`. Live apply must fail or return degraded before success if `UNIDESK_BAIDU_NETDISK_CLIENT_ID`, `UNIDESK_BAIDU_NETDISK_CLIENT_SECRET`, or `UNIDESK_BAIDU_NETDISK_TOKEN_KEY` is absent from the controlled env source, or if `/health.auth.configured`, `clientIdConfigured`, `clientSecretConfigured`, `tokenKeyConfigured`, or `loggedIn` is not true after recreate. Dry-run reports `secretSource`, `requiredSecretsPresent`, `missingSecretKeys` and `recommendedAction` so a missing dev secret source is diagnosable before live apply; it must not print secret values. It must not use `server rebuild baidu-netdisk`, mutable tags, dirty worktrees, hand-built images, or public `4244` exposure as deployment truth.

For PGDATA-to-Baidu-Netdisk incident review, the no-authorization read-only boundary is limited to `server status`, `schedule list`, `schedule get`, `schedule runs`, `microservice status/health baidu-netdisk`, `microservice proxy baidu-netdisk /api/auth/status --raw`, and `microservice proxy baidu-netdisk '/api/transfers?limit=20' --raw`. These commands may report `failureKind=target-stack-not-running` when `unidesk-backend-core`, `unidesk-database`, or `baidu-netdisk-backend` is absent, especially when only `*.verify-*` containers are visible; that state is an infrastructure blocker, not a successful empty backup history. Recovery actions such as restoring non-empty Baidu secrets, `server start`, `server rebuild backend-core`, `server rebuild baidu-netdisk`, `deploy apply --env prod --service baidu-netdisk`, `schedule run`, or `schedule retry-run` can affect production or trigger a real backup and require explicit operator authorization.

Decision Center is a standard `k3sctl-managed` service in this model, but D601 maintenance-channel direct apply must not deploy it. Controlled CD for Decision Center uses the D601 registry artifact consumer in both dev and prod: it verifies `unidesk/decision-center:<commit>` exists in the registry, imports `unidesk-decision-center:<commit>` into native k3s containerd, applies the appropriate Decision Center manifest, stamps the Deployment, and verifies health through `/api/microservices/decision-center/health` while proving the live and requested commit match. It must not add a main-server Compose service, NodePort, hostPort, or provider-gateway direct HTTP backend for Decision Center.

MDTODO and ClaudeQQ are standard `k3sctl-managed` artifact consumers in the same model. Dev rollout lands in `unidesk-dev` using their dev manifests; production rollout lands in `unidesk` using the production manifests. Both services must pass dev validation before production rollout, must expose deploy metadata in health when practical, and must verify through the Kubernetes API service proxy instead of NodePort, hostPort or provider-gateway direct HTTP.

Code Queue is explicitly narrower. Only `--env dev --service code-queue` is a supported artifact consumer target, and it may mutate only `unidesk-dev` Code Queue execution objects. Production Code Queue artifact deploy, production rollout and production manifest mutation are unsupported and must fail visibly.
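The Code Queue image gate described earlier in this section comes down to two checks before rollout. The tag must be present in native k3s containerd, and every gated Deployment must still pin exactly that tag. The sketch below assumes the caller has already captured the `ctr --address /run/k3s/containerd/containerd.sock -n k8s.io images ls` output and the container images of each Deployment, read with the native kubeconfig. The function name, the failure strings and the choice to treat a missing Deployment as a failure are illustrative assumptions. The sketch also assumes containerd may list an imported bare tag under the normalized `docker.io/library/` prefix.

```typescript
const REQUIRED_IMAGE = "unidesk-code-queue:d601";
const GATED_DEPLOYMENTS = [
  "code-queue",
  "code-queue-read",
  "code-queue-write",
  "d601-provider-egress-proxy",
  "d601-tcp-egress-gateway",
];

// Assumption: an imported bare tag may be listed as docker.io/library/<name>:<tag>.
function normalizeRef(ref: string): string {
  const prefix = "docker.io/library/";
  return ref.startsWith(prefix) ? ref.slice(prefix.length) : ref;
}

// Returns an empty array when rollout may proceed; otherwise one entry per failure.
function codeQueueImageGate(
  ctrImagesLs: string,                        // raw `ctr ... images ls` output, header row first
  deploymentImages: Record<string, string[]>, // Deployment name -> container image refs
): string[] {
  const failures: string[] = [];
  const refs = ctrImagesLs
    .split("\n")
    .slice(1) // drop the REF/TYPE/DIGEST header row
    .map((line) => line.trim().split(/\s+/)[0])
    .filter((ref) => ref.length > 0)
    .map(normalizeRef);
  if (refs.indexOf(REQUIRED_IMAGE) < 0) {
    failures.push(`image-missing-in-k3s-containerd:${REQUIRED_IMAGE}`);
  }
  for (const name of GATED_DEPLOYMENTS) {
    const images = deploymentImages[name];
    if (images === undefined) {
      failures.push(`deployment-missing:${name}`); // assumption: treated as a gate failure
      continue;
    }
    for (const image of images) {
      // Specs must reference the tag exactly, or kubelet may attempt a registry pull.
      if (image !== REQUIRED_IMAGE) failures.push(`unexpected-image:${name}:${image}`);
    }
  }
  return failures;
}
```

The Deployment images are compared without normalization on purpose: the manifest string itself is what kubelet resolves, so any drift from the exact tag fails the gate.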
## Code Queue Production HostPath Guard
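Production Code Queue is still at the hostPath source transition boundary. The production scheduler/read/write Pods mount D601 `/home/ubuntu/cq-deploy` as both `/app` and `/root/unidesk`, so at startup the Bun process resolves the hostPath repo rather than the source already COPYed into the image. Even when the Docker build or the `unidesk-code-queue:d601` import succeeds, the runtime still fails if `/home/ubuntu/cq-deploy` is only partially synced. The specific failure class that must be prevented: `index.ts` already imports `./runtime-preflight`, but `/home/ubuntu/cq-deploy/src/components/microservices/code-queue/src/runtime-preflight.ts` is missing. That state must be treated as deploy-degraded and must block any scheduler restart or rollout.

Any deploy or recovery path that still modifies production `/home/ubuntu/cq-deploy` must run the Code Queue source import guard after the source sync and before the Kubernetes rollout: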
```bash
bun scripts/code-queue-source-guard.ts --root /home/ubuntu/cq-deploy

# or from a local controller worktree:
bun scripts/cli.ts deploy guard code-queue-source --root /home/ubuntu/cq-deploy
```
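To make the failure class concrete, the following sketch resolves relative import specifiers in Code Queue sources against the files actually present under the mounted root. It is a simplified illustration, not the contents of `scripts/code-queue-source-guard.ts`. Several parts are assumptions: the in-memory `files` map (paths relative to the root, `null` when the root is missing), the regular expression, and the resolution order (`.ts`, then `/index.ts`, with `.js` mapped to `.ts`). A real guard would read the filesystem and should also skip specifiers inside comments and string literals, which this regex does not.

```typescript
type GuardResult =
  | { ok: true; checkedFiles: number }
  | {
      ok: false;
      degradedReason: "source-root-missing" | "missing-relative-import-target";
      missing: string[];
    };

const CODE_QUEUE_SRC = "src/components/microservices/code-queue/src/";

// Matches `import ... from "./x"`, `export ... from "./x"`, `import("./x")` and `import "./x"`.
const RELATIVE_SPEC =
  /(?:import|export)\s[^'"]*?from\s*['"](\.{1,2}\/[^'"]+)['"]|import\s*\(\s*['"](\.{1,2}\/[^'"]+)['"]\s*\)|import\s*['"](\.{1,2}\/[^'"]+)['"]/g;

// Resolve a relative specifier against the importing file's directory (POSIX paths).
function resolveRelative(fromFile: string, spec: string): string {
  const parts = fromFile.split("/").slice(0, -1);
  for (const segment of spec.split("/")) {
    if (segment === "..") parts.pop();
    else if (segment !== "." && segment !== "") parts.push(segment);
  }
  return parts.join("/");
}

// Candidate files a Bun/TypeScript specifier may resolve to (assumed order).
function candidates(base: string): string[] {
  const list = [base, `${base}.ts`, `${base}/index.ts`];
  if (base.endsWith(".js")) list.push(`${base.slice(0, -3)}.ts`);
  return list;
}

function guardCodeQueueSources(files: Record<string, string> | null): GuardResult {
  if (files === null) {
    return { ok: false, degradedReason: "source-root-missing", missing: [] };
  }
  const has = (p: string) => Object.prototype.hasOwnProperty.call(files, p);
  const sources = Object.keys(files).filter(
    (p) => p.startsWith(CODE_QUEUE_SRC) && p.endsWith(".ts"),
  );
  const missing: string[] = [];
  for (const file of sources) {
    const re = new RegExp(RELATIVE_SPEC.source, "g"); // fresh lastIndex per file
    let match: RegExpExecArray | null;
    while ((match = re.exec(files[file])) !== null) {
      const spec = match[1] ?? match[2] ?? match[3];
      const base = resolveRelative(file, spec);
      if (!candidates(base).some(has)) missing.push(`${file} -> ${spec}`);
    }
  }
  return missing.length > 0
    ? { ok: false, degradedReason: "missing-relative-import-target", missing }
    : { ok: true, checkedFiles: sources.length };
}
```

With a partially synced root where `index.ts` imports `./runtime-preflight` but the file is absent, the sketch returns `degradedReason: "missing-relative-import-target"` and names the offending import. The orchestrator can surface that reason before any restart.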
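The guard must return JSON and, on failure, exit with a non-zero code and report `degradedReason=source-root-missing` or `degradedReason=missing-relative-import-target`. Deploy orchestration must surface that reason and stop before `kubectl rollout restart` or any Pod deletion that would force the scheduler to re-import the dirty hostPath source. The guard currently covers relative `import`, `export ... from` and `import(...)` targets under `src/components/microservices/code-queue/src/**/*.ts`, including missing-file failures such as `runtime-preflight.ts`.

Path ownership must remain explicit. `/home/ubuntu/cq-deploy` is the production k3s hostPath repo used by `src/components/microservices/k3sctl-adapter/k3s/code-queue.k8s.yaml`. `/home/ubuntu/unidesk-code-queue-deploy` is a historical/development worktree name; it must not be assumed to be the production scheduler source unless the manifest, deploy code and documentation change together. If the two are linked by a symlink during migration, the guard must still run against the path actually mounted at `/app`.

D601 k3s verification must always set the native kubeconfig: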
```bash
KUBECONFIG=/etc/rancher/k3s/k3s.yaml kubectl -n unidesk get deploy,svc,pod,endpoints
```
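The default `kubectl` context on D601 may point to Docker Desktop, kind or another local cluster, so it cannot be used as evidence that UniDesk production Code Queue is ready. The long-term goal is to remove the production hostPath source override entirely, so that Code Queue production converges on commit-pinned artifact/image CD and verifies the live commit like other reviewed artifact consumers.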
## CI Separation
Continuous integration is intentionally separate from this deploy reconciler. D601 k3s hosts Tekton CI resources described in `docs/reference/ci.md`; PipelineRuns may clone, check, run read-only performance gates, create temporary CI-owned namespaces for dev manifest smoke e2e, or publish commit-pinned backend-core/user-service image artifacts to the D601 artifact registry. They must not call `deploy apply`, `codex deploy`, `kubectl rollout restart` for production services, mutate `deploy.json`, or write production namespaces.

Artifact publish preflight is part of CI, not deploy: `artifact-registry status|health` and `ci publish-user-service --dry-run` are the supported read-only checks for registry reachability and user-service publish readiness. These commands must not depend on a coincidentally present local `unidesk-database` container, and when backend-core/database/provider channels are missing they should return structured `infra-blocked` instead of a raw container error.

The Code Queue performance gate may create a temporary `code-queue-ci-read` service and read the main PostgreSQL through the existing `d601-tcp-egress-gateway`. Because it runs with `CODE_QUEUE_SERVICE_ROLE=read`, scheduler/backfill/notification disabled and EmptyDir state, it is not deployment truth and does not need a temporary database for the current read-only checks.
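The CI boundary can also be checked mechanically. The sketch below is a hypothetical pre-submit lint over the shell commands of PipelineRun steps. The function names, the violation messages, the list of mutating `kubectl` verbs and the assumption that production is identified by namespace (here `unidesk`) are all illustrative. It covers only command-level rules: `deploy apply`, `codex deploy`, and mutating `kubectl` calls into production namespaces. It does not detect `deploy.json` mutation, and it does not flag `kubectl` calls that omit an explicit namespace.

```typescript
// Assumed kubectl subcommands that write cluster state (rollout is handled separately).
const MUTATING_KUBECTL_VERBS = [
  "apply", "create", "delete", "patch", "replace", "edit",
  "scale", "set", "label", "annotate",
];
// `rollout status` / `rollout history` are read-only; these rollout actions are not.
const MUTATING_ROLLOUT_ACTIONS = ["restart", "undo", "pause", "resume"];

function namespaceOf(tokens: string[]): string | undefined {
  for (let i = 0; i < tokens.length; i++) {
    const t = tokens[i];
    if ((t === "-n" || t === "--namespace") && i + 1 < tokens.length) return tokens[i + 1];
    if (t.startsWith("--namespace=")) return t.slice("--namespace=".length);
  }
  return undefined;
}

function lintCiStep(command: string, productionNamespaces: string[]): string[] {
  const violations: string[] = [];
  if (/\bdeploy\s+apply\b/.test(command)) violations.push("calls deploy apply");
  if (/\bcodex\s+deploy\b/.test(command)) violations.push("calls codex deploy");
  const tokens = command.trim().split(/\s+/);
  const k = tokens.indexOf("kubectl");
  if (k >= 0) {
    const args = tokens.slice(k + 1);
    const ns = namespaceOf(args);
    const rolloutIdx = args.indexOf("rollout");
    const rolloutWrite =
      rolloutIdx >= 0 && MUTATING_ROLLOUT_ACTIONS.indexOf(args[rolloutIdx + 1]) >= 0;
    const mutating = rolloutWrite || args.some((t) => MUTATING_KUBECTL_VERBS.indexOf(t) >= 0);
    if (mutating && ns !== undefined && productionNamespaces.indexOf(ns) >= 0) {
      violations.push(`mutates production namespace ${ns}`);
    }
  }
  return violations;
}
```

Keeping the lint token-based makes it cheap to run on every PipelineRun definition. It remains a coarse guard layered on top of RBAC, not a replacement for denying CI service accounts write access to production namespaces.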
## Version Stamping And Verification
Every successful deployment must stamp the source version in the runtime:
- Docker image labels: `unidesk.ai/service-id`, `unidesk.ai/source-repo`, `unidesk.ai/source-commit` and `unidesk.ai/dockerfile`.
- Runtime env or Kubernetes annotations: `UNIDESK_DEPLOY_SERVICE_ID`, `UNIDESK_DEPLOY_REPO`, `UNIDESK_DEPLOY_COMMIT` and `UNIDESK_DEPLOY_REQUESTED_COMMIT`.
- Service health response should expose `deploy.repo` and `deploy.commit` when practical. Existing service-specific health contracts such as Code Queue's `deploy.commit` remain valid.

The deploy job is not complete until live verification proves the running service matches the requested commit. For Docker services this includes image label inspection on the running container. For k3s services this includes Deployment annotation/env inspection and service health through the Kubernetes API service proxy path for the target ClusterIP Service; production user-service requests continue through the same UniDesk microservice proxy path used by the frontend. A healthy old service must fail verification.
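A minimal sketch of the commit comparison follows. It assumes the caller has already collected the runtime evidence: container labels from `docker inspect` for Docker services, Deployment env or annotations for k3s services, and the parsed health body. The `LiveEvidence` shape and the `verifyLive` name are hypothetical. Comparing `deploy.commit` by exact string equality assumes health reports the full 40-character SHA. The key property is that a healthy service stamped with an older commit still fails.

```typescript
interface LiveEvidence {
  kind: "docker" | "k3s";
  healthy: boolean;               // health endpoint answered successfully
  stamps: Record<string, string>; // image labels (docker) or Deployment env/annotations (k3s)
  healthDeployCommit?: string;    // /health deploy.commit, when the service exposes it
}

function verifyLive(
  serviceId: string,
  requestedCommit: string, // full 40-character SHA resolved before the build
  ev: LiveEvidence,
): { ok: boolean; reasons: string[] } {
  const reasons: string[] = [];
  const idKey = ev.kind === "docker" ? "unidesk.ai/service-id" : "UNIDESK_DEPLOY_SERVICE_ID";
  const commitKey = ev.kind === "docker" ? "unidesk.ai/source-commit" : "UNIDESK_DEPLOY_COMMIT";
  if (!ev.healthy) reasons.push("service not healthy");
  if (ev.stamps[idKey] !== serviceId) reasons.push(`${idKey} mismatch`);
  if (ev.stamps[commitKey] !== requestedCommit) {
    // A healthy service running an older build must still fail here.
    reasons.push(`${commitKey}=${ev.stamps[commitKey] ?? "<missing>"} != ${requestedCommit}`);
  }
  if (ev.healthDeployCommit !== undefined && ev.healthDeployCommit !== requestedCommit) {
    reasons.push(`health deploy.commit=${ev.healthDeployCommit} != ${requestedCommit}`);
  }
  return { ok: reasons.length === 0, reasons };
}
```

Health is treated as necessary but not sufficient: `healthy: true` alone never produces `ok: true` unless the stamped commit also matches the requested one.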
## Unsupported Services
Image-only services, such as a service declared directly as `docker.io/vendor/image:tag` without a Dockerfile source artifact, do not satisfy target-side source-build policy. They must not be silently converted into Dockerfile source builds. If the object is an approved upstream image exception, it must follow the digest-pinned pull-only model documented in `docs/reference/cicd-standardization.md`; otherwise `deploy check`, `deploy plan` and CD entrypoints should report it as unsupported until a reviewed artifact producer or upstream digest consumer exists.
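The classification rule can be sketched as follows. `ServiceSource`, `classifySource` and the `approvedUpstreamException` flag are hypothetical names. The digest check assumes an approved upstream image is pinned as `name@sha256:<64 hex>`. The important property is that an image-only declaration is never rewritten into a Dockerfile build.

```typescript
interface ServiceSource {
  dockerfile?: string;                 // Dockerfile source artifact, if any
  image?: string;                      // e.g. "docker.io/vendor/image:tag"
  approvedUpstreamException?: boolean; // reviewed upstream image exception
}

type SourceClass =
  | { kind: "source-build"; dockerfile: string }
  | { kind: "upstream-digest-pull-only"; image: string }
  | { kind: "unsupported"; reason: string };

const DIGEST_PINNED = /@sha256:[0-9a-f]{64}$/;

function classifySource(src: ServiceSource): SourceClass {
  if (src.dockerfile) return { kind: "source-build", dockerfile: src.dockerfile };
  if (src.image && src.approvedUpstreamException) {
    // Approved upstream images are pull-only and must be digest-pinned.
    return DIGEST_PINNED.test(src.image)
      ? { kind: "upstream-digest-pull-only", image: src.image }
      : { kind: "unsupported", reason: "approved upstream image is not digest-pinned" };
  }
  if (src.image) {
    // Never synthesize a Dockerfile build from an image-only declaration.
    return { kind: "unsupported", reason: "image-only service without Dockerfile source artifact" };
  }
  return { kind: "unsupported", reason: "no Dockerfile source artifact" };
}
```

`deploy check` and `deploy plan` could surface the `unsupported` reason verbatim, so an operator sees why a service was excluded instead of finding it silently skipped.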