pikasTech-unidesk/docs/reference/deploy.md

# Desired Deploy Reconciler

UniDesk deployment is driven by a desired-state manifest. The manifest answers only one question: which service should run which repository commit. Runtime topology, ports, providers, compose files, Kubernetes manifests, health paths and proxy policy remain in `config.json` and the existing service manifests.

## Manifest

The root `deploy.json` is intentionally minimal:

```json
{
  "schemaVersion": 1,
  "environment": "prod",
  "services": [
    {
      "id": "code-queue",
      "repo": "https://github.com/pikasTech/unidesk",
      "commitId": "0c3cdb4ee06a23361ed511a2da033d67b53d16f4"
    }
  ]
}
```

`environment` is optional only for the legacy local-file compatibility path. When present it must be exactly `dev` or `prod`. Any `--env <name>` command requires the manifest to declare the same `environment`; `--env dev` must reject `environment=prod`, and `--env prod` must reject `environment=dev`.

`deploy.json` must not contain provider IDs, ports, compose service names, Kubernetes namespace, health paths, environment variables, Dockerfile paths or build commands. The deploy reconciler joins each `id` with `config.json.microservices[]` and existing k3s manifests to resolve those details. A service listed in `deploy.json` but missing from `config.json` is an error. A service with no Dockerfile source artifact is reported as unsupported rather than silently skipped. `commitId` may be a unique pushed short SHA or a full SHA; every deploy command resolves it through the remote repository to a full 40-character commit before target-side build or rollout, and fails immediately if the SHA is missing or ambiguous.

Environment mode never reads the local working tree manifest. The mapping is fixed:

- `dev -> origin/deploy/dev`
- `prod -> origin/deploy/prod`

`deploy check --env ...` and `deploy plan --env ...` fetch the fixed ref, read `deploy.json` from that ref, validate the declared environment, and report the manifest commit/blob, service commit IDs, target namespace, database fingerprint and Provider identity without mutating runtime resources. `deploy apply --env dev` is enabled for the current isolated D601 dev slice: `backend-core`, `frontend` and `code-queue`. If no `--service` is given and the dev manifest still includes unsupported later-stage services such as `code-queue-mgr`, the command fails before changing runtime resources. `deploy apply --env prod` remains disabled until the production environment executor and authorization policy are explicitly added.

The `deploy/dev` and `deploy/prod` branches are environment desired-state branches, not source branches. They should contain only `deploy.json`; Kubernetes manifests, Dockerfiles and executor code continue to live on `master` and are selected through the commit IDs declared in the environment manifest.

`config.json.microservices[].repository.commitId` is retained for catalog compatibility, but `deploy.json` is the deployment version authority for the reconciler.

## D601 Dev Foundation

Phase 2 of the D601 dev environment creates only the isolated namespace and database foundation. The authoritative manifest is `src/components/microservices/k3sctl-adapter/k3s/dev/unidesk-dev-foundation.k8s.yaml`.

It may create resources only in `unidesk-dev`:

- `Namespace unidesk-dev`, plus quota and default limits.
- `Secret unidesk-dev-runtime-secrets` as a dev-only template for DB credentials, provider token, auth/session secret, and Code Queue model secret placeholders.
- `ConfigMap unidesk-dev-runtime-config` for dev identity, fixed deploy ref `origin/deploy/dev`, provider id `D601-dev`, Code Queue dev paths, and non-secret runtime defaults.
- `ConfigMap unidesk-dev-db-guard` with an executable guard script that rejects production-looking `DATABASE_URL` values.
- `StatefulSet/Service postgres-dev` with a 5Gi persistent volume claim and bounded CPU/memory requests/limits.
- `Job unidesk-dev-db-migrate`, which waits for `postgres-dev`, runs the guard, then prepares backend-core and Code Queue tables in the independent `unidesk_dev` database.

The manifest must not create, update, or delete production namespace resources, production DB objects, production PVCs, production Deployments/Services/Secrets, or main server Docker Compose services. Static validation is available through `bun scripts/cli.ts dev-env validate`; Kubernetes client dry-run is `bun scripts/cli.ts dev-env validate --kubectl-dry-run`. If applying manually during Phase 2, the only allowed apply target is this manifest and the post-check must prove production resources are unchanged, for example by comparing `kubectl -n unidesk get deploy,sts,svc,secret,pvc -o name` before and after.

Before applying the foundation on a fresh D601 native k3s runtime, run `bun scripts/cli.ts dev-env prewarm-images` and wait for the returned job to succeed. This imports the foundation images `postgres:16-alpine` and `rancher/mirrored-library-busybox:1.36.1` from Docker into `/run/k3s/containerd/containerd.sock`; k3s/containerd must not depend on live Docker Hub pulls during rollout. If this step is skipped, `postgres-dev` or the local-path helper pod can remain `ImagePullBackOff`, leaving the PVC pending even though the manifest is valid.

Phase 2 guardrails are deliberately limited to the dev manifest and CLI validator. Runtime startup guards for dev backend-core, Code Queue and Code Queue Manager must be reviewed and shipped as a separate change before dev workloads are exposed beyond dry-run or controlled apply.

On D601, dev/prod k3s verification must use the native k3s kubeconfig explicitly: `KUBECONFIG=/etc/rancher/k3s/k3s.yaml`. The default `kubectl` context may point at Docker Desktop and is not an acceptable target for UniDesk k3s deploy validation.

## D601 Dev Core

Phase 3 introduces the dev backend/frontend manifest at `src/components/microservices/k3sctl-adapter/k3s/dev/unidesk-dev-core.k8s.yaml`. It may create only `backend-core-dev` and `frontend-dev` Deployment/Service objects in `unidesk-dev`.

`backend-core-dev` must use `unidesk-dev-runtime-config` and `unidesk-dev-runtime-secrets`, connect to `postgres-dev.../unidesk_dev`, expose HTTP on 8080 and provider ingress on 8081, and write logs under `/var/log/unidesk-dev`. `frontend-dev` must set `CORE_INTERNAL_URL=http://backend-core-dev.unidesk-dev.svc.cluster.local:8080` and must not proxy to production backend-core.

The manifest keeps placeholder image tags and deploy commit values in source control. `deploy apply --env dev --service backend-core|frontend` fetches `origin/deploy/dev:deploy.json`, materializes the requested source commit on D601, copies the dev core control manifest, narrows it to the selected Service/Deployment pair, replaces placeholders with the requested commit and dev image tag, builds on D601, imports the image into native k3s containerd, applies only the `unidesk-dev` objects and stamps the Deployment. Client dry-run and static validation are the required checks before any controlled apply:

- `bun scripts/cli.ts dev-env validate --manifest src/components/microservices/k3sctl-adapter/k3s/dev/unidesk-dev-core.k8s.yaml`
- `KUBECONFIG=/etc/rancher/k3s/k3s.yaml kubectl apply --dry-run=client --validate=false -f src/components/microservices/k3sctl-adapter/k3s/dev/unidesk-dev-core.k8s.yaml`

backend-core and frontend keep their production health payload shape by default. They add `environment`, `namespace`, `databaseName`, `serviceId`, `deployRef` and deploy commit metadata only when `UNIDESK_ENV=dev` or `UNIDESK_NAMESPACE=unidesk-dev` is set. The frontend shell shows a visible DEV ribbon only under the same dev identity.

## D601 Dev Code Queue

Phase 5 introduces the dev Code Queue execution manifest at `src/components/microservices/k3sctl-adapter/k3s/dev/unidesk-dev-code-queue.k8s.yaml`. It may create only Code Queue dev execution objects in `unidesk-dev`: `code-queue-scheduler-dev`, `code-queue-read-dev`, `code-queue-write-dev` and the supporting `d601-dev-provider-egress-proxy`.

All dev Code Queue components must use `unidesk-dev-runtime-config` and `unidesk-dev-runtime-secrets`, connect to `postgres-dev.../unidesk_dev`, write logs and state under `/home/ubuntu/unidesk-dev-code-queue-deploy/state`, and expose HTTP on 4222 only as ClusterIP services. The scheduler uses `CODE_QUEUE_MAIN_PROVIDER_ID=D601-dev`, `CODE_QUEUE_WORKDIR=/workspace-dev`, `CODE_QUEUE_REMOTE_WORKDIR=/home/ubuntu/unidesk-dev-workspace`, disables ClaudeQQ notifications by default, and does not use the production `d601-tcp-egress-gateway` or production PostgreSQL route.

`deploy apply --env dev --service code-queue` fetches `origin/deploy/dev:deploy.json`, materializes the requested source commit on D601, uses the dev Code Queue control manifest from that D601 materialized commit, narrows it to Code Queue dev objects, replaces placeholders with the requested commit and `unidesk-code-queue:dev`, builds on D601, imports the image into native k3s containerd, applies only `unidesk-dev` objects and stamps the dev Deployments. Because Code Queue carries the agent toolchain and browser/runtime dependencies, dev builds may reuse an existing D601 `unidesk-code-queue:d601-build-base` or `unidesk-code-queue:d601` image when the dev build-base tag is absent, and the deploy executor allows a longer Code Queue build window than lightweight services. The scheduler has an explicit 5Gi memory limit, and all dev Code Queue containers must set CPU limits so the namespace `LimitRange` does not inject a quota-breaking default CPU limit. This first dev execution slice proves deployability, health and dev database isolation; wiring the dev frontend stable `code-queue` route through a dev `code-queue-mgr` is a separate later phase.

## CLI

`bun scripts/cli.ts deploy check [--file deploy.json] [--service <id>]` checks the live runtime against the desired repo and commit without changing the system.

`bun scripts/cli.ts deploy plan [--file deploy.json] [--service <id>]` prints the same live state plus the intended action: `noop`, `deploy` or `unsupported`.

`bun scripts/cli.ts deploy plan --env dev [--service <id>]` reads `origin/deploy/dev:deploy.json` and prints a dry-run environment plan without checking or mutating live runtime resources. `deploy check --env dev` uses the same dry-run environment plan. `--env prod` is available for parity as a dry-run planning path; it reads `origin/deploy/prod:deploy.json` and must not use a dirty local `deploy.json`.

`bun scripts/cli.ts deploy apply [--file deploy.json | --env dev] [--service <id>] [--dry-run] [--force]` starts an asynchronous job. Use `bun scripts/cli.ts job status <jobId> --tail-bytes 30000` to observe progress. `--dry-run` resolves the same plan but does not build or replace runtime objects. `--force` rebuilds even when the live commit matches. Environment apply currently supports `--env dev --service backend-core`, `--env dev --service frontend` and `--env dev --service code-queue`; `--env prod` apply is rejected.

All deploy commands output JSON. Long operations must use `.state/jobs/` and bounded log tails; no deploy path may succeed with missing progress output.

## Target-Side Build

Target-side build is the only standard deployment mode. The controller may run on the main server, but source materialization, compile/build, Docker image creation and deployment happen on the target node that will run the service.

- Main server services are fetched, built and deployed on the main server.
- D601 services are fetched, built and deployed on D601.
- D518 services are fetched, built and deployed on D518.
- k3s managed services are built on the active control target and then imported into that target's Kubernetes container runtime.

The reconciler distributes only repository URL, commit ID, Dockerfile path, build context and the existing deployment manifest/compose declaration. It must not distribute large Docker images between hosts as the default path, and it must not accept `docker commit` images, dirty worktrees or hand-mutated runtime containers as deployment truth.

Each target fetches the remote repository, resolves the requested commit to a full 40 character SHA and exports tracked files with `git archive`. Build contexts are created from that archive, not from the operator's current working tree. Environment applies such as `deploy apply --env dev` must not upload Kubernetes manifests or source files from the master server worktree; the target-side materialized commit is the source for Dockerfile, build context and k3s control manifests. The master server side may only do lightweight CLI orchestration, environment ref reading and remote command dispatch.

## One-Shot Build Proxy

Target-side source fetches and Docker builds that need external network access use a one-shot proxy scope through provider-gateway WS egress. Provider targets connect only to their node-local provider-gateway egress endpoint, normally `http://127.0.0.1:18789`; provider-gateway carries the TCP stream over the already-authenticated provider WebSocket to the main server, and the main server opens the final outbound TCP connection. This is the only allowed proxy channel for provider-side deploy source fetches and builds. The deploy path must not mutate host-global proxy settings:

- Do not edit `/etc/docker/daemon.json`.
- Do not edit shell profiles or global Docker CLI config.
- Do not leave long-lived host `HTTP_PROXY`, `HTTPS_PROXY` or `ALL_PROXY`.
- Do not silently fall back to target local direct internet.
- Do not create a separate SSH SOCKS proxy, public master proxy port, or direct backend-core/provider-ingress connection for deploy egress.

The standard implementation first probes GitHub through the node-local egress proxy, then runs target-side `git clone`/`git fetch` and the Docker build in that scoped environment. It also uses the target Docker daemon's local BuildKit builder so target-side base image and layer caches are reused. Proxy variables are scoped to the current deploy step and passed as matching `--build-arg` values for Dockerfile `RUN` steps; they are not written to daemon or shell configuration. Provider targets also use `docker buildx build --network host` so `127.0.0.1:<proxy-port>` inside `RUN` resolves to the target host's loopback provider-gateway egress proxy. Each deploy must log the proxy channel and probe result, for example `target_source_proxy=provider-gateway-ws-egress:http://127.0.0.1:18789`, `target_build_proxy=provider-gateway-ws-egress:http://127.0.0.1:18789` and `target_build_proxy_probe=ok`.

Build cache is part of the deployment contract, not an optimization left to Docker defaults. The deploy reconciler must pass inline BuildKit cache metadata (`--cache-to type=inline`) and import the current target image as cache source when it exists (`--cache-from <image>`). Dockerfiles that intentionally expose a warm build-base argument, such as Code Queue's `CODE_QUEUE_BASE_IMAGE`, may use the target-local `<image>-build-base` image to avoid re-running large apt/npm/Playwright setup layers; this is still target-local build cache and must be logged as `target_build_base_image=<image>-build-base`. If a service later needs an isolated `docker-container` builder or a local cache directory backend, it may use one only as a service-specific fallback and must still log proxy resolution, proxy probe result, cache source, cache destination and builder cleanup. The default path must not discard target-local image cache by creating a fresh builder for every deploy.

Main server targets may build without a proxy unless a service explicitly requires one. Provider targets must not bypass provider-gateway WS egress for GitHub, Debian apt, npm, Playwright, model downloads or any other external build dependency.

## Deployment Executors

The reconciler selects the executor from `config.json`:

- `deployment.mode=unidesk-direct` on `main-server`: build the image on the main server, then use the fixed UniDesk Compose project and `up -d --no-build --no-deps --force-recreate <service>`.
- `deployment.mode=internal-sidecar` on `main-server`: use the same main-server target-side source export, Docker build, image label stamping, fixed Compose project replacement and live commit verification as direct Compose services. This class is for private sidecars such as `code-queue-mgr`; it is still versioned by `deploy.json.commitId`, not by the operator's current worktree.
- `deployment.mode=unidesk-direct` on a provider: dispatch `host.ssh` to that provider, build on the provider, then use the service's provider-local compose file and project. The executor resolves the actual Compose project, image name, build context, Dockerfile and target from the running container labels and `docker compose config`; it must not guess an image tag that the service will not actually run.
- Control bridges that UniDesk needs in order to inspect or repair an orchestrator must stay in this direct class. In particular, `k3sctl-adapter` is a UniDesk-managed bridge to native k3s and must remain outside k3s; Docker packaging on Docker Desktop/WSL must create an explicit host-local bridge, currently an adapter-container SSH local tunnel, to reach `/etc/rancher/k3s/k3s.yaml` and WSL `127.0.0.1:6443`.
- `deployment.mode=k3sctl-managed`: dispatch to the active control target, build on that target, verify or install native k3s on the host OS/WSL distro, import the image into native k3s/containerd, apply the existing Kubernetes manifest, stamp the Deployment and wait for rollout. The executor must use the native kubeconfig and containerd socket, for example `/etc/rancher/k3s/k3s.yaml` and `/run/k3s/containerd/containerd.sock`; running k3s itself in Docker is forbidden for both control-plane and worker nodes. A `rancher/k3s` image or legacy container may only be used as a temporary artifact source during migration, and any active containerized k3s control plane must be stopped before verification succeeds. The executor must preload a valid `rancher/mirrored-pause:3.6` sandbox image into native k3s containerd through the provider-gateway one-shot egress path, verify its entrypoint is `/pause`, and reject fake or sleep-based replacement images. Code Queue's k3s migration executor must also stop/remove the legacy direct Docker `code-queue-backend` after k3s rollout, so there is never a second scheduler running beside the native k3s scheduler.

Existing service-specific commands such as Code Queue deploy should converge onto this reconciler path instead of keeping a parallel implementation.

Decision Center is a standard `k3sctl-managed` service in this model. `deploy apply --service decision-center` must build `src/components/microservices/decision-center/Dockerfile` on D601, import `unidesk-decision-center:d601` into native k3s containerd, apply `src/components/microservices/k3sctl-adapter/k3s/decision-center.k8s.yaml`, stamp the Deployment, and verify health through `/api/microservices/decision-center/health`. It must not add a main-server Compose service, NodePort, hostPort, or provider-gateway direct HTTP backend for Decision Center.

## CI Separation

Continuous integration is intentionally separate from this deploy reconciler. D601 k3s hosts Tekton CI resources described in `docs/reference/ci.md`, but those PipelineRuns only clone, check and run read-only performance gates. They must not call `deploy apply`, `codex deploy`, `kubectl rollout restart` for production services, or mutate `deploy.json`.

The Code Queue performance gate may create a temporary `code-queue-ci-read` service and read the main PostgreSQL through the existing `d601-tcp-egress-gateway`. Because it runs with `CODE_QUEUE_SERVICE_ROLE=read`, scheduler/backfill/notification disabled and EmptyDir state, it is not deployment truth and does not need a temporary database for the current read-only checks.

## Version Stamping And Verification

Every successful deployment must stamp the source version in the runtime:

- Docker image labels: `unidesk.ai/service-id`, `unidesk.ai/source-repo`, `unidesk.ai/source-commit` and `unidesk.ai/dockerfile`.
- Runtime env or Kubernetes annotations: `UNIDESK_DEPLOY_SERVICE_ID`, `UNIDESK_DEPLOY_REPO`, `UNIDESK_DEPLOY_COMMIT` and `UNIDESK_DEPLOY_REQUESTED_COMMIT`.
- Service health response should expose `deploy.repo` and `deploy.commit` when practical. Existing service-specific health contracts such as Code Queue's `deploy.commit` remain valid.

The deploy job is not complete until live verification proves the running service matches the requested commit. For Docker services this includes image label inspection on the running container. For k3s services this includes Deployment annotation/env inspection and service health through the same UniDesk microservice proxy path used by the frontend. A healthy old service must fail verification.

## Unsupported Services

Image-only services, such as a service declared directly as `docker.io/vendor/image:tag` without a Dockerfile source artifact, do not satisfy target-side build policy. They must be converted to a source repository with a Dockerfile wrapper before the reconciler can manage them. Until then, `deploy check` and `deploy plan` should report them as unsupported.