pikasTech-unidesk/docs/reference/deploy.md
2026-05-18 01:31:58 +00:00


Desired Deploy Reconciler

UniDesk deployment is driven by a desired-state manifest. The manifest answers only one question: which service should run which repository commit. Runtime topology, ports, providers, compose files, Kubernetes manifests, health paths and proxy policy remain in config.json and the existing service manifests.

Manifest

The root deploy.json is intentionally minimal:

{
  "schemaVersion": 1,
  "environment": "prod",
  "services": [
    {
      "id": "code-queue",
      "repo": "https://github.com/pikasTech/unidesk",
      "commitId": "0c3cdb4ee06a23361ed511a2da033d67b53d16f4"
    }
  ]
}

environment is optional only for the legacy local-file compatibility path. When present it must be exactly dev or prod. Any --env <name> command requires the manifest to declare the same environment; --env dev must reject environment=prod, and --env prod must reject environment=dev.
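
The environment rule above can be sketched as a small check. This is an illustrative sketch only, not the reconciler's code: ManifestError and check_environment are hypothetical names, and the manifest is assumed to be already parsed into a dict.

```python
from typing import Optional

ALLOWED_ENVIRONMENTS = ("dev", "prod")


class ManifestError(ValueError):
    """Raised when deploy.json violates the environment rules (hypothetical name)."""


def check_environment(manifest: dict, env_flag: Optional[str] = None) -> Optional[str]:
    """Return the declared environment, or raise if it conflicts with --env."""
    declared = manifest.get("environment")
    if declared is not None and declared not in ALLOWED_ENVIRONMENTS:
        raise ManifestError(f"environment must be dev or prod, got {declared!r}")
    if env_flag is None:
        # Legacy local-file path: the field is optional here.
        return declared
    if env_flag not in ALLOWED_ENVIRONMENTS:
        raise ManifestError(f"unknown --env value {env_flag!r}")
    if declared != env_flag:
        # Covers both a mismatch and a manifest that omits the field.
        raise ManifestError(f"--env {env_flag} rejects environment={declared}")
    return declared
```

With this sketch, check_environment({"environment": "prod"}, "dev") raises, while check_environment({}) returns None for the legacy path.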

deploy.json must not contain provider IDs, ports, compose service names, Kubernetes namespaces, health paths, environment variables, Dockerfile paths or build commands. The deploy reconciler joins each id with config.json.microservices[] and existing k3s manifests to resolve those details. A service listed in deploy.json but missing from config.json is an error. A service with no Dockerfile source artifact is reported as unsupported rather than silently skipped. commitId may be a full SHA or a pushed short SHA that is unique in the repository; every deploy command resolves it through the remote repository to a full 40-character commit before target-side build or rollout, and fails immediately if the SHA is missing or ambiguous.
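
The missing/ambiguous rule for commitId can be illustrated with a prefix match over known commits. This is a sketch under assumptions: resolve_commit is a hypothetical helper, the candidate list stands in for whatever the remote repository reports, and the four-character minimum prefix is an assumption rather than something this document specifies.

```python
import re
from typing import Iterable

# Assumption: accept 4 to 40 lowercase hex characters as a commit prefix.
SHA_PREFIX = re.compile(r"^[0-9a-f]{4,40}$")


def resolve_commit(commit_id: str, remote_commits: Iterable[str]) -> str:
    """Resolve a short or full SHA to exactly one full 40-character SHA."""
    wanted = commit_id.lower()
    if not SHA_PREFIX.fullmatch(wanted):
        raise ValueError(f"not a commit SHA: {commit_id!r}")
    matches = sorted({sha for sha in remote_commits if sha.startswith(wanted)})
    if not matches:
        raise LookupError(f"commit {commit_id} is missing from the remote")
    if len(matches) > 1:
        raise LookupError(f"commit {commit_id} is ambiguous ({len(matches)} matches)")
    return matches[0]
```

Failing on both the empty and the multi-match case before any build starts mirrors the "fails immediately" requirement above.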

Environment mode never reads the local working tree manifest. The mapping is fixed:

  • dev -> origin/deploy/dev
  • prod -> origin/deploy/prod

deploy check --env ... and deploy plan --env ... fetch the fixed ref, read deploy.json from that ref, validate the declared environment, and report the manifest commit/blob, service commit IDs, target namespace, database fingerprint and Provider identity without mutating runtime resources. deploy apply --env dev is enabled for the current isolated D601 dev slice: backend-core, frontend and code-queue. If no --service is given and the dev manifest still includes unsupported later-stage services such as code-queue-mgr, the command fails before changing runtime resources. deploy apply --env prod remains disabled until the production environment executor and authorization policy are explicitly added.
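
The fixed ref mapping and the current apply gating can be sketched as follows. The function names are hypothetical, and the gating logic is a simplified reading of the paragraph above, not the CLI's actual code.

```python
from typing import List, Optional, Sequence

# Fixed mapping from --env to the desired-state ref.
ENV_REFS = {"dev": "origin/deploy/dev", "prod": "origin/deploy/prod"}
# Services in the current isolated D601 dev slice.
DEV_APPLY_SERVICES = ("backend-core", "frontend", "code-queue")


def manifest_spec(env: str) -> str:
    """Return the <ref>:<path> spec that names deploy.json on the fixed ref."""
    return f"{ENV_REFS[env]}:deploy.json"


def apply_targets(env: str, manifest_ids: Sequence[str],
                  service: Optional[str] = None) -> List[str]:
    """Return the services an environment apply would touch, or raise."""
    if env == "prod":
        raise PermissionError("deploy apply --env prod is disabled")
    if env != "dev":
        raise ValueError(f"unknown environment {env!r}")
    requested = [service] if service else list(manifest_ids)
    unknown = [s for s in requested if s not in manifest_ids]
    if unknown:
        raise ValueError(f"not in the dev manifest: {unknown}")
    unsupported = [s for s in requested if s not in DEV_APPLY_SERVICES]
    if unsupported:
        # Fail before any runtime resource is changed.
        raise ValueError(f"unsupported in the dev slice: {unsupported}")
    return requested
```

For example, apply_targets("dev", ["backend-core", "code-queue-mgr"]) raises because no --service narrows the run, while passing service="backend-core" returns ["backend-core"].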

The deploy/dev and deploy/prod branches are environment desired-state branches, not source branches. They should contain only deploy.json; Kubernetes manifests, Dockerfiles and executor code continue to live on master and are selected through the commit IDs declared in the environment manifest.

config.json.microservices[].repository.commitId is retained for catalog compatibility, but deploy.json is the deployment version authority for the reconciler.

D601 Dev Foundation

Phase 2 of the D601 dev environment creates only the isolated namespace and database foundation. The authoritative manifest is src/components/microservices/k3sctl-adapter/k3s/dev/unidesk-dev-foundation.k8s.yaml.

It may create resources only in unidesk-dev:

  • Namespace unidesk-dev, plus quota and default limits.
  • Secret unidesk-dev-runtime-secrets as a dev-only template for DB credentials, provider token, auth/session secret, and Code Queue model secret placeholders.
  • ConfigMap unidesk-dev-runtime-config for dev identity, fixed deploy ref origin/deploy/dev, provider id D601-dev, Code Queue dev paths, and non-secret runtime defaults.
  • ConfigMap unidesk-dev-db-guard with an executable guard script that rejects production-looking DATABASE_URL values.
  • StatefulSet/Service postgres-dev with a 5Gi persistent volume claim and bounded CPU/memory requests/limits.
  • Job unidesk-dev-db-migrate, which waits for postgres-dev, runs the guard, then prepares backend-core and Code Queue tables in the independent unidesk_dev database.
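
The actual rules of the guard script in unidesk-dev-db-guard are not reproduced in this document. As an illustration only, a guard of this kind could accept nothing but the dev host and database; the allow-list approach and the name is_dev_database_url are assumptions.

```python
from urllib.parse import urlsplit


def is_dev_database_url(url: str) -> bool:
    """Accept only URLs that point at postgres-dev and the unidesk_dev database."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    database = parts.path.lstrip("/")
    # Assumption: a bare service name or any name under it counts as the dev host.
    host_ok = host == "postgres-dev" or host.startswith("postgres-dev.")
    scheme_ok = parts.scheme in ("postgres", "postgresql")
    return scheme_ok and host_ok and database == "unidesk_dev"
```

An allow-list fails closed: a URL the guard does not recognise is treated as production-looking rather than waved through.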

The manifest must not create, update, or delete production namespace resources, production DB objects, production PVCs, production Deployments/Services/Secrets, or main server Docker Compose services. Static validation is available through bun scripts/cli.ts dev-env validate; a Kubernetes client dry-run is available through bun scripts/cli.ts dev-env validate --kubectl-dry-run. If applying manually during Phase 2, the only allowed apply target is this manifest, and the post-check must prove production resources are unchanged, for example by comparing the output of kubectl -n unidesk get deploy,sts,svc,secret,pvc -o name before and after.
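
The before/after comparison can be done on the captured kubectl output. A minimal sketch, assuming the two outputs were saved as text; resource_drift is a hypothetical helper.

```python
from typing import List, Tuple


def resource_drift(before: str, after: str) -> Tuple[List[str], List[str]]:
    """Return (added, removed) resource names between two `-o name` listings."""
    before_names = set(before.split())
    after_names = set(after.split())
    added = sorted(after_names - before_names)
    removed = sorted(before_names - after_names)
    return added, removed
```

Both lists must be empty for the post-check to pass. A name-only comparison detects created and deleted objects but not in-place updates, so it is a lower bound on the proof required above.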

Before applying the foundation on a fresh D601 native k3s runtime, run bun scripts/cli.ts dev-env prewarm-images and wait for the returned job to succeed. This imports the foundation images postgres:16-alpine and rancher/mirrored-library-busybox:1.36.1 from Docker into native k3s containerd through /run/k3s/containerd/containerd.sock; k3s/containerd must not depend on live Docker Hub pulls during rollout. If this step is skipped, postgres-dev or the local-path helper pod can remain in ImagePullBackOff, leaving the PVC pending even though the manifest is valid.

Phase 2 guardrails are deliberately limited to the dev manifest and CLI validator. Runtime startup guards for dev backend-core, Code Queue and Code Queue Manager must be reviewed and shipped as a separate change before dev workloads are exposed beyond dry-run or controlled apply.

On D601, dev/prod k3s verification must use the native k3s kubeconfig explicitly: KUBECONFIG=/etc/rancher/k3s/k3s.yaml. The default kubectl context may point at Docker Desktop and is not an acceptable target for UniDesk k3s deploy validation.

D601 Dev Core

Phase 3 introduces the dev backend/frontend manifest at src/components/microservices/k3sctl-adapter/k3s/dev/unidesk-dev-core.k8s.yaml. It may create only backend-core-dev and frontend-dev Deployment/Service objects in unidesk-dev.

backend-core-dev must use unidesk-dev-runtime-config and unidesk-dev-runtime-secrets, connect to postgres-dev.../unidesk_dev, expose HTTP on 8080 and provider ingress on 8081, and write logs under /var/log/unidesk-dev. frontend-dev must set CORE_INTERNAL_URL=http://backend-core-dev.unidesk-dev.svc.cluster.local:8080 and must not proxy to production backend-core.

The manifest keeps placeholder image tags and deploy commit values in source control. deploy apply --env dev --service backend-core|frontend fetches origin/deploy/dev:deploy.json, materializes the requested source commit on D601, copies the dev core control manifest, narrows it to the selected Service/Deployment pair, replaces placeholders with the requested commit and dev image tag, builds on D601, imports the image into native k3s containerd, applies only the unidesk-dev objects and stamps the Deployment. Client dry-run and static validation are the required checks before any controlled apply:

  • bun scripts/cli.ts dev-env validate --manifest src/components/microservices/k3sctl-adapter/k3s/dev/unidesk-dev-core.k8s.yaml
  • KUBECONFIG=/etc/rancher/k3s/k3s.yaml kubectl apply --dry-run=client --validate=false -f src/components/microservices/k3sctl-adapter/k3s/dev/unidesk-dev-core.k8s.yaml

backend-core and frontend keep their production health payload shape by default. They add environment, namespace, databaseName, serviceId, deployRef and deploy commit metadata only when UNIDESK_ENV=dev or UNIDESK_NAMESPACE=unidesk-dev is set. The frontend shell shows a visible DEV ribbon only under the same dev identity.

D601 Dev Code Queue

Phase 5 introduces the dev Code Queue execution manifest at src/components/microservices/k3sctl-adapter/k3s/dev/unidesk-dev-code-queue.k8s.yaml. It may create only Code Queue dev execution objects in unidesk-dev: code-queue-scheduler-dev, code-queue-read-dev, code-queue-write-dev and the supporting d601-dev-provider-egress-proxy.

All dev Code Queue components must use unidesk-dev-runtime-config and unidesk-dev-runtime-secrets, connect to postgres-dev.../unidesk_dev, write logs and state under /home/ubuntu/unidesk-dev-code-queue-deploy/state, and expose HTTP on 4222 only as ClusterIP services. The scheduler uses CODE_QUEUE_MAIN_PROVIDER_ID=D601-dev, CODE_QUEUE_WORKDIR=/workspace-dev, CODE_QUEUE_REMOTE_WORKDIR=/home/ubuntu/unidesk-dev-workspace, disables ClaudeQQ notifications by default, and does not use the production d601-tcp-egress-gateway or production PostgreSQL route.

deploy apply --env dev --service code-queue fetches origin/deploy/dev:deploy.json, materializes the requested source commit on D601, copies the dev Code Queue control manifest, narrows it to Code Queue dev objects, replaces placeholders with the requested commit and unidesk-code-queue:dev, builds on D601, imports the image into native k3s containerd, applies only unidesk-dev objects and stamps the dev Deployments. This first dev execution slice proves deployability, health and dev database isolation; wiring the dev frontend stable code-queue route through a dev code-queue-mgr is a separate later phase.

CLI

bun scripts/cli.ts deploy check [--file deploy.json] [--service <id>] checks the live runtime against the desired repo and commit without changing the system.

bun scripts/cli.ts deploy plan [--file deploy.json] [--service <id>] prints the same live state plus the intended action: noop, deploy or unsupported.

bun scripts/cli.ts deploy plan --env dev [--service <id>] reads origin/deploy/dev:deploy.json and prints a dry-run environment plan without checking or mutating live runtime resources. deploy check --env dev uses the same dry-run environment plan. --env prod is available for parity as a dry-run planning path; it reads origin/deploy/prod:deploy.json and must not use a dirty local deploy.json.

bun scripts/cli.ts deploy apply [--file deploy.json | --env dev] [--service <id>] [--dry-run] [--force] starts an asynchronous job. Use bun scripts/cli.ts job status <jobId> --tail-bytes 30000 to observe progress. --dry-run resolves the same plan but does not build or replace runtime objects. --force rebuilds even when the live commit matches. Environment apply currently supports --env dev --service backend-core, --env dev --service frontend and --env dev --service code-queue; --env prod apply is rejected.
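
The three plan actions and the effect of --force can be summarised in one decision function. This is a simplified sketch; plan_action and its parameters are hypothetical names.

```python
from typing import Optional


def plan_action(has_dockerfile: bool, live_commit: Optional[str],
                desired_commit: str, force: bool = False) -> str:
    """Return noop, deploy or unsupported for one service."""
    if not has_dockerfile:
        # Image-only services cannot satisfy target-side build policy.
        return "unsupported"
    if live_commit == desired_commit and not force:
        return "noop"
    # Either the live commit differs, nothing is running, or --force was given.
    return "deploy"
```

Here live_commit is None when nothing is running yet, which always yields deploy for a supported service.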

All deploy commands output JSON. Long operations must use .state/jobs/ and bounded log tails; no deploy path may succeed with missing progress output.

Target-Side Build

Target-side build is the only standard deployment mode. The controller may run on the main server, but source materialization, compile/build, Docker image creation and deployment happen on the target node that will run the service.

  • Main server services are fetched, built and deployed on the main server.
  • D601 services are fetched, built and deployed on D601.
  • D518 services are fetched, built and deployed on D518.
  • k3s-managed services are built on the active control target and then imported into that target's Kubernetes container runtime.

The reconciler distributes only repository URL, commit ID, Dockerfile path, build context and the existing deployment manifest/compose declaration. It must not distribute large Docker images between hosts as the default path, and it must not accept docker commit images, dirty worktrees or hand-mutated runtime containers as deployment truth.

Each target fetches the remote repository, resolves the requested commit to a full 40-character SHA and exports tracked files with git archive. Build contexts are created from that archive, not from the operator's current working tree.
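
The export step can be sketched as a pair of git invocations. The helper name, the directory arguments and the exact clone strategy are assumptions; only the use of git archive on a fully resolved SHA comes from the text above.

```python
import re
from typing import List

FULL_SHA = re.compile(r"^[0-9a-f]{40}$")


def export_commands(repo_url: str, full_sha: str,
                    clone_dir: str, archive_path: str) -> List[List[str]]:
    """Build the argv lists that export tracked files for one commit."""
    if not FULL_SHA.fullmatch(full_sha):
        raise ValueError("export requires a resolved 40-character SHA")
    return [
        # Fetch objects without creating a working tree on the target.
        ["git", "clone", "--no-checkout", repo_url, clone_dir],
        # Export only tracked files; untracked or dirty state cannot leak in.
        ["git", "-C", clone_dir, "archive", "--format=tar",
         f"--output={archive_path}", full_sha],
    ]
```

Each argv list can be passed to subprocess.run with check=True; the resulting tar file is then unpacked into an empty directory that serves as the build context.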

One-Shot Build Proxy

Target-side source fetches and Docker builds that need external network access use a one-shot proxy scope through provider-gateway WS egress. Provider targets connect only to their node-local provider-gateway egress endpoint, normally http://127.0.0.1:18789; provider-gateway carries the TCP stream over the already-authenticated provider WebSocket to the main server, and the main server opens the final outbound TCP connection. This is the only allowed proxy channel for provider-side deploy source fetches and builds. The deploy path must not mutate host-global proxy settings:

  • Do not edit /etc/docker/daemon.json.
  • Do not edit shell profiles or global Docker CLI config.
  • Do not leave long-lived host HTTP_PROXY, HTTPS_PROXY or ALL_PROXY.
  • Do not silently fall back to target local direct internet.
  • Do not create a separate SSH SOCKS proxy, public master proxy port, or direct backend-core/provider-ingress connection for deploy egress.

The standard implementation first probes GitHub through the node-local egress proxy, then runs target-side git clone/git fetch and the Docker build in that scoped environment. It also uses the target Docker daemon's local BuildKit builder so target-side base image and layer caches are reused. Proxy variables are scoped to the current deploy step and passed as matching --build-arg values for Dockerfile RUN steps; they are not written to daemon or shell configuration. Provider targets also use docker buildx build --network host so 127.0.0.1:<proxy-port> inside RUN resolves to the target host's loopback provider-gateway egress proxy. Each deploy must log the proxy channel and probe result, for example target_source_proxy=provider-gateway-ws-egress:http://127.0.0.1:18789, target_build_proxy=provider-gateway-ws-egress:http://127.0.0.1:18789 and target_build_proxy_probe=ok.
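
Step-scoped proxy settings can be sketched without touching any global state. The helper names are hypothetical, and the failed probe value in the log line is an assumption; the document only shows the ok case.

```python
from typing import Dict, List, Mapping

PROXY_VARS = ("HTTP_PROXY", "HTTPS_PROXY", "ALL_PROXY")


def scoped_proxy_env(base_env: Mapping[str, str], proxy_url: str) -> Dict[str, str]:
    """Return a copy of the environment with proxy variables set for one step."""
    env = dict(base_env)  # the caller's environment is never mutated
    for name in PROXY_VARS:
        env[name] = proxy_url
    return env


def proxy_build_flags(proxy_url: str) -> List[str]:
    """Return docker buildx build flags that expose the proxy to RUN steps."""
    flags = ["--network", "host"]
    for name in PROXY_VARS:
        flags += ["--build-arg", f"{name}={proxy_url}"]
    return flags


def proxy_log_lines(proxy_url: str, probe_ok: bool) -> List[str]:
    """Return the log lines that record the proxy channel and probe result."""
    channel = f"provider-gateway-ws-egress:{proxy_url}"
    probe = "ok" if probe_ok else "failed"  # "failed" is an assumed value
    return [
        f"target_source_proxy={channel}",
        f"target_build_proxy={channel}",
        f"target_build_proxy_probe={probe}",
    ]
```

Passing the returned dict as env= to a single subprocess call keeps the variables out of the shell profile, the Docker daemon configuration and the long-lived host environment.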

Build cache is part of the deployment contract, not an optimization left to Docker defaults. The deploy reconciler must pass inline BuildKit cache metadata (--cache-to type=inline) and import the current target image as cache source when it exists (--cache-from <image>). Dockerfiles that intentionally expose a warm build-base argument, such as Code Queue's CODE_QUEUE_BASE_IMAGE, may use the target-local <image>-build-base image to avoid re-running large apt/npm/Playwright setup layers; this is still target-local build cache and must be logged as target_build_base_image=<image>-build-base. If a service later needs an isolated docker-container builder or a local cache directory backend, it may use one only as a service-specific fallback and must still log proxy resolution, proxy probe result, cache source, cache destination and builder cleanup. The default path must not discard target-local image cache by creating a fresh builder for every deploy.
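
The cache flags described above can be assembled as follows. This is a sketch: cache_flags and build_base_flags are hypothetical helpers, and whether the target image already exists is assumed to be checked by the caller.

```python
from typing import List


def cache_flags(image: str, image_exists: bool) -> List[str]:
    """Return BuildKit cache flags for a target-local build."""
    flags = ["--cache-to", "type=inline"]
    if image_exists:
        # Reuse layers from the image currently present on the target.
        flags += ["--cache-from", image]
    return flags


def build_base_flags(image: str, base_arg: str) -> List[str]:
    """Return the build-arg that points a Dockerfile at its warm build base."""
    return ["--build-arg", f"{base_arg}={image}-build-base"]


def build_base_log(image: str) -> str:
    """Return the log line that records the build-base image in use."""
    return f"target_build_base_image={image}-build-base"
```

For Code Queue, base_arg would be CODE_QUEUE_BASE_IMAGE; other services pass their own argument name only if their Dockerfile exposes one.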

Main server targets may build without a proxy unless a service explicitly requires one. Provider targets must not bypass provider-gateway WS egress for GitHub, Debian apt, npm, Playwright, model downloads or any other external build dependency.

Deployment Executors

The reconciler selects the executor from config.json:

  • deployment.mode=unidesk-direct on main-server: build the image on the main server, then use the fixed UniDesk Compose project and up -d --no-build --no-deps --force-recreate <service>.
  • deployment.mode=internal-sidecar on main-server: use the same main-server target-side source export, Docker build, image label stamping, fixed Compose project replacement and live commit verification as direct Compose services. This class is for private sidecars such as code-queue-mgr; it is still versioned by deploy.json.commitId, not by the operator's current worktree.
  • deployment.mode=unidesk-direct on a provider: dispatch host.ssh to that provider, build on the provider, then use the service's provider-local compose file and project. The executor resolves the actual Compose project, image name, build context, Dockerfile and target from the running container labels and docker compose config; it must not guess an image tag that the service will not actually run.
  • Control bridges that UniDesk needs in order to inspect or repair an orchestrator must stay in this direct class. In particular, k3sctl-adapter is a UniDesk-managed bridge to native k3s and must remain outside k3s; Docker packaging on Docker Desktop/WSL must create an explicit host-local bridge, currently an adapter-container SSH local tunnel, to reach /etc/rancher/k3s/k3s.yaml and WSL 127.0.0.1:6443.
  • deployment.mode=k3sctl-managed: dispatch to the active control target, build on that target, verify or install native k3s on the host OS/WSL distro, import the image into native k3s/containerd, apply the existing Kubernetes manifest, stamp the Deployment and wait for rollout. The executor must use the native kubeconfig and containerd socket, for example /etc/rancher/k3s/k3s.yaml and /run/k3s/containerd/containerd.sock; running k3s itself in Docker is forbidden for both control-plane and worker nodes. A rancher/k3s image or legacy container may only be used as a temporary artifact source during migration, and any active containerized k3s control plane must be stopped before verification succeeds. The executor must preload a valid rancher/mirrored-pause:3.6 sandbox image into native k3s containerd through the provider-gateway one-shot egress path, verify its entrypoint is /pause, and reject fake or sleep-based replacement images. Code Queue's k3s migration executor must also stop/remove the legacy direct Docker code-queue-backend after k3s rollout, so there is never a second scheduler running beside the native k3s scheduler.
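
Executor selection from deployment.mode and the target can be sketched as a small dispatch. The returned executor names are hypothetical labels for the classes listed above, not identifiers from the codebase.

```python
MAIN_SERVER = "main-server"


def select_executor(mode: str, target: str) -> str:
    """Map a service's deployment.mode and target to an executor class."""
    if mode == "k3sctl-managed":
        return "k3s-native"
    if mode == "internal-sidecar":
        if target != MAIN_SERVER:
            raise ValueError("internal-sidecar is only defined for main-server")
        # Same build, stamping and replacement path as direct Compose services.
        return "main-server-compose"
    if mode == "unidesk-direct":
        if target == MAIN_SERVER:
            return "main-server-compose"
        # Dispatched to the provider and built there.
        return "provider-compose"
    raise ValueError(f"unknown deployment.mode {mode!r}")
```

An unknown mode raises instead of falling back to a default, in keeping with the rule that unsupported services are reported rather than silently skipped.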

Existing service-specific commands such as Code Queue deploy should converge onto this reconciler path instead of keeping a parallel implementation.

Decision Center is a standard k3sctl-managed service in this model. deploy apply --service decision-center must build src/components/microservices/decision-center/Dockerfile on D601, import unidesk-decision-center:d601 into native k3s containerd, apply src/components/microservices/k3sctl-adapter/k3s/decision-center.k8s.yaml, stamp the Deployment, and verify health through /api/microservices/decision-center/health. It must not add a main-server Compose service, NodePort, hostPort, or provider-gateway direct HTTP backend for Decision Center.

CI Separation

Continuous integration is intentionally separate from this deploy reconciler. D601 k3s hosts Tekton CI resources described in docs/reference/ci.md, but those PipelineRuns only clone, check and run read-only performance gates. They must not call deploy apply, codex deploy, kubectl rollout restart for production services, or mutate deploy.json.

The Code Queue performance gate may create a temporary code-queue-ci-read service and read the main PostgreSQL database through the existing d601-tcp-egress-gateway. Because it runs with CODE_QUEUE_SERVICE_ROLE=read, with scheduler, backfill and notifications disabled, and with emptyDir state, it is not deployment truth and does not need a temporary database for the current read-only checks.

Version Stamping And Verification

Every successful deployment must stamp the source version in the runtime:

  • Docker image labels: unidesk.ai/service-id, unidesk.ai/source-repo, unidesk.ai/source-commit and unidesk.ai/dockerfile.
  • Runtime env or Kubernetes annotations: UNIDESK_DEPLOY_SERVICE_ID, UNIDESK_DEPLOY_REPO, UNIDESK_DEPLOY_COMMIT and UNIDESK_DEPLOY_REQUESTED_COMMIT.
  • Service health response should expose deploy.repo and deploy.commit when practical. Existing service-specific health contracts such as Code Queue's deploy.commit remain valid.

The deploy job is not complete until live verification proves the running service matches the requested commit. For Docker services this includes image label inspection on the running container. For k3s services this includes Deployment annotation/env inspection and service health through the same UniDesk microservice proxy path used by the frontend. A healthy old service must fail verification.
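
Stamping and verification for a Docker service can be sketched together. The helper names are hypothetical, and the live labels are assumed to have been read from the running container by the caller.

```python
from typing import List, Mapping


def image_label_flags(service_id: str, repo: str,
                      commit: str, dockerfile: str) -> List[str]:
    """Return the --label flags that stamp the source version on an image."""
    labels = {
        "unidesk.ai/service-id": service_id,
        "unidesk.ai/source-repo": repo,
        "unidesk.ai/source-commit": commit,
        "unidesk.ai/dockerfile": dockerfile,
    }
    flags: List[str] = []
    for key, value in labels.items():
        flags += ["--label", f"{key}={value}"]
    return flags


def verify_live(requested_commit: str, live_labels: Mapping[str, str],
                health_ok: bool) -> bool:
    """Pass only if the service is healthy AND runs the requested commit."""
    live_commit = live_labels.get("unidesk.ai/source-commit")
    return health_ok and live_commit == requested_commit
```

A healthy container that still carries the previous commit label returns False here, which is the "healthy old service must fail" rule.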

Unsupported Services

Image-only services, such as a service declared directly as docker.io/vendor/image:tag without a Dockerfile source artifact, do not satisfy target-side build policy. They must be converted to a source repository with a Dockerfile wrapper before the reconciler can manage them. Until then, deploy check and deploy plan should report them as unsupported.