Codex b265274750 feat: add devops-controlled dev ci flow

2026-05-18 06:59:51 +00:00

24 KiB

Desired Deploy Reconciler

UniDesk deployment is driven by a desired-state manifest. The manifest answers only one question: which service should run which repository commit. Runtime topology, ports, providers, compose files, Kubernetes manifests, health paths and proxy policy remain in config.json and the existing service manifests.

Manifest

The root deploy.json is the single desired-state source for both prod and dev. Environment branches such as deploy/dev and deploy/prod are deprecated because they create a second control plane for version intent.

{
  "schemaVersion": 2,
  "environments": {
    "prod": {
      "services": [
        {
          "id": "code-queue",
          "repo": "https://github.com/pikasTech/unidesk",
          "commitId": "0c3cdb4ee06a23361ed511a2da033d67b53d16f4"
        }
      ]
    },
    "dev": {
      "services": [
        {
          "id": "backend-core",
          "repo": "https://github.com/pikasTech/unidesk",
          "commitId": "348c644"
        }
      ]
    }
  }
}

schemaVersion=1 remains accepted only as a local compatibility format. Standard environment commands use schemaVersion=2 and select environments.dev.services or environments.prod.services.
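As a minimal sketch, environment selection over a schemaVersion=2 manifest could look like the following. Only the field names come from the manifest example; the type names and the selectServices helper are illustrative assumptions, not the reconciler's actual API.

```typescript
// Hypothetical types mirroring the deploy.json example. Only the field names
// come from the document; the type and function names are assumptions.
type EnvName = "prod" | "dev";

interface DesiredService {
  id: string;
  repo: string;
  commitId: string;
}

interface DeployManifest {
  schemaVersion: number;
  environments: Record<EnvName, { services: DesiredService[] }>;
}

// Return the desired services for one environment, optionally narrowed to a
// single service id (the role played by --env <env> and --service <id>).
function selectServices(
  manifest: DeployManifest,
  env: EnvName,
  serviceId?: string,
): DesiredService[] {
  if (manifest.schemaVersion !== 2) {
    throw new Error("environment selection requires schemaVersion=2");
  }
  const services = manifest.environments[env].services;
  return serviceId === undefined
    ? services
    : services.filter((s) => s.id === serviceId);
}

const exampleManifest: DeployManifest = {
  schemaVersion: 2,
  environments: {
    prod: {
      services: [
        {
          id: "code-queue",
          repo: "https://github.com/pikasTech/unidesk",
          commitId: "0c3cdb4ee06a23361ed511a2da033d67b53d16f4",
        },
      ],
    },
    dev: {
      services: [
        {
          id: "backend-core",
          repo: "https://github.com/pikasTech/unidesk",
          commitId: "348c644",
        },
      ],
    },
  },
};

console.log(
  selectServices(exampleManifest, "dev").map((s) => `${s.id}@${s.commitId}`),
);
```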

deploy.json must not contain provider IDs, ports, compose service names, Kubernetes namespace, health paths, environment variables, Dockerfile paths or build commands. The deploy reconciler joins each id with config.json.microservices[] and existing k3s manifests to resolve those details. A service listed in deploy.json but missing from config.json is an error. A service with no Dockerfile source artifact is reported as unsupported rather than silently skipped. commitId may be a unique pushed short SHA or a full SHA; every deploy command resolves it through the remote repository to a full 40-character commit before target-side build or rollout, and fails immediately if the SHA is missing or ambiguous.
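The two manifest rules above (no runtime details in a service entry, and commitId resolving to exactly one full SHA) can be sketched as below. This is an illustration under stated assumptions: remoteCommits stands in for the real remote lookup, the helper names are hypothetical, and the 4-character minimum is Git's own minimum abbreviation length rather than a documented UniDesk rule.

```typescript
// Keys a deploy.json service entry may carry, per the manifest example.
const ALLOWED_SERVICE_KEYS = ["id", "repo", "commitId"];

// List any extra keys (ports, namespace, env vars, ...) found on an entry.
function findForbiddenKeys(entry: Record<string, unknown>): string[] {
  return Object.keys(entry).filter(
    (k) => ALLOWED_SERVICE_KEYS.indexOf(k) === -1,
  );
}

// Resolve a short or full SHA to exactly one full SHA.
// `remoteCommits` is a stand-in for the remote repository lookup (assumption).
function resolveCommit(commitId: string, remoteCommits: string[]): string {
  if (!/^[0-9a-f]{4,40}$/.test(commitId)) {
    throw new Error(`commitId is not a hex SHA: ${commitId}`);
  }
  const [first, ...rest] = remoteCommits.filter(
    (full) => full.slice(0, commitId.length) === commitId,
  );
  if (first === undefined) {
    throw new Error(`commit not found on remote: ${commitId}`);
  }
  if (rest.length > 0) {
    throw new Error(`ambiguous short SHA: ${commitId}`);
  }
  return first;
}

const knownCommits = [
  "0c3cdb4ee06a23361ed511a2da033d67b53d16f4",
  // Made-up full SHA for illustration (7 + 10 + 10 + 10 + 3 = 40 characters).
  "348c644" + "0000000000" + "0000000000" + "0000000000" + "001",
];

console.log(resolveCommit("348c644", knownCommits));
console.log(findForbiddenKeys({ id: "x", repo: "r", commitId: "c", port: 8080 }));
```

Failing on both the missing and the ambiguous case before any build step keeps the "fails immediately" behavior in one place.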

Environment mode never reads the manifest from the local, possibly dirty, working tree. deploy check --env ..., deploy plan --env ... and deploy apply --env ... fetch origin/master, read origin/master:deploy.json, select environments.<env>, and report the manifest commit/blob, service commit IDs, target namespace, database fingerprint and Provider identity.

Maintenance-channel direct D601 apply is intentionally narrow: only deploy apply --env dev --service devops may use that path, and only for DevOps bootstrap, repair or break-glass recovery. deploy apply --env dev --service backend-core|frontend|code-queue and local-manifest D601 service apply are rejected before runtime mutation; those services must be deployed by the DevOps control plane after it is healthy. deploy apply --env prod remains disabled until the production environment executor and authorization policy are explicitly added.
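The apply restrictions above amount to a gate that runs before any runtime mutation. A minimal sketch follows; the gateEnvApply name, the ApplyDecision type and the reason strings are hypothetical, and only the accept/reject rules are taken from this section.

```typescript
type ApplyDecision = { allowed: true } | { allowed: false; reason: string };

// Pre-mutation gate for `deploy apply --env <env> [--service <id>]`.
function gateEnvApply(
  env: "dev" | "prod",
  serviceId: string | undefined,
): ApplyDecision {
  if (env === "prod") {
    return {
      allowed: false,
      reason: "prod apply is disabled until an executor and authorization policy are added",
    };
  }
  if (serviceId !== "devops") {
    return {
      allowed: false,
      reason: "only --service devops may use the D601 maintenance path; other services are deployed by DevOps",
    };
  }
  return { allowed: true };
}

console.log(gateEnvApply("dev", "devops"));
console.log(gateEnvApply("dev", "backend-core"));
console.log(gateEnvApply("prod", "devops"));
```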

config.json.microservices[].repository.commitId is retained for catalog compatibility, but deploy.json is the deployment version authority for the reconciler.

DevOps Bootstrap

DevOps has an intentional first-install bootstrap path to avoid a circular dependency where the service that should deploy CI/CD must already exist before it can deploy itself.

The only supported first-install shape is a one-shot D601-side script:

tmp=$(mktemp) && curl -fsSL https://raw.githubusercontent.com/pikasTech/unidesk/master/scripts/bootstrap/devops-install.sh -o "$tmp" && sudo bash "$tmp" --commit <unidesk-commit-id> --env dev

The bootstrapper may use D601 local shell, native SSH or provider-gateway Host SSH as a maintenance bridge, but only for DevOps bootstrap, repair and break-glass recovery. This maintenance bridge must not deploy backend-core, frontend, Code Queue, Decision Center, k3sctl-adapter or any other direct/managed microservice. The bootstrapper must run source fetch, Go build, Docker build, k3s image import and Kubernetes apply on D601. The main server must not compile Go/Rust or build DevOps images for D601.

The bootstrapper is deliberately narrow and idempotent:

Verify D601 native k3s and /etc/rancher/k3s/k3s.yaml.
Clone or fetch the UniDesk repo on D601 and checkout the requested commit.
Build src/components/microservices/devops/Dockerfile on D601.
Import unidesk-devops:dev into native k3s containerd.
Apply src/components/microservices/k3sctl-adapter/k3s/devops.k8s.yaml into unidesk-ci.
Wait for deployment/devops rollout and /health.
Write a local bootstrap receipt with repo, requested commit, resolved commit, namespace, image and health result.
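The final step records what was installed. The document lists the recorded facts but not the receipt's field names or file format, so the shape below is purely an assumption for illustration; the namespace and image values are the ones named in the steps above.

```typescript
// Hypothetical receipt shape; field names and JSON encoding are assumptions.
interface BootstrapReceipt {
  repo: string;
  requestedCommit: string;
  resolvedCommit: string;
  namespace: string;
  image: string;
  health: "ok" | "failed";
}

// Serialize a receipt after checking that the resolved commit is a full SHA
// and is consistent with the commit that was requested.
function buildReceipt(r: BootstrapReceipt): string {
  if (!/^[0-9a-f]{40}$/.test(r.resolvedCommit)) {
    throw new Error("resolvedCommit must be a full 40-character SHA");
  }
  if (r.resolvedCommit.slice(0, r.requestedCommit.length) !== r.requestedCommit) {
    throw new Error("resolvedCommit does not match the requested commit");
  }
  return JSON.stringify(r, null, 2);
}

const receiptJson = buildReceipt({
  repo: "https://github.com/pikasTech/unidesk",
  requestedCommit: "0c3cdb4",
  resolvedCommit: "0c3cdb4ee06a23361ed511a2da033d67b53d16f4",
  namespace: "unidesk-ci",
  image: "unidesk-devops:dev",
  health: "ok",
});

console.log(receiptJson);
```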

After DevOps is healthy, normal CI/CD control should move to CLI -> backend-core -> k3sctl-adapter -> DevOps -> Kubernetes API/Tekton. Host SSH remains a DevOps repair path, not a general CI/CD control plane and not a service deployment path.

D601 Dev Foundation

Phase 2 of the D601 dev environment creates only the isolated namespace and database foundation. The authoritative manifest is src/components/microservices/k3sctl-adapter/k3s/dev/unidesk-dev-foundation.k8s.yaml.

It may create resources only in unidesk-dev:

Namespace unidesk-dev, plus quota and default limits.
Secret unidesk-dev-runtime-secrets as a dev-only template for DB credentials, provider token, auth/session secret, and Code Queue model secret placeholders.
ConfigMap unidesk-dev-runtime-config for dev identity, desired-state source origin/master:deploy.json#environments.dev, provider id D601-dev, Code Queue dev paths, and non-secret runtime defaults.
ConfigMap unidesk-dev-db-guard with an executable guard script that rejects production-looking DATABASE_URL values.
StatefulSet/Service postgres-dev with a 5Gi persistent volume claim and bounded CPU/memory requests/limits.
Job unidesk-dev-db-migrate, which waits for postgres-dev, runs the guard, then prepares backend-core and Code Queue tables in the independent unidesk_dev database.
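The guard's actual rules are not shown in this document. As a hedged illustration of what "rejects production-looking DATABASE_URL values" could mean, the sketch below accepts only a postgres-dev host and the unidesk_dev database. The function name, the allow-list constants and both sample URLs are assumptions, and the real guard is a script inside the ConfigMap rather than TypeScript.

```typescript
// Assumed allow-list: the real guard's rules are not shown in the document.
const DEV_DB_HOST = "postgres-dev";
const DEV_DB_NAME = "unidesk_dev";

interface GuardResult {
  ok: boolean;
  reason: string;
}

function checkDevDatabaseUrl(databaseUrl: string): GuardResult {
  // scheme://[userinfo@]host[:port]/database
  const m = /^postgres(?:ql)?:\/\/(?:[^@\/]*@)?([^:\/?#]+)(?::\d+)?\/([^?#]+)/.exec(
    databaseUrl,
  );
  if (m === null) {
    return { ok: false, reason: "not a parseable PostgreSQL URL" };
  }
  const host = m[1] ?? "";
  const database = m[2] ?? "";
  // Accept the bare service name or a longer DNS name that starts with it.
  const hostIsDev =
    host === DEV_DB_HOST || host.indexOf(DEV_DB_HOST + ".") === 0;
  if (!hostIsDev) {
    return { ok: false, reason: `host ${host} is not the dev database` };
  }
  if (database !== DEV_DB_NAME) {
    return { ok: false, reason: `database ${database} is not ${DEV_DB_NAME}` };
  }
  return { ok: true, reason: "dev database" };
}

// Both URLs are made up for illustration.
console.log(checkDevDatabaseUrl("postgres://dev:secret@postgres-dev:5432/unidesk_dev"));
console.log(checkDevDatabaseUrl("postgres://app:secret@postgres:5432/unidesk"));
```

An allow-list is used here instead of a deny-list of production names, so an unknown host fails closed.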

The manifest must not create, update, or delete production namespace resources, production DB objects, production PVCs, production Deployments/Services/Secrets, or main server Docker Compose services. Static validation is available through bun scripts/cli.ts dev-env validate; Kubernetes client dry-run is bun scripts/cli.ts dev-env validate --kubectl-dry-run. If applying manually during Phase 2, the only allowed apply target is this manifest and the post-check must prove production resources are unchanged, for example by comparing kubectl -n unidesk get deploy,sts,svc,secret,pvc -o name before and after.

Before applying the foundation on a fresh D601 native k3s runtime, run bun scripts/cli.ts dev-env prewarm-images and wait for the returned job to succeed. This imports the foundation images postgres:16-alpine and rancher/mirrored-library-busybox:1.36.1 from Docker into /run/k3s/containerd/containerd.sock; k3s/containerd must not depend on live Docker Hub pulls during rollout. If this step is skipped, postgres-dev or the local-path helper pod can remain ImagePullBackOff, leaving the PVC pending even though the manifest is valid.

Phase 2 guardrails are deliberately limited to the dev manifest and CLI validator. Runtime startup guards for dev backend-core, Code Queue and Code Queue Manager must be reviewed and shipped as a separate change before dev workloads are exposed beyond dry-run or controlled apply.

On D601, dev/prod k3s verification must use the native k3s kubeconfig explicitly: KUBECONFIG=/etc/rancher/k3s/k3s.yaml. The default kubectl context may point at Docker Desktop and is not an acceptable target for UniDesk k3s deploy validation.

D601 Dev Core

Phase 3 introduces the dev backend/frontend manifest at src/components/microservices/k3sctl-adapter/k3s/dev/unidesk-dev-core.k8s.yaml. It may create only backend-core-dev and frontend-dev Deployment/Service objects in unidesk-dev.

backend-core-dev must use unidesk-dev-runtime-config and unidesk-dev-runtime-secrets, connect to postgres-dev.../unidesk_dev, expose HTTP on 8080 and provider ingress on 8081, and write logs under /var/log/unidesk-dev. frontend-dev must set CORE_INTERNAL_URL=http://backend-core-dev.unidesk-dev.svc.cluster.local:8080 and must not proxy to production backend-core.

The manifest keeps placeholder image tags and deploy commit values in source control. Maintenance-channel direct D601 apply must not deploy backend-core-dev or frontend-dev; the CLI rejects deploy apply --env dev --service backend-core|frontend before runtime mutation. Dev core deployment must be implemented as a DevOps-controlled CD action that fetches origin/master:deploy.json, selects environments.dev, materializes the requested source commit on D601, narrows the dev core control manifest to the selected Service/Deployment pair, replaces placeholders with the requested commit and dev image tag, builds on D601, imports the image into native k3s containerd, applies only the unidesk-dev objects and stamps the Deployment. Client dry-run and static validation are the required checks before any controlled apply:

bun scripts/cli.ts dev-env validate --manifest src/components/microservices/k3sctl-adapter/k3s/dev/unidesk-dev-core.k8s.yaml
KUBECONFIG=/etc/rancher/k3s/k3s.yaml kubectl apply --dry-run=client --validate=false -f src/components/microservices/k3sctl-adapter/k3s/dev/unidesk-dev-core.k8s.yaml

backend-core and frontend keep their production health payload shape by default. They add environment, namespace, databaseName, serviceId, deployRef and deploy commit metadata only when UNIDESK_ENV=dev or UNIDESK_NAMESPACE=unidesk-dev is set. The frontend shell shows a visible DEV ribbon only under the same dev identity.
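The conditional health metadata can be sketched as follows. The metadata names come from the paragraph above, but the payload layout, the deployCommit field name, the base payload and the helper names are assumptions.

```typescript
// Hypothetical shapes: the document names the metadata but not the layout.
interface DevMetadata {
  environment: string;
  namespace: string;
  databaseName: string;
  serviceId: string;
  deployRef: string;
  deployCommit: string;
}

type EnvVars = { [key: string]: string | undefined };

// Dev identity is present when either variable carries the dev value.
function isDevIdentity(env: EnvVars): boolean {
  return (
    env["UNIDESK_ENV"] === "dev" || env["UNIDESK_NAMESPACE"] === "unidesk-dev"
  );
}

// Keep the production payload shape unless the dev identity is set.
function buildHealthPayload(
  base: Record<string, unknown>,
  env: EnvVars,
  meta: DevMetadata,
): Record<string, unknown> {
  return isDevIdentity(env) ? { ...base, ...meta } : { ...base };
}

const sampleMeta: DevMetadata = {
  environment: "dev",
  namespace: "unidesk-dev",
  databaseName: "unidesk_dev",
  serviceId: "backend-core",
  deployRef: "origin/master:deploy.json#environments.dev",
  deployCommit: "348c644",
};

console.log(buildHealthPayload({ ok: true }, {}, sampleMeta));
console.log(buildHealthPayload({ ok: true }, { UNIDESK_ENV: "dev" }, sampleMeta));
```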

D601 Dev Code Queue

Phase 5 introduces the dev Code Queue execution manifest at src/components/microservices/k3sctl-adapter/k3s/dev/unidesk-dev-code-queue.k8s.yaml. It may create only Code Queue dev execution objects in unidesk-dev: code-queue-scheduler-dev, code-queue-read-dev, code-queue-write-dev and the supporting d601-dev-provider-egress-proxy.

All dev Code Queue components must use unidesk-dev-runtime-config and unidesk-dev-runtime-secrets, connect to postgres-dev.../unidesk_dev, write logs and state under /home/ubuntu/unidesk-dev-code-queue-deploy/state, and expose HTTP on 4222 only as ClusterIP services. The scheduler uses CODE_QUEUE_MAIN_PROVIDER_ID=D601-dev, CODE_QUEUE_WORKDIR=/workspace-dev, CODE_QUEUE_REMOTE_WORKDIR=/home/ubuntu/unidesk-dev-workspace, disables ClaudeQQ notifications by default, and does not use the production d601-tcp-egress-gateway or production PostgreSQL route.

Maintenance-channel direct D601 apply must not deploy dev Code Queue; the CLI rejects deploy apply --env dev --service code-queue and the old codex deploy compatibility entry is disabled. Dev Code Queue deployment must be a DevOps-controlled CD action that fetches origin/master:deploy.json, selects environments.dev, materializes the requested source commit on D601, uses the dev Code Queue control manifest from that D601 materialized commit, narrows it to Code Queue dev objects, replaces placeholders with the requested commit and unidesk-code-queue:dev, builds on D601, imports the image into native k3s containerd, applies only unidesk-dev objects and stamps the dev Deployments.

Because Code Queue carries the agent toolchain and browser/runtime dependencies, dev builds may reuse an existing D601 unidesk-code-queue:d601-build-base or unidesk-code-queue:d601 image when the dev build-base tag is absent, and the deploy executor allows a longer Code Queue build window than lightweight services. The scheduler has an explicit 5Gi memory limit and must use Recreate rollout strategy so an update does not temporarily require two scheduler replicas under the namespace quota. All dev Code Queue containers must set CPU limits so the namespace LimitRange does not inject a quota-breaking default CPU limit.

Live health verification uses the Kubernetes API service proxy for the dev ClusterIP Service, not kubectl exec or debug binaries inside the application image. This first dev execution slice proves deployability, health and dev database isolation; wiring the dev frontend stable code-queue route through a dev code-queue-mgr is a separate later phase.

CLI

bun scripts/cli.ts deploy check [--file deploy.json] [--service <id>] checks the live runtime against the desired repo and commit without changing the system.

bun scripts/cli.ts deploy plan [--file deploy.json] [--service <id>] prints the same live state plus the intended action: noop, deploy or unsupported.

bun scripts/cli.ts deploy plan --env dev [--service <id>] reads origin/master:deploy.json#environments.dev and prints a dry-run environment plan without checking or mutating live runtime resources. deploy check --env dev uses the same dry-run environment plan. --env prod is available for parity as a dry-run planning path; it reads origin/master:deploy.json#environments.prod and must not use a dirty local deploy.json.

bun scripts/cli.ts deploy apply [--file deploy.json | --env dev] [--service <id>] [--dry-run] [--force] starts an asynchronous job only for supported targets. Use bun scripts/cli.ts job status <jobId> --tail-bytes 30000 to observe progress. --dry-run resolves the same plan but does not build or replace runtime objects. --force rebuilds even when the live commit matches. Environment apply currently supports only --env dev --service devops on the D601 maintenance direct path; --env prod apply is rejected, and D601 non-DevOps service apply is rejected before any runtime mutation.
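How a plan entry arrives at noop, deploy or unsupported, and how --force changes it, can be sketched as a small pure function. The input field names are hypothetical; the three action names and the force behavior are the ones described above.

```typescript
type PlanAction = "noop" | "deploy" | "unsupported";

interface PlanInput {
  hasDockerfileSource: boolean; // false for image-only services
  desiredCommit: string;        // full SHA resolved from deploy.json
  liveCommit: string | null;    // null when nothing is stamped or running
  force?: boolean;              // mirrors --force
}

function planAction(input: PlanInput): PlanAction {
  if (!input.hasDockerfileSource) return "unsupported";
  if (input.force === true) return "deploy";
  return input.liveCommit === input.desiredCommit ? "noop" : "deploy";
}

const desired = "0c3cdb4ee06a23361ed511a2da033d67b53d16f4";
console.log(planAction({ hasDockerfileSource: true, desiredCommit: desired, liveCommit: desired }));
console.log(planAction({ hasDockerfileSource: true, desiredCommit: desired, liveCommit: null }));
console.log(planAction({ hasDockerfileSource: false, desiredCommit: desired, liveCommit: desired }));
```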

All deploy commands output JSON. Long operations must use .state/jobs/ and bounded log tails; no deploy path may succeed with missing progress output.

Target-Side Build

Target-side build is the only standard deployment mode. The controller may run on the main server, but source materialization, compile/build, Docker image creation and deployment happen on the target node that will run the service.

Main server services are fetched, built and deployed on the main server.
D601 services are fetched, built and deployed on D601.
D518 services are fetched, built and deployed on D518.
k3s managed services are built on the active control target and then imported into that target's Kubernetes container runtime.

The reconciler distributes only repository URL, commit ID, Dockerfile path, build context and the existing deployment manifest/compose declaration. It must not distribute large Docker images between hosts as the default path, and it must not accept docker commit images, dirty worktrees or hand-mutated runtime containers as deployment truth.

Each target fetches the remote repository, resolves the requested commit to a full 40-character SHA and exports tracked files with git archive. Build contexts are created from that archive, not from the operator's current working tree. Environment applies such as deploy apply --env dev must not upload Kubernetes manifests or source files from the main server worktree; the target-side materialized commit is the source for Dockerfile, build context and k3s control manifests. The main server side may only do lightweight CLI orchestration, environment ref reading and remote command dispatch.

One-Shot Build Proxy

Target-side source fetches and Docker builds that need external network access use a one-shot proxy scope through provider-gateway WS egress. Provider targets connect only to their node-local provider-gateway egress endpoint, normally http://127.0.0.1:18789; provider-gateway carries the TCP stream over the already-authenticated provider WebSocket to the main server, and the main server opens the final outbound TCP connection. This is the only allowed proxy channel for provider-side deploy source fetches and builds. The deploy path must not mutate host-global proxy settings:

Do not edit /etc/docker/daemon.json.
Do not edit shell profiles or global Docker CLI config.
Do not leave long-lived host HTTP_PROXY, HTTPS_PROXY or ALL_PROXY.
Do not silently fall back to target local direct internet.
Do not create a separate SSH SOCKS proxy, public master proxy port, or direct backend-core/provider-ingress connection for deploy egress.

The standard implementation first probes GitHub through the node-local egress proxy, then runs target-side git clone/git fetch and the Docker build in that scoped environment. It also uses the target Docker daemon's local BuildKit builder so target-side base image and layer caches are reused. Proxy variables are scoped to the current deploy step and passed as matching --build-arg values for Dockerfile RUN steps; they are not written to daemon or shell configuration. Provider targets also use docker buildx build --network host so 127.0.0.1:<proxy-port> inside RUN resolves to the target host's loopback provider-gateway egress proxy. Each deploy must log the proxy channel and probe result, for example target_source_proxy=provider-gateway-ws-egress:http://127.0.0.1:18789, target_build_proxy=provider-gateway-ws-egress:http://127.0.0.1:18789 and target_build_proxy_probe=ok.

Build cache is part of the deployment contract, not an optimization left to Docker defaults. The deploy reconciler must pass inline BuildKit cache metadata (--cache-to type=inline) and import the current target image as cache source when it exists (--cache-from <image>). Dockerfiles that intentionally expose a warm build-base argument, such as Code Queue's CODE_QUEUE_BASE_IMAGE, may use the target-local <image>-build-base image to avoid re-running large apt/npm/Playwright setup layers; this is still target-local build cache and must be logged as target_build_base_image=<image>-build-base. If a service later needs an isolated docker-container builder or a local cache directory backend, it may use one only as a service-specific fallback and must still log proxy resolution, proxy probe result, cache source, cache destination and builder cleanup. The default path must not discard target-local image cache by creating a fresh builder for every deploy.
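The proxy scoping and cache flags described above can be illustrated by assembling the docker buildx build argument list for a single deploy step. This is a sketch, not the reconciler's code: the BuildRequest type, both helper names, the "." build context and the "failed" probe value are assumptions, while the flags and log keys are the ones named in this section.

```typescript
interface BuildRequest {
  image: string;
  dockerfile: string;
  context: string;
  cacheImageExists: boolean;
  proxyUrl?: string; // node-local egress proxy, when the target needs one
}

// Assemble `docker buildx build` arguments for one step. Proxy values travel
// only as build args; nothing is written to daemon or shell configuration.
function buildxArgs(req: BuildRequest): string[] {
  const args = ["buildx", "build", "-f", req.dockerfile, "-t", req.image];
  args.push("--cache-to", "type=inline");
  if (req.cacheImageExists) args.push("--cache-from", req.image);
  if (req.proxyUrl !== undefined) {
    // Host networking lets RUN steps reach the loopback egress proxy.
    args.push("--network", "host");
    for (const name of ["HTTP_PROXY", "HTTPS_PROXY"]) {
      args.push("--build-arg", `${name}=${req.proxyUrl}`);
    }
  }
  args.push(req.context);
  return args;
}

// Log lines that record the proxy channel and probe result.
function proxyLogLines(proxyUrl: string, probeOk: boolean): string[] {
  const channel = `provider-gateway-ws-egress:${proxyUrl}`;
  return [
    `target_source_proxy=${channel}`,
    `target_build_proxy=${channel}`,
    // "failed" is an assumed value; the document only shows the ok case.
    `target_build_proxy_probe=${probeOk ? "ok" : "failed"}`,
  ];
}

const sampleBuildArgs = buildxArgs({
  image: "unidesk-devops:dev",
  dockerfile: "src/components/microservices/devops/Dockerfile",
  context: ".",
  cacheImageExists: true,
  proxyUrl: "http://127.0.0.1:18789",
});

console.log(["docker"].concat(sampleBuildArgs).join(" "));
console.log(proxyLogLines("http://127.0.0.1:18789", true).join("\n"));
```

Building the argument list per step, rather than exporting proxy variables, keeps the proxy scope as short-lived as the step that needs it.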

Main server targets may build without a proxy unless a service explicitly requires one. Provider targets must not bypass provider-gateway WS egress for GitHub, Debian apt, npm, Playwright, model downloads or any other external build dependency.

Deployment Executors

The reconciler selects the executor from config.json:

deployment.mode=unidesk-direct on main-server: build the image on the main server, then use the fixed UniDesk Compose project and up -d --no-build --no-deps --force-recreate <service>.
deployment.mode=internal-sidecar on main-server: use the same main-server target-side source export, Docker build, image label stamping, fixed Compose project replacement and live commit verification as direct Compose services. This class is for private sidecars such as code-queue-mgr; it is still versioned by deploy.json.commitId, not by the operator's current worktree.
deployment.mode=unidesk-direct on a provider: this executor is disabled for D601 service deployment except for the explicit DevOps bootstrap/repair path. The historical behavior dispatched host.ssh to the provider, built on the provider, then used the service's provider-local compose file and project; that shape must move behind DevOps for D601 services so the maintenance bridge cannot become a second deployment control plane.
Control bridges that UniDesk needs in order to inspect or repair an orchestrator must stay in this direct class. In particular, k3sctl-adapter is a UniDesk-managed bridge to native k3s and must remain outside k3s; Docker packaging on Docker Desktop/WSL must create an explicit host-local bridge, currently an adapter-container SSH local tunnel, to reach /etc/rancher/k3s/k3s.yaml and WSL 127.0.0.1:6443.
deployment.mode=k3sctl-managed: the target behavior is to build on the active control target, verify native k3s on the host OS/WSL distro, import the image into native k3s/containerd, apply the existing Kubernetes manifest, stamp the Deployment and wait for rollout. On D601, maintenance-channel direct execution of this behavior is reserved for DevOps itself; other k3s managed services must be reconciled by DevOps after bootstrap. The executor must use the native kubeconfig and containerd socket, for example /etc/rancher/k3s/k3s.yaml and /run/k3s/containerd/containerd.sock; running k3s itself in Docker is forbidden for both control-plane and worker nodes. A rancher/k3s image or legacy container may only be used as a temporary artifact source during migration, and any active containerized k3s control plane must be stopped before verification succeeds. The executor must preload a valid rancher/mirrored-pause:3.6 sandbox image into native k3s containerd through the provider-gateway one-shot egress path, verify its entrypoint is /pause, and reject fake or sleep-based replacement images. Code Queue's k3s migration executor must also stop/remove the legacy direct Docker code-queue-backend after k3s rollout, so there is never a second scheduler running beside the native k3s scheduler.
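One way to read the executor rules above is as a function from deployment mode and target to an executor and a control path. The sketch below is an interpretation, not the reconciler's implementation: the type names, the "compose" and "k3s-native" labels, the restriction of targets to main-server and D601, and the use of the devops service id as the only direct exception are all assumptions drawn from this section.

```typescript
type DeploymentMode = "unidesk-direct" | "internal-sidecar" | "k3sctl-managed";

interface ServiceConfig {
  id: string;
  mode: DeploymentMode;
  target: "main-server" | "D601"; // other providers are out of scope here
}

interface ExecutorChoice {
  executor: "compose" | "k3s-native";
  // "direct": run by the reconciler itself.
  // "devops-controlled": must be reconciled by DevOps after bootstrap.
  path: "direct" | "devops-controlled";
}

function selectExecutor(svc: ServiceConfig): ExecutorChoice {
  const onMainServer = svc.target === "main-server";
  // On D601 only DevOps itself may use the maintenance-channel direct path.
  const d601Path = svc.id === "devops" ? "direct" : "devops-controlled";

  if (svc.mode === "k3sctl-managed") {
    return { executor: "k3s-native", path: d601Path };
  }
  if (svc.mode === "internal-sidecar") {
    if (!onMainServer) {
      throw new Error("internal-sidecar is only described for main-server");
    }
    return { executor: "compose", path: "direct" };
  }
  // unidesk-direct
  return { executor: "compose", path: onMainServer ? "direct" : d601Path };
}

console.log(selectExecutor({ id: "code-queue-mgr", mode: "internal-sidecar", target: "main-server" }));
console.log(selectExecutor({ id: "decision-center", mode: "k3sctl-managed", target: "D601" }));
console.log(selectExecutor({ id: "devops", mode: "k3sctl-managed", target: "D601" }));
```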

Existing service-specific commands such as Code Queue deploy are disabled as direct D601 deploy paths. Their build/import/rollout semantics should converge into DevOps-controlled CD instead of keeping a parallel implementation.

Decision Center is a standard k3sctl-managed service in this model, but D601 maintenance-channel direct apply must not deploy it. DevOps-controlled CD for Decision Center should build src/components/microservices/decision-center/Dockerfile on D601, import unidesk-decision-center:d601 into native k3s containerd, apply src/components/microservices/k3sctl-adapter/k3s/decision-center.k8s.yaml, stamp the Deployment, and verify health through /api/microservices/decision-center/health. It must not add a main-server Compose service, NodePort, hostPort, or provider-gateway direct HTTP backend for Decision Center.

CI Separation

Continuous integration is intentionally separate from this deploy reconciler. D601 k3s hosts Tekton CI resources described in docs/reference/ci.md, but those PipelineRuns only clone, check, run read-only performance gates, or create temporary CI-owned namespaces for dev manifest smoke e2e. They must not call deploy apply, codex deploy, kubectl rollout restart for production services, mutate deploy.json, or write production namespaces.

The Code Queue performance gate may create a temporary code-queue-ci-read service and read the main PostgreSQL through the existing d601-tcp-egress-gateway. Because it runs with CODE_QUEUE_SERVICE_ROLE=read, with scheduler, backfill and notification disabled and with emptyDir state, it is not deployment truth and does not need a temporary database for the current read-only checks.

Version Stamping And Verification

Every successful deployment must stamp the source version in the runtime:

Docker image labels: unidesk.ai/service-id, unidesk.ai/source-repo, unidesk.ai/source-commit and unidesk.ai/dockerfile.
Runtime env or Kubernetes annotations: UNIDESK_DEPLOY_SERVICE_ID, UNIDESK_DEPLOY_REPO, UNIDESK_DEPLOY_COMMIT and UNIDESK_DEPLOY_REQUESTED_COMMIT.
Service health response should expose deploy.repo and deploy.commit when practical. Existing service-specific health contracts such as Code Queue's deploy.commit remain valid.

The deploy job is not complete until live verification proves the running service matches the requested commit. For Docker services this includes image label inspection on the running container. For k3s services this includes Deployment annotation/env inspection and service health through the Kubernetes API service proxy path for the target ClusterIP Service; production user-service requests continue through the same UniDesk microservice proxy path used by the frontend. A healthy old service must fail verification.
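The rule that a healthy old service must fail verification can be sketched by comparing the stamped labels against the requested version before health is even considered. The label keys are the ones listed above; the types, the verifyLive name and the reason strings are assumptions.

```typescript
interface ExpectedVersion {
  serviceId: string;
  commit: string; // full SHA that was requested
}

interface LiveObservation {
  labels: { [key: string]: string | undefined };
  healthy: boolean;
}

interface VerifyResult {
  ok: boolean;
  reason: string;
}

function verifyLive(expected: ExpectedVersion, live: LiveObservation): VerifyResult {
  if (live.labels["unidesk.ai/service-id"] !== expected.serviceId) {
    return { ok: false, reason: "service id label mismatch" };
  }
  const liveCommit = live.labels["unidesk.ai/source-commit"];
  if (liveCommit !== expected.commit) {
    // A healthy service on an old commit still fails here.
    return {
      ok: false,
      reason: `running commit ${liveCommit ?? "<unstamped>"} is not ${expected.commit}`,
    };
  }
  if (!live.healthy) {
    return { ok: false, reason: "service is not healthy" };
  }
  return { ok: true, reason: "live commit matches the requested commit" };
}

const wanted: ExpectedVersion = {
  serviceId: "code-queue",
  commit: "0c3cdb4ee06a23361ed511a2da033d67b53d16f4",
};

console.log(
  verifyLive(wanted, {
    labels: {
      "unidesk.ai/service-id": "code-queue",
      "unidesk.ai/source-commit": "348c644",
    },
    healthy: true,
  }),
);
```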

Unsupported Services

Image-only services, such as a service declared directly as docker.io/vendor/image:tag without a Dockerfile source artifact, do not satisfy target-side build policy. They must be converted to a source repository with a Dockerfile wrapper before the reconciler can manage them. Until then, deploy check and deploy plan should report them as unsupported.
