refactor: use git-controlled dev ci runner

2026-05-18 08:38:17 +00:00
parent e2c8daede7
commit f86a75791b
22 changed files with 529 additions and 1402 deletions
@@ -20,6 +20,11 @@ The root `deploy.json` is the single desired-state source for both prod and dev.
      ]
    },
    "dev": {
+      "ci": {
+        "repo": "https://github.com/pikasTech/unidesk",
+        "scriptPath": "scripts/ci/dev-e2e.sh",
+        "timeoutMs": 1800000
+      },
      "services": [
        {
          "id": "backend-core",
@@ -34,35 +39,17 @@ The root `deploy.json` is the single desired-state source for both prod and dev.

 `schemaVersion=1` remains accepted only as a local compatibility format. Standard environment commands use `schemaVersion=2` and select `environments.dev.services` or `environments.prod.services`.

-`deploy.json` must not contain provider IDs, ports, compose service names, Kubernetes namespace, health paths, environment variables, Dockerfile paths or build commands. The deploy reconciler joins each `id` with `config.json.microservices[]` and existing k3s manifests to resolve those details. A service listed in `deploy.json` but missing from `config.json` is an error. A service with no Dockerfile source artifact is reported as unsupported rather than silently skipped. `commitId` may be a unique pushed short SHA or a full SHA; every deploy command resolves it through the remote repository to a full 40-character commit before target-side build or rollout, and fails immediately if the SHA is missing or ambiguous.
+`deploy.json` service entries must not contain provider IDs, ports, compose service names, Kubernetes namespace, health paths, environment variables, Dockerfile paths or build commands. The deploy reconciler joins each service `id` with `config.json.microservices[]` and existing k3s manifests to resolve those details. A service listed in `deploy.json` but missing from `config.json` is an error. A service with no Dockerfile source artifact is reported as unsupported rather than silently skipped. `commitId` may be a unique pushed short SHA or a full SHA; every deploy command resolves it through the remote repository to a full 40-character commit before target-side build or rollout, and fails immediately if the SHA is missing or ambiguous.

-Environment mode never reads the local dirty working tree manifest. `deploy check --env ...`, `deploy plan --env ...` and `deploy apply --env ...` fetch `origin/master`, read `origin/master:deploy.json`, select `environments.<env>`, and report the manifest commit/blob, service commit IDs, target namespace, database fingerprint and Provider identity. Maintenance-channel direct D601 apply is intentionally narrow: only `deploy apply --env dev --service devops` may use that path, and only for DevOps bootstrap, repair or break-glass recovery. `deploy apply --env dev --service backend-core|frontend|code-queue` and local-manifest D601 service apply are rejected before runtime mutation; those services must be deployed by the DevOps control plane after it is healthy. `deploy apply --env prod` remains disabled until the production environment executor and authorization policy are explicitly added.
+The only non-service execution declaration currently allowed under `environments.dev` is `ci`. It selects the Git-controlled one-shot dev CI runner; the authoritative `repo`, `scriptPath`, `timeoutMs`, short launcher and no-CD boundary are defined in `docs/reference/dev-ci-runner.md`.
+
+Environment mode never reads the local dirty working tree manifest. `deploy check --env ...`, `deploy plan --env ...` and `deploy apply --env ...` fetch `origin/master`, read `origin/master:deploy.json`, select `environments.<env>`, and report the manifest commit/blob, service commit IDs, target namespace, database fingerprint and Provider identity. Maintenance-channel direct D601 apply is intentionally narrow: no D601 backend-core/frontend/code-queue/Decision Center/k3sctl-adapter service deployment may use that path. `deploy apply --env prod` remains disabled until the production environment executor and authorization policy are explicitly added.

 `config.json.microservices[].repository.commitId` is retained for catalog compatibility, but `deploy.json` is the deployment version authority for the reconciler.

-## DevOps Bootstrap
+## Dev CI Runner

-DevOps has an intentional first-install bootstrap path to avoid a circular dependency where the service that should deploy CI/CD must already exist before it can deploy itself.
-
-The only supported first-install shape is a one-shot D601-side script:
-
-```bash
-tmp=$(mktemp) && curl -fsSL https://raw.githubusercontent.com/pikasTech/unidesk/master/scripts/bootstrap/devops-install.sh -o "$tmp" && sudo bash "$tmp" --commit <unidesk-commit-id> --env dev
-```
-
-The bootstrapper may use D601 local shell, native SSH or provider-gateway Host SSH as a maintenance bridge, but only for DevOps bootstrap, repair and break-glass recovery. This maintenance bridge must not deploy backend-core, frontend, Code Queue, Decision Center, k3sctl-adapter or any other direct/managed microservice. It must run source fetch, Go build, Docker build, k3s image import and Kubernetes apply on D601. The main server must not compile Go/Rust or build DevOps images for D601.
-
-The bootstrapper is deliberately narrow and idempotent:
-
- Verify D601 native k3s and `/etc/rancher/k3s/k3s.yaml`.
- Clone or fetch the UniDesk repo on D601 and checkout the requested commit.
- Build `src/components/microservices/devops/Dockerfile` on D601.
- Import `unidesk-devops:dev` into native k3s containerd.
- Apply `src/components/microservices/k3sctl-adapter/k3s/devops.k8s.yaml` into `unidesk-ci`.
- Wait for `deployment/devops` rollout and `/health`.
- Write a local bootstrap receipt with repo, requested commit, resolved commit, namespace, image and health result.
-
-After DevOps is healthy, normal CI/CD control should move to `CLI -> backend-core -> k3sctl-adapter -> DevOps -> Kubernetes API/Tekton`. Host SSH remains a DevOps repair path, not a general CI/CD control plane and not a service deployment path.
+Dev desired-state smoke verification is not a deploy executor. Use `bun scripts/cli.ts ci run-dev-e2e` for the Git-controlled temporary namespace runner described in `docs/reference/dev-ci-runner.md`. `deploy apply --env dev` must not roll out D601 direct or k3s-managed services in the current CI-only phase.

 ## D601 Dev Foundation

@@ -91,7 +78,7 @@ Phase 3 introduces the dev backend/frontend manifest at `src/components/microser

 `backend-core-dev` must use `unidesk-dev-runtime-config` and `unidesk-dev-runtime-secrets`, connect to `postgres-dev.../unidesk_dev`, expose HTTP on 8080 and provider ingress on 8081, and write logs under `/var/log/unidesk-dev`. `frontend-dev` must set `CORE_INTERNAL_URL=http://backend-core-dev.unidesk-dev.svc.cluster.local:8080` and must not proxy to production backend-core.

-The manifest keeps placeholder image tags and deploy commit values in source control. Maintenance-channel direct D601 apply must not deploy `backend-core-dev` or `frontend-dev`; the CLI rejects `deploy apply --env dev --service backend-core|frontend` before runtime mutation. Dev core deployment must be implemented as a DevOps-controlled CD action that fetches `origin/master:deploy.json`, selects `environments.dev`, materializes the requested source commit on D601, narrows the dev core control manifest to the selected Service/Deployment pair, replaces placeholders with the requested commit and dev image tag, builds on D601, imports the image into native k3s containerd, applies only the `unidesk-dev` objects and stamps the Deployment. Client dry-run and static validation are the required checks before any controlled apply:
+The manifest keeps placeholder image tags and deploy commit values in source control. Maintenance-channel direct D601 apply must not deploy `backend-core-dev` or `frontend-dev`; the CLI rejects `deploy apply --env dev --service backend-core|frontend` before runtime mutation. Dev core deployment must be implemented later as a controlled D601 target-side action that fetches `origin/master:deploy.json`, selects `environments.dev`, materializes the requested source commit on D601, narrows the dev core control manifest to the selected Service/Deployment pair, replaces placeholders with the requested commit and dev image tag, builds on D601, imports the image into native k3s containerd, applies only the `unidesk-dev` objects and stamps the Deployment. Client dry-run and static validation are the required checks before any controlled apply:

 - `bun scripts/cli.ts dev-env validate --manifest src/components/microservices/k3sctl-adapter/k3s/dev/unidesk-dev-core.k8s.yaml`
 - `KUBECONFIG=/etc/rancher/k3s/k3s.yaml kubectl apply --dry-run=client --validate=false -f src/components/microservices/k3sctl-adapter/k3s/dev/unidesk-dev-core.k8s.yaml`
@@ -104,7 +91,7 @@ Phase 5 introduces the dev Code Queue execution manifest at `src/components/micr

 All dev Code Queue components must use `unidesk-dev-runtime-config` and `unidesk-dev-runtime-secrets`, connect to `postgres-dev.../unidesk_dev`, write logs and state under `/home/ubuntu/unidesk-dev-code-queue-deploy/state`, and expose HTTP on 4222 only as ClusterIP services. The scheduler uses `CODE_QUEUE_MAIN_PROVIDER_ID=D601-dev`, `CODE_QUEUE_WORKDIR=/workspace-dev`, `CODE_QUEUE_REMOTE_WORKDIR=/home/ubuntu/unidesk-dev-workspace`, disables ClaudeQQ notifications by default, and does not use the production `d601-tcp-egress-gateway` or production PostgreSQL route.

-Maintenance-channel direct D601 apply must not deploy dev Code Queue; the CLI rejects `deploy apply --env dev --service code-queue` and the old `codex deploy` compatibility entry is disabled. Dev Code Queue deployment must be a DevOps-controlled CD action that fetches `origin/master:deploy.json`, selects `environments.dev`, materializes the requested source commit on D601, uses the dev Code Queue control manifest from that D601 materialized commit, narrows it to Code Queue dev objects, replaces placeholders with the requested commit and `unidesk-code-queue:dev`, builds on D601, imports the image into native k3s containerd, applies only `unidesk-dev` objects and stamps the dev Deployments. Because Code Queue carries the agent toolchain and browser/runtime dependencies, dev builds may reuse an existing D601 `unidesk-code-queue:d601-build-base` or `unidesk-code-queue:d601` image when the dev build-base tag is absent, and the deploy executor allows a longer Code Queue build window than lightweight services. The scheduler has an explicit 5Gi memory limit and must use `Recreate` rollout strategy so an update does not temporarily require two scheduler replicas under the namespace quota. All dev Code Queue containers must set CPU limits so the namespace `LimitRange` does not inject a quota-breaking default CPU limit. Live health verification uses the Kubernetes API service proxy for the dev ClusterIP Service, not `kubectl exec` or debug binaries inside the application image. This first dev execution slice proves deployability, health and dev database isolation; wiring the dev frontend stable `code-queue` route through a dev `code-queue-mgr` is a separate later phase.
+Maintenance-channel direct D601 apply must not deploy dev Code Queue; the CLI rejects `deploy apply --env dev --service code-queue` and the old `codex deploy` compatibility entry is disabled. Dev Code Queue deployment must be implemented later as a controlled D601 target-side action that fetches `origin/master:deploy.json`, selects `environments.dev`, materializes the requested source commit on D601, uses the dev Code Queue control manifest from that D601 materialized commit, narrows it to Code Queue dev objects, replaces placeholders with the requested commit and `unidesk-code-queue:dev`, builds on D601, imports the image into native k3s containerd, applies only `unidesk-dev` objects and stamps the dev Deployments. Because Code Queue carries the agent toolchain and browser/runtime dependencies, dev builds may reuse an existing D601 `unidesk-code-queue:d601-build-base` or `unidesk-code-queue:d601` image when the dev build-base tag is absent, and the deploy executor allows a longer Code Queue build window than lightweight services. The scheduler has an explicit 5Gi memory limit and must use `Recreate` rollout strategy so an update does not temporarily require two scheduler replicas under the namespace quota. All dev Code Queue containers must set CPU limits so the namespace `LimitRange` does not inject a quota-breaking default CPU limit. Live health verification uses the Kubernetes API service proxy for the dev ClusterIP Service, not `kubectl exec` or debug binaries inside the application image. This first dev execution slice proves deployability, health and dev database isolation; wiring the dev frontend stable `code-queue` route through a dev `code-queue-mgr` is a separate later phase.

 ## CLI

@@ -114,7 +101,7 @@ Maintenance-channel direct D601 apply must not deploy dev Code Queue; the CLI re

 `bun scripts/cli.ts deploy plan --env dev [--service <id>]` reads `origin/master:deploy.json#environments.dev` and prints a dry-run environment plan without checking or mutating live runtime resources. `deploy check --env dev` uses the same dry-run environment plan. `--env prod` is available for parity as a dry-run planning path; it reads `origin/master:deploy.json#environments.prod` and must not use a dirty local `deploy.json`.

-`bun scripts/cli.ts deploy apply [--file deploy.json | --env dev] [--service <id>] [--dry-run] [--force]` starts an asynchronous job only for supported targets. Use `bun scripts/cli.ts job status <jobId> --tail-bytes 30000` to observe progress. `--dry-run` resolves the same plan but does not build or replace runtime objects. `--force` rebuilds even when the live commit matches. Environment apply currently supports only `--env dev --service devops` on the D601 maintenance direct path; `--env prod` apply is rejected, and D601 non-DevOps service apply is rejected before any runtime mutation.
+`bun scripts/cli.ts deploy apply [--file deploy.json | --env dev] [--service <id>] [--dry-run] [--force]` starts an asynchronous job only for supported targets. Use `bun scripts/cli.ts job status <jobId> --tail-bytes 30000` to observe progress. `--dry-run` resolves the same plan but does not build or replace runtime objects. `--force` rebuilds even when the live commit matches. Environment apply is not the dev e2e trigger; use `bun scripts/cli.ts ci run-dev-e2e` for the Git-controlled temporary namespace smoke flow. `--env prod` apply is rejected, and D601 service apply is rejected before any runtime mutation.

 All deploy commands output JSON. Long operations must use `.state/jobs/` and bounded log tails; no deploy path may succeed with missing progress output.

@@ -153,13 +140,13 @@ The reconciler selects the executor from `config.json`:

 - `deployment.mode=unidesk-direct` on `main-server`: build the image on the main server, then use the fixed UniDesk Compose project and `up -d --no-build --no-deps --force-recreate <service>`.
 - `deployment.mode=internal-sidecar` on `main-server`: use the same main-server target-side source export, Docker build, image label stamping, fixed Compose project replacement and live commit verification as direct Compose services. This class is for private sidecars such as `code-queue-mgr`; it is still versioned by `deploy.json.commitId`, not by the operator's current worktree.
- `deployment.mode=unidesk-direct` on a provider: this executor is disabled for D601 service deployment except for the explicit DevOps bootstrap/repair path. The historical behavior dispatched `host.ssh` to the provider, built on the provider, then used the service's provider-local compose file and project; that shape must move behind DevOps for D601 services so the maintenance bridge cannot become a second deployment control plane.
+- `deployment.mode=unidesk-direct` on a provider: this executor is disabled for D601 service deployment. The historical behavior dispatched `host.ssh` to the provider, built on the provider, then used the service's provider-local compose file and project; that shape must not remain a second deployment control plane.
 - Control bridges that UniDesk needs in order to inspect or repair an orchestrator must stay in this direct class. In particular, `k3sctl-adapter` is a UniDesk-managed bridge to native k3s and must remain outside k3s; Docker packaging on Docker Desktop/WSL must create an explicit host-local bridge, currently an adapter-container SSH local tunnel, to reach `/etc/rancher/k3s/k3s.yaml` and WSL `127.0.0.1:6443`.
- `deployment.mode=k3sctl-managed`: the target behavior is to build on the active control target, verify native k3s on the host OS/WSL distro, import the image into native k3s/containerd, apply the existing Kubernetes manifest, stamp the Deployment and wait for rollout. On D601, maintenance-channel direct execution of this behavior is reserved for DevOps itself; other k3s managed services must be reconciled by DevOps after bootstrap. The executor must use the native kubeconfig and containerd socket, for example `/etc/rancher/k3s/k3s.yaml` and `/run/k3s/containerd/containerd.sock`; running k3s itself in Docker is forbidden for both control-plane and worker nodes. A `rancher/k3s` image or legacy container may only be used as a temporary artifact source during migration, and any active containerized k3s control plane must be stopped before verification succeeds. The executor must preload a valid `rancher/mirrored-pause:3.6` sandbox image into native k3s containerd through the provider-gateway one-shot egress path, verify its entrypoint is `/pause`, and reject fake or sleep-based replacement images. Code Queue's k3s migration executor must also stop/remove the legacy direct Docker `code-queue-backend` after k3s rollout, so there is never a second scheduler running beside the native k3s scheduler.
+- `deployment.mode=k3sctl-managed`: the target behavior is to build on the active control target, verify native k3s on the host OS/WSL distro, import the image into native k3s/containerd, apply the existing Kubernetes manifest, stamp the Deployment and wait for rollout. On D601, maintenance-channel direct execution of this behavior is not allowed for normal services; the current work is CI-only and must not roll out services. The executor must use the native kubeconfig and containerd socket, for example `/etc/rancher/k3s/k3s.yaml` and `/run/k3s/containerd/containerd.sock`; running k3s itself in Docker is forbidden for both control-plane and worker nodes. A `rancher/k3s` image or legacy container may only be used as a temporary artifact source during migration, and any active containerized k3s control plane must be stopped before verification succeeds. The executor must preload a valid `rancher/mirrored-pause:3.6` sandbox image into native k3s containerd through the provider-gateway one-shot egress path, verify its entrypoint is `/pause`, and reject fake or sleep-based replacement images. Code Queue's k3s migration executor must also stop/remove the legacy direct Docker `code-queue-backend` after k3s rollout, so there is never a second scheduler running beside the native k3s scheduler.

-Existing service-specific commands such as Code Queue deploy are disabled as direct D601 deploy paths. Their build/import/rollout semantics should converge into DevOps-controlled CD instead of keeping a parallel implementation.
+Existing service-specific commands such as Code Queue deploy are disabled as direct D601 deploy paths. Their build/import/rollout semantics should converge later into one controlled target-side deployment path instead of keeping parallel implementations.

-Decision Center is a standard `k3sctl-managed` service in this model, but D601 maintenance-channel direct apply must not deploy it. DevOps-controlled CD for Decision Center should build `src/components/microservices/decision-center/Dockerfile` on D601, import `unidesk-decision-center:d601` into native k3s containerd, apply `src/components/microservices/k3sctl-adapter/k3s/decision-center.k8s.yaml`, stamp the Deployment, and verify health through `/api/microservices/decision-center/health`. It must not add a main-server Compose service, NodePort, hostPort, or provider-gateway direct HTTP backend for Decision Center.
+Decision Center is a standard `k3sctl-managed` service in this model, but D601 maintenance-channel direct apply must not deploy it. Future controlled CD for Decision Center should build `src/components/microservices/decision-center/Dockerfile` on D601, import `unidesk-decision-center:d601` into native k3s containerd, apply `src/components/microservices/k3sctl-adapter/k3s/decision-center.k8s.yaml`, stamp the Deployment, and verify health through `/api/microservices/decision-center/health`. It must not add a main-server Compose service, NodePort, hostPort, or provider-gateway direct HTTP backend for Decision Center.

 ## CI Separation