# Dev CI Runner

`ci run-dev-e2e` is the single manual entry point for dev desired-state smoke verification. It is deliberately smaller than a DevOps control plane: the CLI starts one Git-controlled runner on D601, D601 creates a temporary CI namespace, Tekton runs the smoke check, and the result is written back as files that the CLI can inspect.

## Knowledge Ownership

This document is the authoritative source for the `ci run-dev-e2e` architecture, manifest contract, short launcher payload, host-vs-Tekton fetch boundary, result directory and no-CD safety rules. Release-line and dev-lane governance is owned by `docs/reference/release-governance.md` and [GitHub issue #6](https://github.com/pikasTech/unidesk/issues/6). `AGENTS.md`, `docs/reference/ci.md`, `docs/reference/cli.md`, `docs/reference/deploy.md` and `docs/reference/codex-deploy.md` may mention the command only as an index entry or one-line cross-reference. If a future change would require editing the same runner rule in multiple documents, update this file as the single source and replace other copies with links here.

## Goal

The runner exists to prove the dev desired state without interrupting production:

- Dev/prod isolation: temporary namespaces and dev manifests must not mutate `unidesk`, `unidesk-dev`, production PostgreSQL, production Deployments, production Services or main-server Compose services.
- Version determinism: all runner inputs come from the pushed `origin/master` commit that supplied `deploy.json` and `origin/master:deploy.json#environments.dev`.
- D601 execution: Git fetch, Tekton PipelineRun creation, Kubernetes polling and e2e log collection happen on D601, not on the main master.
- CLI observability: the submit command returns a `runId`, result directory and next commands; `ci logs <runId>` can recover status after the local CLI exits.
- CI only: the flow may create CI-owned temporary resources, but it must not deploy backend-core, frontend, Code Queue, Decision Center, k3sctl-adapter or any other direct/managed service.
- Code Queue reproducibility: the runner must use the `code-queue` commit from `environments.dev.services`, build or reuse a labeled image from that Git commit on D601, import it into native k3s containerd, and validate the HTTP API inside a temporary namespace.

Future stable and integration dev lanes such as `dev-v1` and `dev-master` must be explicit runner inputs before use. The current runner continues to validate `origin/master:deploy.json#environments.dev` and must not infer a release lane from local branch state or dirty manifests.

## Design Boundary

The first-principles requirement is to validate `origin/master:deploy.json#environments.dev` without interrupting production. Persistent DevOps services, run brokers, webhook listeners, generic remote command protocols and full CD are secondary abstractions and are out of scope for this phase unless this document is deliberately superseded. The only automatic execution path is:

```text
CLI -> backend-core /api/dispatch host.ssh -> D601 short launcher -> Git-controlled runner -> Tekton dev e2e PipelineRun
```

Host SSH is used only to start the bounded one-shot launcher and to read run files. It must not become a general deployment or repair path, and it must not fall back to arbitrary shell bodies when the runner, k3s, Tekton or egress proxy fails.

## Non-Goals

Do not add a long-lived DevOps service, run broker, webhook listener or second desired-state file for this phase. Do not turn Host SSH into a general deployment system. Future full-stack dev rollout or CD can reuse the same desired-state principles, but it must be designed as a separate controlled deployment path after this smoke runner is stable.

## Manifest Contract

`deploy.json` remains the only desired-state file. The dev environment may contain one non-service CI declaration:

```json
{
  "schemaVersion": 2,
  "environments": {
    "dev": {
      "ci": {
        "repo": "https://github.com/pikasTech/unidesk",
        "scriptPath": "scripts/ci/dev-e2e.sh",
        "timeoutMs": 1800000
      },
      "services": [
        {
          "id": "backend-core",
          "repo": "https://github.com/pikasTech/unidesk",
          "commitId": "<pushed-commit>"
        },
        {
          "id": "frontend",
          "repo": "https://github.com/pikasTech/unidesk",
          "commitId": "<pushed-commit>"
        },
        {
          "id": "code-queue",
          "repo": "https://github.com/pikasTech/unidesk",
          "commitId": "<pushed-commit>"
        }
      ]
    }
  }
}
```

`scriptPath` must be a repo-relative `scripts/ci/*.sh` path. Inline shell bodies, arbitrary script paths, local dirty scripts and separate `develop.json` or CI manifest files are forbidden. The script is fetched from the same full 40-character manifest commit that supplied `deploy.json`, so the runner logic can be audited and rolled back together with the desired state. Persistent dev rollout service scope is owned by `docs/reference/dev-environment.md`; this runner only consumes the dev service list for smoke verification and must not deploy it. `code-queue` is required in the dev service list for this smoke runner; its persistent dev deployment path is the separate dev-only registry artifact consumer, not this temporary e2e runner.
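The manifest rules above can be checked before anything is dispatched. The following is a minimal sketch, not the CLI's actual implementation: the type and function names are hypothetical, and it assumes that `scripts/ci/*.sh` means a single file directly under `scripts/ci/`, that commit ids are lowercase hex, and that `timeoutMs` is a positive integer.

```typescript
// Sketch only: type and function names here are illustrative, not the CLI's real API.
interface DevService {
  id: string;
  repo: string;
  commitId: string;
}

interface CiDeclaration {
  repo: string;
  scriptPath: string;
  timeoutMs: number;
}

// Assumption: `scripts/ci/*.sh` means one file directly under scripts/ci/.
const SCRIPT_PATH_PATTERN = /^scripts\/ci\/[A-Za-z0-9._-]+\.sh$/;
// A full Git commit id: exactly 40 lowercase hex characters (assumed lowercase).
const FULL_SHA_PATTERN = /^[0-9a-f]{40}$/;

function isAllowedScriptPath(scriptPath: string): boolean {
  // Reject anything outside scripts/ci/ and any ".." sequence.
  return SCRIPT_PATH_PATTERN.test(scriptPath) && !scriptPath.includes("..");
}

function isFullSha(commitId: string): boolean {
  return FULL_SHA_PATTERN.test(commitId);
}

// Returns human-readable problems; an empty list means the input passed.
function validateDevCi(ci: CiDeclaration, services: DevService[]): string[] {
  const problems: string[] = [];
  if (!isAllowedScriptPath(ci.scriptPath)) {
    problems.push(`scriptPath is not a repo-relative scripts/ci/*.sh path: ${ci.scriptPath}`);
  }
  if (!Number.isInteger(ci.timeoutMs) || ci.timeoutMs <= 0) {
    problems.push("timeoutMs must be a positive integer");
  }
  const codeQueue = services.find((service) => service.id === "code-queue");
  if (codeQueue === undefined) {
    problems.push("code-queue is missing from environments.dev.services");
  } else if (!isFullSha(codeQueue.commitId)) {
    problems.push("code-queue commitId must be a full 40-character SHA");
  }
  return problems;
}
```

Returning a list of problems rather than throwing on the first one lets a caller report every manifest violation in a single failed run.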

## Execution Path

The automatic path is intentionally single and narrow:

1. CLI fetches `origin/master` and reads `origin/master:deploy.json#environments.dev`.
2. CLI records the full manifest commit and generates a DNS-safe `runId`.
3. CLI sends a short launcher through backend-core `/api/dispatch` using the existing `host.ssh` provider capability for D601.
4. D601 creates `/tmp/unidesk-ci/<runId>` and `/home/ubuntu/.unidesk/runs/<runId>`.
5. D601 fetches the manifest commit from GitHub through the node-local provider-gateway WS egress proxy at `http://127.0.0.1:18789`.
6. D601 extracts the runner with `git show <commit>:<scriptPath> > /tmp/unidesk-ci/<runId>/runner.sh` and the desired-state blob with `git show <commit>:deploy.json > /tmp/unidesk-ci/<runId>/deploy.json`.
7. The runner then works from the host-fetched `deploy.json`:
   - parses it and requires a full-SHA `code-queue` service commit;
   - builds or reuses a D601 Docker image for that commit with host networking so `127.0.0.1:18789` resolves to the node-local provider-gateway egress proxy;
   - imports the image and `postgres:16-alpine` into native k3s containerd;
   - creates the Tekton PipelineRun in `unidesk-ci` and passes the required dev service commits and Code Queue image tag as PipelineRun params;
   - waits for completion when requested;
   - writes `result.json`, `launcher.log`, `runner.log`, PipelineRun JSON and pod logs under `/home/ubuntu/.unidesk/runs/<runId>/`.
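Step 2 asks for a DNS-safe `runId`. Because the `runId` is later embedded in the temporary namespace name `unidesk-ci-e2e-<runId>`, and Kubernetes namespace names must be DNS labels of at most 63 characters, one way to check this is sketched below. The timestamp-plus-short-SHA format and the function names are assumptions made for illustration; the document does not specify how the CLI actually builds the id.

```typescript
// Sketch only: the id format below is an assumption, not the CLI's documented format.
const TEMP_NAMESPACE_PREFIX = "unidesk-ci-e2e-";
const DNS_LABEL_MAX_LENGTH = 63;
// RFC 1123 label: lowercase alphanumerics and "-", starting and ending alphanumeric.
const DNS_LABEL_PATTERN = /^[a-z0-9]([a-z0-9-]*[a-z0-9])?$/;

// True when the id is a valid DNS label and the derived namespace name still fits.
function isDnsSafeRunId(runId: string): boolean {
  return (
    DNS_LABEL_PATTERN.test(runId) &&
    (TEMP_NAMESPACE_PREFIX + runId).length <= DNS_LABEL_MAX_LENGTH
  );
}

// Hypothetical format: compact UTC timestamp plus the first 8 characters of the commit.
function makeRunId(now: Date, manifestCommit: string): string {
  const stamp = now
    .toISOString()             // e.g. 2024-01-02T03:04:05.678Z
    .replace(/[-:]/g, "")      // drop date and time separators
    .replace(/\.\d+Z$/, "z")   // drop milliseconds, keep a UTC marker
    .toLowerCase();
  return `${stamp}-${manifestCommit.slice(0, 8)}`;
}

const exampleRunId = makeRunId(new Date("2024-01-02T03:04:05.678Z"), "0123abcd" + "0".repeat(32));
console.log(exampleRunId, isDnsSafeRunId(exampleRunId));
```

Passing the clock in as a parameter keeps the function deterministic, which makes the id format easy to test.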

The CLI must not upload the runner script body. Tekton dev e2e must not clone the private UniDesk repo itself; repo access and desired-state extraction happen once in the D601 host launcher under the manifest commit. The submitted launcher may contain only repo, full commit, script path, run id, environment, timeout, keep-namespace and fixed workspace path settings plus the fixed fetch/execute wrapper. If k3s, Tekton or the provider egress proxy is unavailable, the run fails with visible logs; it must not fall back to an alternate deployment path.
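The restriction on launcher contents can be expressed as a closed set of fields. The sketch below is illustrative only: the field names, the `LauncherPayload` type and both functions are hypothetical, and only the list of permitted settings and the `/tmp/unidesk-ci/<runId>` workspace path come from this document.

```typescript
// Sketch only: field and function names are hypothetical.
type LauncherPayload = {
  repo: string;
  manifestCommit: string;
  scriptPath: string;
  runId: string;
  environment: string;
  timeoutMs: number;
  keepNamespace: boolean;
  workspaceDir: string;
};

// The only settings the launcher may carry, mirroring the paragraph above.
const ALLOWED_LAUNCHER_KEYS: ReadonlySet<string> = new Set([
  "repo",
  "manifestCommit",
  "scriptPath",
  "runId",
  "environment",
  "timeoutMs",
  "keepNamespace",
  "workspaceDir",
]);

function buildLauncherPayload(input: {
  repo: string;
  manifestCommit: string;
  scriptPath: string;
  runId: string;
  timeoutMs: number;
  keepNamespace: boolean;
}): LauncherPayload {
  return {
    ...input,
    environment: "dev",
    // The workspace path is fixed and derived from the run id; callers cannot choose it.
    workspaceDir: `/tmp/unidesk-ci/${input.runId}`,
  };
}

// Lists any keys that are not permitted, such as an uploaded script body.
function unexpectedLauncherKeys(payload: object): string[] {
  return Object.keys(payload).filter((key) => !ALLOWED_LAUNCHER_KEYS.has(key));
}
```

A guard such as `unexpectedLauncherKeys` would flag a payload that tries to carry a script body or an arbitrary command before it is dispatched.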

This runner validates application desired state. It is not the acceptance path for CI/CD infrastructure bootstrap, repair or upgrade; those infrastructure operations are manually tested and may be promoted directly to production when the infrastructure itself is the target.

## Runner Contract

The Git-controlled script must accept:

```bash
scripts/ci/dev-e2e.sh \
  --run-id <runId> \
  --repo-url <repo> \
  --desired-ref master \
  --manifest-commit <full-sha> \
  --manifest-file /tmp/unidesk-ci/<runId>/deploy.json \
  --environment dev \
  --result-dir /home/ubuntu/.unidesk/runs/<runId> \
  --timeout-ms <ms> \
  [--keep-namespace]
```
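As an illustration of the invocation above, the argument list could be assembled as an array rather than a shell string, so that no value needs quoting. This is a sketch under stated assumptions: `buildRunnerArgs` and its option names are hypothetical, while the flags and the two directory layouts are taken from this document.

```typescript
// Sketch only: buildRunnerArgs and RunnerOptions are hypothetical names.
interface RunnerOptions {
  runId: string;
  repoUrl: string;
  manifestCommit: string;
  timeoutMs: number;
  keepNamespace: boolean;
}

// Produces the argument vector for scripts/ci/dev-e2e.sh in the documented order.
function buildRunnerArgs(options: RunnerOptions): string[] {
  const args = [
    "--run-id", options.runId,
    "--repo-url", options.repoUrl,
    "--desired-ref", "master",
    "--manifest-commit", options.manifestCommit,
    "--manifest-file", `/tmp/unidesk-ci/${options.runId}/deploy.json`,
    "--environment", "dev",
    "--result-dir", `/home/ubuntu/.unidesk/runs/${options.runId}`,
    "--timeout-ms", String(options.timeoutMs),
  ];
  // --keep-namespace is the only optional flag and takes no value.
  if (options.keepNamespace) {
    args.push("--keep-namespace");
  }
  return args;
}

console.log(
  buildRunnerArgs({
    runId: "run-1",
    repoUrl: "https://github.com/pikasTech/unidesk",
    manifestCommit: "a".repeat(40),
    timeoutMs: 1800000,
    keepNamespace: false,
  }).join(" "),
);
```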

The current script creates a Tekton `PipelineRun` for `pipeline/unidesk-dev-namespace-e2e`, stores the generated PipelineRun name in `pipelinerun.txt`, and writes a final `result.json` with `ok`, `status`, `runId`, `manifestCommit`, `pipelineRun`, `temporaryNamespace` and `finishedAt`.
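A consumer of `result.json` can check that all documented fields are present before trusting the file. The sketch below assumes nothing about field types or about the set of possible `status` values, since neither is specified here; `missingResultFields` is a hypothetical helper name and the sample values are placeholders.

```typescript
// Field names come from the runner contract above; everything else is illustrative.
const RESULT_FIELDS = [
  "ok",
  "status",
  "runId",
  "manifestCommit",
  "pipelineRun",
  "temporaryNamespace",
  "finishedAt",
] as const;

// Returns the documented fields that are absent from the given result.json text.
function missingResultFields(rawJson: string): string[] {
  const parsed: unknown = JSON.parse(rawJson);
  if (typeof parsed !== "object" || parsed === null) {
    return [...RESULT_FIELDS];
  }
  return RESULT_FIELDS.filter((field) => !(field in parsed));
}

// Placeholder values only; real runs define their own status strings and names.
const sample = JSON.stringify({
  ok: true,
  status: "example-status",
  runId: "run-1",
  manifestCommit: "a".repeat(40),
  pipelineRun: "example-pipelinerun",
  temporaryNamespace: "unidesk-ci-e2e-run-1",
  finishedAt: "2024-01-02T03:04:05Z",
});
console.log(missingResultFields(sample));
```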

The Tekton task creates a temporary namespace `unidesk-ci-e2e-<runId>` and may create only CI-owned smoke resources there: `postgres-dev`, `code-queue-scheduler-dev`, `code-queue-read-dev`, `code-queue-write-dev`, their ClusterIP Services and a per-run Secret/ConfigMap. It must not mutate `unidesk` or persistent `unidesk-dev`. Code Queue API validation must use ClusterIP Services and the Kubernetes API `services/.../proxy` subresource; NodePort, D601 host ports and direct public service exposure are forbidden. The smoke check currently verifies `/health`, `/live` and `/api/workdirs` (GET, POST and DELETE) on the read, write and scheduler roles, which gives follow-up Code Queue API fixes a reproducible application target before production rollout. The stable frontend/backend proxy contract is checked by `microservice:code-queue-workdirs` in the normal UniDesk e2e harness.
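To make the `services/.../proxy` access pattern concrete, the sketch below builds the Kubernetes API path for a request routed through a ClusterIP Service. The path layout is the standard core-v1 service proxy form; the function names, the assumption that each Service is named after its workload, and the port `8080` are illustrative and not taken from the runner.

```typescript
// Sketch only: function names, the Service naming and port 8080 are assumptions.
function temporaryNamespaceFor(runId: string): string {
  return `unidesk-ci-e2e-${runId}`;
}

// Builds the API-server path that proxies a request to a Service port.
function serviceProxyPath(
  namespace: string,
  service: string,
  port: number | string,
  requestPath: string,
): string {
  const suffix = requestPath.startsWith("/") ? requestPath : `/${requestPath}`;
  return `/api/v1/namespaces/${namespace}/services/${service}:${port}/proxy${suffix}`;
}

// Example: the health endpoint of each Code Queue role in one temporary namespace.
const namespace = temporaryNamespaceFor("run-1");
for (const service of ["code-queue-read-dev", "code-queue-write-dev", "code-queue-scheduler-dev"]) {
  console.log(serviceProxyPath(namespace, service, 8080, "/health"));
}
```

Because every request goes through the API server, no NodePort or host port has to be opened for the smoke check.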

## Commands

Start a run and return after dispatch:

```bash
bun scripts/cli.ts ci run-dev-e2e
```

Start a run and wait up to ten minutes for completion:

```bash
bun scripts/cli.ts ci run-dev-e2e --wait-ms 600000
```

Inspect run files on D601:

```bash
bun scripts/cli.ts ci logs <runId>
```

Regular Tekton CI remains documented in `docs/reference/ci.md`; deployment desired-state and target-side build rules remain documented in `docs/reference/deploy.md`.