docs: plan d601 k3s dev environment

Codex
2026-05-17 12:14:05 +00:00
parent 4da70ca671
commit 5093bec450
@@ -0,0 +1,345 @@
# D601 k3s Development Environment Plan
## Goal
Build an isolated UniDesk development environment inside the existing D601 native k3s cluster so LLM-driven development can deploy, break, rebuild, and validate backend-core, frontend, Code Queue, and their database dependencies without interrupting the production main server.
The first version must support deployment by GitHub commit id through environment deploy manifests. The desired long-term control point is a GitHub-hosted `deploy.json`: deploying an environment reads the `deploy.json` stored on the matching GitHub environment branch and applies the commit ids declared there.
Initial environment branches:
- `deploy/dev`: desired state for the D601 k3s development environment.
- `deploy/prod`: desired state for production. Branch protection can be added later; the first implementation must still keep prod deployment commands and credentials separate from dev.
## Non-Goals
- Do not create a second physical k3s control plane in the first version. Use the existing D601 native k3s cluster with namespace-level isolation.
- Do not move production main server backend-core/frontend into k3s in the first version.
- Do not let the dev environment share production PostgreSQL tables, provider identity, provider token, Code Queue task state, or deployment worktree paths.
- Do not make `deploy/dev` or `deploy/prod` aliases for normal source branches. They are environment desired-state branches.
## Target Dev Topology
The first dev environment runs in namespace `unidesk-dev` on D601:
- `postgres-dev`: independent PostgreSQL StatefulSet or equivalent persistent database for dev.
- `backend-core-dev`: backend-core built from the commit id declared in `deploy/dev:deploy.json`.
- `frontend-dev`: frontend built from the commit id declared in `deploy/dev:deploy.json`, proxying only to `backend-core-dev`.
- `code-queue-mgr-dev`: lightweight Code Queue control plane using the dev database.
- `code-queue-read-dev`, `code-queue-write-dev`, `code-queue-scheduler-dev`: Code Queue k3s execution components using dev database, dev logs, dev state paths, and dev Code Queue settings.
- Optional first-access path: SSH port-forward or a private D601-hosted ingress. Public exposure is not required for phase 1.
All dev services must report environment identity in `/health`:
- `environment=dev`
- namespace
- database name
- service id
- GitHub repo and commit id
- deployment ref, expected to be `origin/deploy/dev`
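A minimal sketch of how the identity block of a `/health` response could be typed and checked. The JSON key names (`databaseName`, `serviceId`, `deployRef`, and so on) and the `checkDevIdentity` helper are hypothetical; the plan only lists which facts must be reported, not how they are named.

```typescript
// Hypothetical shape for the identity part of a /health response.
// Key names are assumptions; the plan only lists the facts to report.
interface HealthIdentity {
  environment: string;
  namespace: string;
  databaseName: string;
  serviceId: string;
  repo: string;
  commitId: string;
  deployRef: string;
}

// Returns a list of problems; an empty list means the payload looks like dev.
function checkDevIdentity(h: HealthIdentity): string[] {
  const problems: string[] = [];
  if (h.environment !== "dev") {
    problems.push(`environment is ${h.environment}, expected dev`);
  }
  if (h.namespace !== "unidesk-dev") {
    problems.push(`namespace is ${h.namespace}, expected unidesk-dev`);
  }
  if (h.deployRef !== "origin/deploy/dev") {
    problems.push(`deployRef is ${h.deployRef}, expected origin/deploy/dev`);
  }
  if (h.commitId.trim() === "") {
    problems.push("commitId is empty");
  }
  return problems;
}
```

A probe or smoke test could call `checkDevIdentity` on each dev service and fail the rollout when the returned list is non-empty.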
## Core Isolation Rules
1. Dev services must use `unidesk-dev` namespace only.
2. Dev services must use a dev PostgreSQL instance or database. They must not connect to production PostgreSQL.
3. Dev provider identity must be separate, for example `D601-dev`; it must not reuse production `D601` provider id or provider token.
4. Dev Code Queue tasks, queues, attempts, notifications, and trace state must not write to production tables unless table names are explicitly namespaced and verified safe. The preferred first version is a separate dev database.
5. Dev manifests must not mount production deployment roots such as `/root/unidesk` on the main server or production D601 deployment paths unless the mount is read-only and explicitly needed for diagnostics.
6. Dev Code Queue must use dev work directories, dev log directories, and dev state directories.
7. Production deploy must not read a local dirty `deploy.json`; production deploy must read the production desired state from the configured GitHub environment ref.
8. LLM/Code Queue development tasks should only receive dev deploy credentials by default.
## Deploy Manifest Model
Use one schema for environment manifests:
```json
{
  "schemaVersion": 1,
  "environment": "dev",
  "services": [
    {
      "id": "backend-core",
      "repo": "https://github.com/pikasTech/unidesk",
      "commitId": "<commit>"
    },
    {
      "id": "frontend",
      "repo": "https://github.com/pikasTech/unidesk",
      "commitId": "<commit>"
    },
    {
      "id": "code-queue",
      "repo": "https://github.com/pikasTech/unidesk",
      "commitId": "<commit>"
    },
    {
      "id": "code-queue-mgr",
      "repo": "https://github.com/pikasTech/unidesk",
      "commitId": "<commit>"
    }
  ]
}
```
Environment-to-ref mapping must be fixed in code or canonical config:
- `dev` maps to `origin/deploy/dev`.
- `prod` maps to `origin/deploy/prod`.
The deploy command should accept an environment, not an arbitrary branch for production. A debug or admin-only command may inspect arbitrary refs, but normal prod deployment must use the fixed mapping.
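The fixed mapping could look roughly like the sketch below. The names `ENVIRONMENT_REFS` and `resolveDeployRef` are hypothetical; only the two environment names and their refs come from the plan.

```typescript
type Environment = "dev" | "prod";

// Fixed mapping from the plan; deliberately not overridable per invocation.
const ENVIRONMENT_REFS: Record<Environment, string> = {
  dev: "origin/deploy/dev",
  prod: "origin/deploy/prod",
};

// Accepts an environment name, never a branch name, and rejects anything else.
function resolveDeployRef(env: string): string {
  if (env !== "dev" && env !== "prod") {
    throw new Error(`unknown environment: ${env}`);
  }
  return ENVIRONMENT_REFS[env];
}
```

Because the function takes an environment rather than a ref, a caller cannot point a normal prod deploy at an arbitrary branch; an admin-only inspection command would need its own, separate code path.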
## Phase 0: Design And Guardrails
Purpose: make the target behavior explicit before adding a second runtime.
Implementation items:
- Define the environment manifest schema and validation rules.
- Add `environment` to deploy manifests and reject mismatches.
- Define fixed environment mappings: `dev -> deploy/dev`, `prod -> deploy/prod`.
- Document target namespace, database, provider identity, and service ids for dev.
- Add CLI dry-run planning output that prints:
  - selected environment
  - GitHub ref
  - resolved manifest commit
  - services and commit ids
  - target namespace
  - target database fingerprint
  - target provider identity
Acceptance criteria:
- `deploy plan --env dev` can read and validate a dev manifest without mutating the cluster.
- `deploy plan --env prod` can read and validate a prod manifest without using the local worktree `deploy.json`.
- A manifest with `environment=prod` must be rejected for `--env dev`, and the reverse must also be rejected.
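A minimal validation sketch for the mismatch rule in the acceptance criteria. The function name `validateManifest`, the hex pattern for commit ids, and the requirement that `services` be non-empty are assumptions made for illustration; the plan only fixes the schema fields and the environment check.

```typescript
interface ServiceEntry {
  id: string;
  repo: string;
  commitId: string;
}

interface DeployManifest {
  schemaVersion: number;
  environment: string;
  services: ServiceEntry[];
}

// Assumption: commit ids are abbreviated or full hexadecimal Git object names.
const COMMIT_ID = /^[0-9a-f]{7,40}$/;

function validateManifest(raw: unknown, requestedEnv: "dev" | "prod"): DeployManifest {
  if (typeof raw !== "object" || raw === null) {
    throw new Error("manifest must be a JSON object");
  }
  const m = raw as Record<string, unknown>;
  if (m.schemaVersion !== 1) {
    throw new Error(`unsupported schemaVersion: ${String(m.schemaVersion)}`);
  }
  // The key guardrail: the manifest must name the environment being deployed.
  if (m.environment !== requestedEnv) {
    throw new Error(
      `manifest environment ${String(m.environment)} does not match --env ${requestedEnv}`,
    );
  }
  const list = m.services;
  if (!Array.isArray(list) || list.length === 0) {
    throw new Error("services must be a non-empty array");
  }
  const services = list.map((entry: unknown, index: number): ServiceEntry => {
    if (typeof entry !== "object" || entry === null) {
      throw new Error(`services[${index}] must be an object`);
    }
    const s = entry as Record<string, unknown>;
    const id = s.id;
    const repo = s.repo;
    const commitId = s.commitId;
    if (typeof id !== "string" || id === "") {
      throw new Error(`services[${index}].id must be a non-empty string`);
    }
    if (typeof repo !== "string" || repo === "") {
      throw new Error(`services[${index}].repo must be a non-empty string`);
    }
    if (typeof commitId !== "string" || !COMMIT_ID.test(commitId)) {
      throw new Error(`services[${index}].commitId must be a hex commit id`);
    }
    return { id, repo, commitId };
  });
  return { schemaVersion: 1, environment: requestedEnv, services };
}
```

Under these assumptions, a manifest that declares `"environment": "prod"` fails when validated with `"dev"`, and the reverse fails the same way.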
## Phase 1: GitHub Environment Branch Deploy Source
Purpose: make GitHub desired-state refs the deploy source of truth.
Implementation items:
- Create or initialize `deploy/dev` with a valid `deploy.json`.
- Create or initialize `deploy/prod` with a valid `deploy.json`.
- Add CLI support to fetch an environment ref and read `deploy.json` from that ref.
- Keep the existing local `deploy.json` path as a compatibility mode only for explicit local/admin workflows.
- Ensure commit ids listed by the manifest exist in their declared repos.
- Ensure dev/prod deploy does not depend on a dirty local working tree.
Acceptance criteria:
- `deploy plan --env dev` reads `origin/deploy/dev:deploy.json`.
- `deploy plan --env prod` reads `origin/deploy/prod:deploy.json`.
- Changing local `deploy.json` does not affect `--env dev` or `--env prod`.
- The plan output includes the Git ref and manifest blob/commit used.
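One way to read the manifest without touching the working tree is Git's `<rev>:<path>` syntax. The sketch below only builds the argument lists; `planManifestRead` and the `ManifestReadPlan` shape are hypothetical, and it assumes the GitHub remote is named `origin`, as the refs in this plan suggest.

```typescript
interface ManifestReadPlan {
  fetch: string[];
  resolveCommit: string[];
  resolveBlob: string[];
  readManifest: string[];
}

// Builds the Git invocations needed to read deploy.json from an environment ref.
// Nothing here reads the local working-tree copy of deploy.json.
function planManifestRead(env: "dev" | "prod"): ManifestReadPlan {
  const branch = `deploy/${env}`;
  const ref = `origin/${branch}`;
  return {
    // Refresh the remote-tracking ref for the environment branch.
    fetch: ["git", "fetch", "origin", branch],
    // Commit the ref points at, for the plan output and audit trail.
    resolveCommit: ["git", "rev-parse", ref],
    // Blob id of deploy.json at that ref.
    resolveBlob: ["git", "rev-parse", `${ref}:deploy.json`],
    // Manifest content at that ref, printed to stdout.
    readManifest: ["git", "show", `${ref}:deploy.json`],
  };
}
```

Since `git show <ref>:deploy.json` reads from the object database, a dirty local `deploy.json` cannot influence the result.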
## Phase 2: D601 Dev Namespace And Database
Purpose: create the minimum isolated substrate for dev backend and Code Queue state.
Implementation items:
- Add a k8s manifest for namespace `unidesk-dev`.
- Add dev PostgreSQL StatefulSet/Service/PVC or an equivalent persistent DB.
- Add dev DB init and migration flow for backend-core and Code Queue tables.
- Add dev secrets/config:
  - database credentials
  - provider token
  - auth/session secret
  - Code Queue model secrets if needed
- Add resource requests/limits so dev DB cannot starve D601 production k3s workloads.
Technical decisions:
- Prefer a separate dev PostgreSQL instance over sharing production PostgreSQL with a different database name. It gives the clearest failure boundary.
- If a shared PostgreSQL server is temporarily used, the CLI and services must hard-check database name and connection target before startup.
Acceptance criteria:
- `kubectl -n unidesk-dev get pods,svc,pvc` shows the dev DB ready.
- Dev DB survives Pod restart.
- Dev services cannot accidentally connect to the production database URL without failing startup validation.
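A startup guard along these lines could enforce the last criterion. The host `postgres-dev.unidesk-dev.svc` and database name `unidesk_dev` are placeholders chosen for illustration, as is the helper name `assertDevDatabase`; the plan does not fix these values.

```typescript
interface ExpectedDatabase {
  host: string;
  database: string;
}

// Placeholder values for illustration only; the plan does not name them.
const EXPECTED_DEV_DB: ExpectedDatabase = {
  host: "postgres-dev.unidesk-dev.svc",
  database: "unidesk_dev",
};

// Throws before the service starts listening if the URL is not the dev database.
// Error messages include only host and database name, never credentials.
function assertDevDatabase(
  databaseUrl: string,
  expected: ExpectedDatabase = EXPECTED_DEV_DB,
): void {
  const url = new URL(databaseUrl);
  const database = url.pathname.replace(/^\//, "");
  if (url.hostname !== expected.host) {
    throw new Error(`refusing to start: database host ${url.hostname} is not ${expected.host}`);
  }
  if (database !== expected.database) {
    throw new Error(`refusing to start: database ${database} is not ${expected.database}`);
  }
}
```

Calling the guard as the first step of service startup means a misconfigured database URL fails the Pod's startup instead of silently writing to the wrong database.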
## Phase 3: backend-core-dev And frontend-dev
Purpose: make a usable UniDesk dev control surface independent from production main server Compose.
Implementation items:
- Add k8s manifests for `backend-core-dev` and `frontend-dev`.
- Build images from the commit ids declared in `deploy/dev:deploy.json`.
- Inject dev-only config into backend-core:
  - `UNIDESK_ENV=dev`
  - dev `MICROSERVICES_JSON`
  - dev database URL
  - dev provider token
  - dev log paths
- Inject frontend config so it proxies to `backend-core-dev`, not production backend-core.
- Add service health and readiness probes.
- Expose dev frontend through port-forward or a private dev ingress.
Technical decisions:
- First version can omit public exposure. Port-forward is acceptable while validating isolation.
- Dev frontend must have a visible DEV environment marker to avoid operator confusion.
Acceptance criteria:
- Dev backend-core `/health` returns ok and includes `environment=dev`.
- Dev frontend `/health` returns ok and proxies only to dev backend-core.
- Production `bun scripts/cli.ts server status` remains healthy while dev backend/frontend are redeployed.
- Rebuilding dev backend/frontend does not touch main server Docker Compose containers.
## Phase 4: code-queue-mgr-dev
Purpose: provide the dev queue management and submission path without writing production Code Queue tables.
Implementation items:
- Add k8s manifest for `code-queue-mgr-dev`.
- Configure it to use the dev database only.
- Configure dev backend-core service catalog so stable dev `code-queue` control/read paths route to `code-queue-mgr-dev`.
- Ensure `code-queue-mgr-dev` can submit, list, summarize, and update dev queue state.
- Add health output proving:
  - role is master-control-plane or dev-control-plane
  - database is dev
  - schema is ready
  - no runner dependencies
Acceptance criteria:
- Dev UI/CLI can submit a dry-run or queued task to the dev DB.
- Production Code Queue task list is unchanged by dev submissions.
- Dev `code-queue-mgr-dev` memory footprint remains within the lightweight control-plane budget.
## Phase 5: code-queue-dev Execution Components
Purpose: run dev Code Queue execution inside `unidesk-dev` without interfering with production Code Queue.
Implementation items:
- Add dev variants of Code Queue manifests:
  - `code-queue-read-dev`
  - `code-queue-write-dev`
  - `code-queue-scheduler-dev`
- Configure all dev components to use dev database, dev logs, and dev state paths.
- Use dev service names and labels so production k3s adapter does not confuse dev and prod services.
- Decide whether first version supports real Codex execution or smoke-only execution.
- If real execution is enabled:
  - isolate workdir paths
  - isolate Codex/OpenCode XDG/state paths
  - isolate notifications
  - cap concurrency and memory
  - avoid writing production OA Event Flow unless explicitly configured for dev
Technical decisions:
- First version should default to smoke/dry-run execution unless real task execution is needed immediately.
- If real task execution is enabled, use a dev-specific queue prefix or dev database and disable production ClaudeQQ notifications by default.
Acceptance criteria:
- Dev Code Queue `/health` returns ok and includes `environment=dev`.
- Dev scheduler can pick up a dev queued task and move it through to a terminal state.
- Restarting dev scheduler does not affect production running tasks.
- Production `code-queue` health remains healthy during dev Code Queue rollout.
## Phase 6: Dev Deploy Apply
Purpose: make `deploy/dev:deploy.json` drive the dev environment end to end.
Implementation items:
- Add `deploy apply --env dev`.
- For each service in the dev manifest:
  - fetch declared repo and commit
  - build image on D601 or through the established target-side build path
  - tag image with environment and commit
  - apply the dev k8s manifest
  - wait for rollout
  - verify live commit from `/health` or Deployment annotation
- Ensure deployment records include environment, ref, service id, commit id, image tag, namespace, and rollout status.
- Add `deploy status --env dev` or equivalent drift check.
Acceptance criteria:
- Updating `deploy/dev:deploy.json` to a new commit and running `deploy apply --env dev` updates dev backend-core/frontend/code-queue components.
- Live `/health` commit matches the manifest commit.
- No production Deployment, Service, Secret, PVC, DB table, or Docker Compose container is mutated by dev deploy.
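The drift check behind `deploy status --env dev` could be as simple as comparing manifest commits with the commits reported by live services. In this sketch, `computeDrift`, `imageTag`, and the `<service>:<env>-<commit>` tag format are hypothetical; the plan only says the tag should carry environment and commit.

```typescript
interface DesiredService {
  id: string;
  commitId: string;
}

type DriftStatus = "in-sync" | "drifted" | "missing";

interface DriftEntry {
  id: string;
  desired: string;
  live: string | null;
  status: DriftStatus;
}

// Hypothetical tag format carrying environment and commit.
function imageTag(env: "dev" | "prod", serviceId: string, commitId: string): string {
  return `${serviceId}:${env}-${commitId}`;
}

// Compares manifest commits with commits reported by live services
// (for example from /health or a Deployment annotation).
function computeDrift(desired: DesiredService[], live: Map<string, string>): DriftEntry[] {
  return desired.map((service): DriftEntry => {
    const liveCommit = live.get(service.id) ?? null;
    let status: DriftStatus;
    if (liveCommit === null) {
      status = "missing";
    } else if (liveCommit === service.commitId) {
      status = "in-sync";
    } else {
      status = "drifted";
    }
    return { id: service.id, desired: service.commitId, live: liveCommit, status };
  });
}
```

An apply step would treat any entry that is not `in-sync` after rollout as a failed deployment.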
## Phase 7: Prod Deploy Ref Compatibility
Purpose: let production read desired state from `deploy/prod` while keeping production runtime unchanged.
Implementation items:
- Add `deploy plan --env prod` and `deploy apply --env prod` using `origin/deploy/prod:deploy.json`.
- Keep production target executors as they are initially:
  - main server Compose for production backend-core/frontend and direct sidecars
  - D601 k3s for production Code Queue execution
- Enforce production command guardrails:
  - canonical root only
  - production credentials only on main server
  - manifest must say `environment=prod`
  - target namespace and provider identity must match production
- Branch protection for `deploy/prod` is recommended but can be added after the first version.
Acceptance criteria:
- Production deploy no longer depends on local `deploy.json`.
- Production deploy reports the exact Git ref and manifest commit used.
- Production deploy still validates live commit after rollout.
## Phase 8: Operator And LLM Safety
Purpose: reduce environment confusion for LLM agents and humans.
Implementation items:
- Add clear CLI output for every deploy:
  - environment
  - ref
  - namespace
  - DB fingerprint
  - provider id
  - services and commits
- Add explicit DEV marker in dev frontend.
- Add hard startup checks:
  - dev service refuses production DB
  - dev service refuses production provider id/token
  - prod service refuses dev namespace/DB
- Ensure LLM task containers receive dev deploy credentials by default and do not receive prod credentials.
- Add smoke checks that intentionally try unsafe combinations and verify they fail.
Acceptance criteria:
- Running a dev service with production DB config fails before listening.
- Running prod deploy from a non-canonical context fails.
- LLM/Code Queue default environment can deploy dev but cannot deploy prod without the separate production credential path.
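The smoke checks for unsafe combinations could be table-driven, as sketched below. The dev namespace and the provider ids `D601-dev` and `D601` come from the plan; the production namespace `unidesk` and both database names are placeholders, and `startupViolations` is a hypothetical helper.

```typescript
interface StartupConfig {
  environment: "dev" | "prod";
  namespace: string;
  providerId: string;
  databaseName: string;
}

// Expected identity per environment. The prod namespace and both database
// names are placeholders for illustration; the plan does not name them.
const IDENTITIES = {
  dev: { namespace: "unidesk-dev", providerId: "D601-dev", databaseName: "unidesk_dev" },
  prod: { namespace: "unidesk", providerId: "D601", databaseName: "unidesk" },
};

// Lists every mismatch between a config and its environment's expected identity.
function startupViolations(c: StartupConfig): string[] {
  const want = IDENTITIES[c.environment];
  const out: string[] = [];
  if (c.namespace !== want.namespace) out.push(`namespace ${c.namespace}`);
  if (c.providerId !== want.providerId) out.push(`provider id ${c.providerId}`);
  if (c.databaseName !== want.databaseName) out.push(`database ${c.databaseName}`);
  return out;
}

// Combinations that must always be refused at startup.
const UNSAFE_COMBINATIONS: StartupConfig[] = [
  { environment: "dev", ...IDENTITIES.dev, databaseName: IDENTITIES.prod.databaseName },
  { environment: "dev", ...IDENTITIES.dev, providerId: IDENTITIES.prod.providerId },
  { environment: "prod", ...IDENTITIES.prod, namespace: IDENTITIES.dev.namespace },
  { environment: "prod", ...IDENTITIES.prod, databaseName: IDENTITIES.dev.databaseName },
];

// A smoke check passes only when this returns an empty list.
function unsafeCombinationsThatPassed(): StartupConfig[] {
  return UNSAFE_COMBINATIONS.filter((c) => startupViolations(c).length === 0);
}
```

Keeping the unsafe cases in one table makes it cheap to add a new combination whenever a new isolation rule is introduced.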
## Risks And Mitigations
- Risk: namespace isolation does not isolate node-level CPU, memory, Docker socket, hostPath, or containerd load.
  - Mitigation: resource requests/limits, separate dev workdirs, no production path mounts, and bounded Code Queue concurrency.
- Risk: dev Code Queue accidentally writes production task tables.
  - Mitigation: separate dev DB, startup DB fingerprint checks, and health output showing DB identity.
- Risk: dev frontend appears to be prod or proxies to prod backend-core.
  - Mitigation: visible DEV marker, `CORE_INTERNAL_URL` hardwired to dev service, and proxy target health checks.
- Risk: deploy command accidentally reads local manifest instead of GitHub environment ref.
  - Mitigation: `--env` mode must read remote ref only and report the ref/blob used.
- Risk: D601 k3s control plane failure affects both dev and production k3s workloads.
  - Mitigation: accept this in phase 1; consider a separate physical/node-level dev cluster only after namespace isolation proves insufficient.
- Risk: branch `deploy/prod` is initially unprotected.
  - Mitigation: even before branch protection, production deploy should still require canonical main server credentials and should report the ref used for audit.
## Suggested Implementation Order
1. Phase 0 and Phase 1: establish GitHub environment branch desired-state and dry-run planning.
2. Phase 2 and Phase 3: create dev namespace, dev DB, backend-core-dev, and frontend-dev.
3. Phase 4 and Phase 5: add dev Code Queue control and execution components.
4. Phase 6: make `deploy apply --env dev` deploy the full first dev stack by commit id.
5. Phase 7: migrate production deploy to `deploy/prod`.
6. Phase 8: harden operator and LLM safety checks.
The first milestone is complete when `deploy apply --env dev` can deploy backend-core, frontend, code-queue-mgr, and Code Queue read/write/scheduler into `unidesk-dev` from commit ids declared in `origin/deploy/dev:deploy.json`, and repeated dev redeploys do not change production main server status or production Code Queue state.