docs: plan d601 k3s dev environment
# D601 k3s Development Environment Plan

## Goal

Build an isolated UniDesk development environment inside the existing D601 native k3s cluster so LLM-driven development can deploy, break, rebuild, and validate backend-core, frontend, Code Queue, and their database dependencies without interrupting the production main server.

The first version must support deployment by GitHub commit id through environment deploy manifests. The desired long-term control point is GitHub-hosted `deploy.json`: deploying an environment reads the `deploy.json` stored on the matching GitHub environment branch and applies the commit ids declared there.

Initial environment branches:

- `deploy/dev`: desired state for the D601 k3s development environment.
- `deploy/prod`: desired state for production. Branch protection can be added later; the first implementation must still keep prod deployment commands and credentials separate from dev.

## Non-Goals

- Do not create a second physical k3s control plane in the first version. Use the existing D601 native k3s cluster with namespace-level isolation.
- Do not move production main server backend-core/frontend into k3s in the first version.
- Do not let the dev environment share production PostgreSQL tables, provider identity, provider token, Code Queue task state, or deployment worktree paths.
- Do not make `deploy/dev` or `deploy/prod` aliases for normal source branches. They are environment desired-state branches.

## Target Dev Topology

The first dev environment runs in namespace `unidesk-dev` on D601:

- `postgres-dev`: independent PostgreSQL StatefulSet or equivalent persistent database for dev.
- `backend-core-dev`: backend-core built from the commit id declared in `deploy/dev:deploy.json`.
- `frontend-dev`: frontend built from the commit id declared in `deploy/dev:deploy.json`, proxying only to `backend-core-dev`.
- `code-queue-mgr-dev`: lightweight Code Queue control plane using the dev database.
- `code-queue-read-dev`, `code-queue-write-dev`, `code-queue-scheduler-dev`: Code Queue k3s execution components using dev database, dev logs, dev state paths, and dev Code Queue settings.
- Optional first-access path: SSH port-forward or a private D601-hosted ingress. Public exposure is not required for phase 1.

All dev services must report environment identity in `/health`:

- `environment=dev`
- namespace
- database name
- service id
- GitHub repo and commit id
- deployment ref, expected to be `origin/deploy/dev`

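The identity fields listed above could be checked by a deploy tool roughly as follows. This is a minimal sketch: the plan states which facts `/health` must report but not the JSON shape, so the `HealthIdentity` field names and the `checkDevHealth` helper are assumptions made for illustration.

```typescript
// Hypothetical identity block of a /health response. The plan lists the
// facts to report; the field names here are assumptions.
interface HealthIdentity {
  environment: string;
  namespace: string;
  database: string;
  serviceId: string;
  repo: string;
  commitId: string;
  deployRef: string;
}

// Returns human-readable problems; an empty array means the payload
// identifies itself as a dev service in the way the plan requires.
function checkDevHealth(h: HealthIdentity): string[] {
  const problems: string[] = [];
  // Fields whose expected value is fixed by the plan.
  const expected: Array<[string, string, string]> = [
    ["environment", h.environment, "dev"],
    ["namespace", h.namespace, "unidesk-dev"],
    ["deployRef", h.deployRef, "origin/deploy/dev"],
  ];
  for (const [name, actual, want] of expected) {
    if (actual !== want) {
      problems.push(`${name} is "${actual}", expected "${want}"`);
    }
  }
  // Fields that only need to be present and non-empty.
  const nonEmpty: Array<[string, string]> = [
    ["database", h.database],
    ["serviceId", h.serviceId],
    ["repo", h.repo],
    ["commitId", h.commitId],
  ];
  for (const [name, value] of nonEmpty) {
    if (value.trim() === "") {
      problems.push(`${name} is empty`);
    }
  }
  return problems;
}
```

Returning a list of problems instead of throwing on the first one lets a rollout check print every identity mismatch at once.
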
## Core Isolation Rules

1. Dev services must use `unidesk-dev` namespace only.
2. Dev services must use a dev PostgreSQL instance or database. They must not connect to production PostgreSQL.
3. Dev provider identity must be separate, for example `D601-dev`; it must not reuse production `D601` provider id or provider token.
4. Dev Code Queue tasks, queues, attempts, notifications, and trace state must not write production tables unless table names are explicitly namespaced and verified safe. The preferred first version is a separate dev database.
5. Dev manifests must not mount production deployment roots such as `/root/unidesk` on the main server or production D601 deployment paths unless the mount is read-only and explicitly needed for diagnostics.
6. Dev Code Queue must use dev work directories, dev log directories, and dev state directories.
7. Production deploy must not read a local dirty `deploy.json`; production deploy must read the production desired state from the configured GitHub environment ref.
8. LLM/Code Queue development tasks should only receive dev deploy credentials by default.

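Rule 5 lends itself to a mechanical check before a dev manifest is applied. The sketch below is one possible form, not the project's implementation: `HostMount`, its `diagnostics` flag, and `forbiddenMounts` are hypothetical names, and only `/root/unidesk` is taken from the plan.

```typescript
// A host path mounted into a dev Pod. `diagnostics` marks a mount that was
// explicitly approved for diagnostics (a hypothetical flag, not from the plan).
interface HostMount {
  hostPath: string;
  readOnly: boolean;
  diagnostics: boolean;
}

// `/root/unidesk` is the production root named in the plan. Production D601
// deployment paths would be appended here; the plan does not list them.
const PRODUCTION_ROOTS: string[] = ["/root/unidesk"];

// True when `path` is `root` itself or lies below it. This is a plain string
// comparison: it does not resolve `..` segments or symlinks.
function isUnder(path: string, root: string): boolean {
  const p = path.replace(/\/+$/, "");
  return p === root || p.indexOf(root + "/") === 0;
}

// Returns the mounts that violate isolation rule 5: they touch a production
// root and are not a read-only, diagnostics-approved exception.
function forbiddenMounts(mounts: HostMount[]): HostMount[] {
  return mounts.filter((m) => {
    const touchesProd = PRODUCTION_ROOTS.some((root) => isUnder(m.hostPath, root));
    const allowedException = m.readOnly && m.diagnostics;
    return touchesProd && !allowedException;
  });
}
```

Comparing against `root + "/"` keeps a sibling directory such as a hypothetical `/root/unidesk-dev` from being mistaken for a production path.
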
## Deploy Manifest Model

Use one schema for environment manifests:

```json
{
  "schemaVersion": 1,
  "environment": "dev",
  "services": [
    {
      "id": "backend-core",
      "repo": "https://github.com/pikasTech/unidesk",
      "commitId": "<commit>"
    },
    {
      "id": "frontend",
      "repo": "https://github.com/pikasTech/unidesk",
      "commitId": "<commit>"
    },
    {
      "id": "code-queue",
      "repo": "https://github.com/pikasTech/unidesk",
      "commitId": "<commit>"
    },
    {
      "id": "code-queue-mgr",
      "repo": "https://github.com/pikasTech/unidesk",
      "commitId": "<commit>"
    }
  ]
}
```

Environment-to-ref mapping must be fixed in code or canonical config:

- `dev` maps to `origin/deploy/dev`.
- `prod` maps to `origin/deploy/prod`.

The deploy command should accept an environment, not an arbitrary branch for production. A debug or admin-only command may inspect arbitrary refs, but normal prod deployment must use the fixed mapping.

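The schema and the fixed mapping above can be sketched in TypeScript as follows. This is an illustration under stated assumptions, not the project's validator: the names `validateManifest`, `ENVIRONMENT_REFS`, and `FULL_COMMIT_ID` are hypothetical, and the requirements that `commitId` is a full 40-character hex id and that `repo` is an `https://` URL are assumptions the plan does not state.

```typescript
type Environment = "dev" | "prod";

interface ServiceEntry {
  id: string;
  repo: string;
  commitId: string;
}

interface DeployManifest {
  schemaVersion: 1;
  environment: Environment;
  services: ServiceEntry[];
}

// Fixed mapping from the plan. Callers select an environment, never a branch.
const ENVIRONMENT_REFS: Record<Environment, string> = {
  dev: "origin/deploy/dev",
  prod: "origin/deploy/prod",
};

// Assumption: commit ids are full 40-character lowercase hex SHA-1 ids.
const FULL_COMMIT_ID = /^[0-9a-f]{40}$/;

// Validates parsed JSON against the schema and rejects a manifest whose
// `environment` differs from the environment the operator asked for.
function validateManifest(raw: unknown, expected: Environment): DeployManifest {
  if (typeof raw !== "object" || raw === null) {
    throw new Error("manifest must be a JSON object");
  }
  const m = raw as Record<string, unknown>;
  if (m["schemaVersion"] !== 1) {
    throw new Error("unsupported schemaVersion");
  }
  if (m["environment"] !== expected) {
    throw new Error(`manifest environment does not match --env ${expected}`);
  }
  const list = m["services"];
  if (!Array.isArray(list) || list.length === 0) {
    throw new Error("services must be a non-empty array");
  }
  const seen: Record<string, boolean> = {};
  const services: ServiceEntry[] = [];
  for (const item of list) {
    const s = (typeof item === "object" && item !== null ? item : {}) as Record<string, unknown>;
    const id = s["id"];
    const repo = s["repo"];
    const commitId = s["commitId"];
    if (typeof id !== "string" || id === "" || seen[id] === true) {
      throw new Error("service id must be a unique non-empty string");
    }
    // Assumption: repos are referenced by https URL, as in the example manifest.
    if (typeof repo !== "string" || repo.indexOf("https://") !== 0) {
      throw new Error(`service ${id}: repo must be an https URL`);
    }
    if (typeof commitId !== "string" || !FULL_COMMIT_ID.test(commitId)) {
      throw new Error(`service ${id}: commitId must be a full commit id`);
    }
    seen[id] = true;
    services.push({ id, repo, commitId });
  }
  return { schemaVersion: 1, environment: expected, services };
}
```

Because the expected environment is an argument, the same function covers both directions of the mismatch rule: a `prod` manifest is rejected for `--env dev`, and a `dev` manifest is rejected for `--env prod`.
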
## Phase 0: Design And Guardrails

Purpose: make the target behavior explicit before adding a second runtime.

Implementation items:

- Define the environment manifest schema and validation rules.
- Add `environment` to deploy manifests and reject mismatches.
- Define fixed environment mappings: `dev -> deploy/dev`, `prod -> deploy/prod`.
- Document target namespace, database, provider identity, and service ids for dev.
- Add CLI dry-run planning output that prints:
  - selected environment
  - GitHub ref
  - resolved manifest commit
  - services and commit ids
  - target namespace
  - target database fingerprint
  - target provider identity

Acceptance criteria:

- `deploy plan --env dev` can read and validate a dev manifest without mutating the cluster.
- `deploy plan --env prod` can read and validate a prod manifest without using the local worktree `deploy.json`.
- A manifest with `environment=prod` must be rejected for `--env dev`, and the reverse must also be rejected.

## Phase 1: GitHub Environment Branch Deploy Source

Purpose: make GitHub desired-state refs the deploy source of truth.

Implementation items:

- Create or initialize `deploy/dev` with a valid `deploy.json`.
- Create or initialize `deploy/prod` with a valid `deploy.json`.
- Add CLI support to fetch an environment ref and read `deploy.json` from that ref.
- Keep the existing local `deploy.json` path as a compatibility mode only for explicit local/admin workflows.
- Ensure commit ids listed by the manifest exist in their declared repos.
- Ensure dev/prod deploy does not depend on a dirty local working tree.

Acceptance criteria:

- `deploy plan --env dev` reads `origin/deploy/dev:deploy.json`.
- `deploy plan --env prod` reads `origin/deploy/prod:deploy.json`.
- Changing local `deploy.json` does not affect `--env dev` or `--env prod`.
- The plan output includes the Git ref and manifest blob/commit used.

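Reading the manifest from a ref instead of the working tree comes down to a few standard Git commands. The sketch below only builds the argument lists; how the CLI spawns them is left open. The function and type names are hypothetical, and the remote name `origin` is taken from the refs used in this plan.

```typescript
type Environment = "dev" | "prod";

const REMOTE = "origin";
const ENVIRONMENT_BRANCHES: Record<Environment, string> = {
  dev: "deploy/dev",
  prod: "deploy/prod",
};

interface ManifestReadCommands {
  fetch: string[];   // refreshes the remote-tracking ref
  resolve: string[]; // prints the commit id the ref points at, for the plan output
  show: string[];    // prints deploy.json as stored at that ref
}

// Builds the Git invocations for one environment. None of them reads the
// working tree, so a dirty local deploy.json cannot influence the result.
function manifestReadCommands(env: Environment): ManifestReadCommands {
  const branch = ENVIRONMENT_BRANCHES[env];
  const trackingRef = `${REMOTE}/${branch}`;
  return {
    // Explicit refspec so the fetch updates refs/remotes/origin/<branch>.
    fetch: ["git", "fetch", "--quiet", REMOTE,
      `+refs/heads/${branch}:refs/remotes/${trackingRef}`],
    resolve: ["git", "rev-parse", "--verify", `${trackingRef}^{commit}`],
    show: ["git", "show", `${trackingRef}:deploy.json`],
  };
}

// `git cat-file -e` exits non-zero when the object is missing. It checks the
// local object store, so the declared repo must have been fetched first.
function commitExistsCommand(commitId: string): string[] {
  return ["git", "cat-file", "-e", `${commitId}^{commit}`];
}
```

The `<ref>:<path>` form passed to `git show` addresses the blob stored in the commit, which is what makes the acceptance criterion about local edits to `deploy.json` hold.
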
## Phase 2: D601 Dev Namespace And Database

Purpose: create the minimum isolated substrate for dev backend and Code Queue state.

Implementation items:

- Add a k8s manifest for namespace `unidesk-dev`.
- Add dev PostgreSQL StatefulSet/Service/PVC or an equivalent persistent DB.
- Add dev DB init and migration flow for backend-core and Code Queue tables.
- Add dev secrets/config:
  - database credentials
  - provider token
  - auth/session secret
  - Code Queue model secrets if needed
- Add resource requests/limits so dev DB cannot starve D601 production k3s workloads.

Technical decisions:

- Prefer a separate dev PostgreSQL instance over sharing production PostgreSQL with a different database name. It gives the clearest failure boundary.
- If a shared PostgreSQL server is temporarily used, the CLI and services must hard-check database name and connection target before startup.

Acceptance criteria:

- `kubectl -n unidesk-dev get pods,svc,pvc` shows the dev DB ready.
- Dev DB data survives a Pod restart.
- A dev service configured with the production database URL fails startup validation instead of connecting.

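The hard check on database name and connection target could look like the sketch below. It is a minimal illustration: `parseDbTarget`, `dbFingerprint`, and `assertDevDatabase` are hypothetical helpers, the fingerprint format `host:port/database` is an assumption, and the parser handles only the common `postgres://user:pass@host:port/dbname` URL form (no IPv6 literals or multi-host URLs).

```typescript
interface DbTarget {
  host: string;
  port: number;
  database: string;
}

// Extracts host, port, and database name from a PostgreSQL connection URL.
// Credentials and query parameters are deliberately ignored.
function parseDbTarget(url: string): DbTarget {
  const m = /^postgres(?:ql)?:\/\/(?:[^@\/]*@)?([^:\/?#]+)(?::(\d+))?\/([^?#]+)/.exec(url);
  if (m === null) {
    throw new Error("not a recognizable PostgreSQL URL");
  }
  return {
    host: m[1].toLowerCase(),
    port: m[2] ? parseInt(m[2], 10) : 5432, // 5432 is the PostgreSQL default port
    database: m[3],
  };
}

// Credential-free identity string, usable in plan output and /health.
function dbFingerprint(t: DbTarget): string {
  return `${t.host}:${t.port}/${t.database}`;
}

// Throws before the service starts listening when the configured URL does
// not point at the expected dev target.
function assertDevDatabase(url: string, expectedFingerprint: string): void {
  const actual = dbFingerprint(parseDbTarget(url));
  if (actual !== expectedFingerprint) {
    throw new Error(
      `refusing to start: database target ${actual} is not the dev target ${expectedFingerprint}`,
    );
  }
}
```

Matching against an allow-listed dev fingerprint, instead of a deny-list of production targets, means an unknown or mistyped target also fails closed.
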
## Phase 3: backend-core-dev And frontend-dev

Purpose: make a usable UniDesk dev control surface independent from production main server Compose.

Implementation items:

- Add k8s manifests for `backend-core-dev` and `frontend-dev`.
- Build images from the commit ids declared in `deploy/dev:deploy.json`.
- Inject dev-only config into backend-core:
  - `UNIDESK_ENV=dev`
  - dev `MICROSERVICES_JSON`
  - dev database URL
  - dev provider token
  - dev log paths
- Inject frontend config so it proxies to `backend-core-dev`, not production backend-core.
- Add service health and readiness probes.
- Expose dev frontend through port-forward or a private dev ingress.

Technical decisions:

- First version can omit public exposure. Port-forward is acceptable while validating isolation.
- Dev frontend must have a visible DEV environment marker to avoid operator confusion.

Acceptance criteria:

- Dev backend-core `/health` returns ok and includes `environment=dev`.
- Dev frontend `/health` returns ok and proxies only to dev backend-core.
- Production `bun scripts/cli.ts server status` remains healthy while dev backend/frontend are redeployed.
- Rebuilding dev backend/frontend does not touch main server Docker Compose containers.

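One way to enforce that the dev frontend proxies only to `backend-core-dev` is to validate the proxy target at startup. The sketch below assumes the target arrives as an `http(s)` URL (for example through a variable such as `CORE_INTERNAL_URL`) and that the cluster uses the default `cluster.local` DNS domain; `isDevProxyTarget` and the other names are hypothetical.

```typescript
const DEV_NAMESPACE = "unidesk-dev";
const DEV_BACKEND_SERVICE = "backend-core-dev";
// Assumption: the cluster uses the default Kubernetes DNS domain.
const CLUSTER_DOMAIN = "cluster.local";

// The DNS names under which a Service is reachable from inside the cluster.
// The short name only resolves from Pods in the same namespace.
function allowedBackendHosts(): string[] {
  const svc = DEV_BACKEND_SERVICE;
  const ns = DEV_NAMESPACE;
  return [
    svc,
    `${svc}.${ns}`,
    `${svc}.${ns}.svc`,
    `${svc}.${ns}.svc.${CLUSTER_DOMAIN}`,
  ];
}

// Extracts the host part of an http(s) URL, or null when the URL is not
// in that form.
function hostOf(url: string): string | null {
  const m = /^https?:\/\/([^:\/?#]+)/.exec(url);
  return m === null ? null : m[1].toLowerCase();
}

// True only when the proxy target is the dev backend Service.
function isDevProxyTarget(coreInternalUrl: string): boolean {
  const host = hostOf(coreInternalUrl);
  return host !== null && allowedBackendHosts().indexOf(host) !== -1;
}
```

An exact host allow-list is stricter than a prefix or substring test, so a name that merely contains `backend-core-dev` in another namespace is refused.
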
## Phase 4: code-queue-mgr-dev

Purpose: provide the dev queue management and submission path without writing production Code Queue tables.

Implementation items:

- Add k8s manifest for `code-queue-mgr-dev`.
- Configure it to use the dev database only.
- Configure dev backend-core service catalog so stable dev `code-queue` control/read paths route to `code-queue-mgr-dev`.
- Ensure `code-queue-mgr-dev` can submit, list, summarize, and update dev queue state.
- Add health output proving:
  - role is master-control-plane or dev-control-plane
  - database is dev
  - schema is ready
  - no runner dependencies

Acceptance criteria:

- Dev UI/CLI can submit a dry-run or queued task to the dev DB.
- Production Code Queue task list is unchanged by dev submissions.
- Dev `code-queue-mgr-dev` memory footprint remains within the lightweight control-plane budget.

## Phase 5: code-queue-dev Execution Components

Purpose: run dev Code Queue execution inside `unidesk-dev` without interfering with production Code Queue.

Implementation items:

- Add dev variants of Code Queue manifests:
  - `code-queue-read-dev`
  - `code-queue-write-dev`
  - `code-queue-scheduler-dev`
- Configure all dev components to use dev database, dev logs, and dev state paths.
- Use dev service names and labels so production k3s adapter does not confuse dev and prod services.
- Decide whether first version supports real Codex execution or smoke-only execution.
- If real execution is enabled:
  - isolate workdir paths
  - isolate Codex/OpenCode XDG/state paths
  - isolate notifications
  - cap concurrency and memory
  - avoid writing production OA Event Flow unless explicitly configured for dev

Technical decisions:

- First version should default to smoke/dry-run execution unless real task execution is needed immediately.
- If real task execution is enabled, use a dev-specific queue prefix or dev database and disable production ClaudeQQ notifications by default.

Acceptance criteria:

- Dev Code Queue `/health` returns ok and includes `environment=dev`.
- Dev scheduler can pick up a dev queued task and move it to a terminal state.
- Restarting dev scheduler does not affect production running tasks.
- Production `code-queue` health remains healthy during dev Code Queue rollout.

## Phase 6: Dev Deploy Apply

Purpose: make `deploy/dev:deploy.json` drive the dev environment end to end.

Implementation items:

- Add `deploy apply --env dev`.
- For each service in the dev manifest:
  - fetch declared repo and commit
  - build image on D601 or through the established target-side build path
  - tag image with environment and commit
  - apply the dev k8s manifest
  - wait for rollout
  - verify live commit from `/health` or Deployment annotation
- Ensure deployment records include environment, ref, service id, commit id, image tag, namespace, and rollout status.
- Add `deploy status --env dev` or equivalent drift check.

Acceptance criteria:

- Updating `deploy/dev:deploy.json` to a new commit and running `deploy apply --env dev` updates dev backend-core/frontend/code-queue components.
- Live `/health` commit matches the manifest commit.
- No production Deployment, Service, Secret, PVC, DB table, or Docker Compose container is mutated by dev deploy.

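The tagging and drift-check steps can be sketched as two small pure functions. The tag format `<env>-<commit id>` and the names `imageTag` and `detectDrift` are assumptions; the plan only requires that the tag carries environment and commit and that the live commit is compared with the manifest.

```typescript
interface ServiceCommit {
  id: string;
  commitId: string;
}

interface Drift {
  id: string;
  desired: string;
  live: string | null; // null when the service reports no live commit
}

// Assumed tag format: "<env>-<full commit id>". With a 40-character commit id
// this stays within the 128-character limit for container image tags.
function imageTag(env: string, commitId: string): string {
  return `${env}-${commitId}`;
}

// Compares manifest commits with live commits (for example collected from
// each service's /health) and returns every service that differs.
function detectDrift(desired: ServiceCommit[], live: Record<string, string>): Drift[] {
  const drift: Drift[] = [];
  for (const d of desired) {
    const known = Object.prototype.hasOwnProperty.call(live, d.id);
    const liveCommit = known ? live[d.id] : null;
    if (liveCommit !== d.commitId) {
      drift.push({ id: d.id, desired: d.commitId, live: liveCommit });
    }
  }
  return drift;
}
```

An empty result from `detectDrift` corresponds to the acceptance criterion that the live `/health` commit matches the manifest commit for every service.
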
## Phase 7: Prod Deploy Ref Compatibility

Purpose: let production read desired state from `deploy/prod` while keeping production runtime unchanged.

Implementation items:

- Add `deploy plan --env prod` and `deploy apply --env prod` using `origin/deploy/prod:deploy.json`.
- Keep production target executors as they are initially:
  - main server Compose for production backend-core/frontend and direct sidecars
  - D601 k3s for production Code Queue execution
- Enforce production command guardrails:
  - canonical root only
  - production credentials only on main server
  - manifest must say `environment=prod`
  - target namespace and provider identity must match production
- Branch protection for `deploy/prod` is recommended but can be added after the first version.

Acceptance criteria:

- Production deploy no longer depends on local `deploy.json`.
- Production deploy reports the exact Git ref and manifest commit used.
- Production deploy still validates live commit after rollout.

## Phase 8: Operator And LLM Safety

Purpose: reduce environment confusion for LLM agents and humans.

Implementation items:

- Add clear CLI output for every deploy:
  - environment
  - ref
  - namespace
  - DB fingerprint
  - provider id
  - services and commits
- Add explicit DEV marker in dev frontend.
- Add hard startup checks:
  - dev service refuses production DB
  - dev service refuses production provider id/token
  - prod service refuses dev namespace/DB
- Ensure LLM task containers receive dev deploy credentials by default and do not receive prod credentials.
- Add smoke checks that intentionally try unsafe combinations and verify they fail.

Acceptance criteria:

- Running a dev service with production DB config fails before listening.
- Running prod deploy from a non-canonical context fails.
- LLM/Code Queue default environment can deploy dev but cannot deploy prod without the separate production credential path.

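The startup checks and the unsafe-combination smoke checks can share one predicate. The sketch below is illustrative only: `unidesk-dev`, `D601`, and `D601-dev` come from this plan, while the database names, the `prod-namespace` placeholder, and all function names are hypothetical.

```typescript
type Environment = "dev" | "prod";

interface RuntimeIdentity {
  environment: Environment;
  namespace: string;
  providerId: string;
  database: string;
}

// `unidesk-dev` and `D601` come from the plan. The database names are
// hypothetical placeholders.
const DEV_NAMESPACE = "unidesk-dev";
const PROD_PROVIDER_ID = "D601";
const DEV_DATABASE = "unidesk_dev";
const PROD_DATABASE = "unidesk";

// Returns the reasons a service must refuse to start; empty means safe.
function startupViolations(id: RuntimeIdentity): string[] {
  const v: string[] = [];
  if (id.environment === "dev") {
    if (id.namespace !== DEV_NAMESPACE) v.push("dev service outside unidesk-dev");
    if (id.providerId === PROD_PROVIDER_ID) v.push("dev service using production provider id");
    if (id.database !== DEV_DATABASE) v.push("dev service not on the dev database");
  } else {
    if (id.namespace === DEV_NAMESPACE) v.push("prod service in dev namespace");
    if (id.database === DEV_DATABASE) v.push("prod service on dev database");
  }
  return v;
}

// Smoke matrix: every entry is an intentionally unsafe combination.
const UNSAFE_CASES: RuntimeIdentity[] = [
  { environment: "dev", namespace: DEV_NAMESPACE, providerId: "D601-dev", database: PROD_DATABASE },
  { environment: "dev", namespace: DEV_NAMESPACE, providerId: PROD_PROVIDER_ID, database: DEV_DATABASE },
  { environment: "dev", namespace: "prod-namespace", providerId: "D601-dev", database: DEV_DATABASE },
  { environment: "prod", namespace: DEV_NAMESPACE, providerId: PROD_PROVIDER_ID, database: PROD_DATABASE },
  { environment: "prod", namespace: "prod-namespace", providerId: PROD_PROVIDER_ID, database: DEV_DATABASE },
];

// The smoke check passes only when this returns an empty array, that is,
// when every unsafe combination is refused.
function unsafeCasesThatPass(): RuntimeIdentity[] {
  return UNSAFE_CASES.filter((c) => startupViolations(c).length === 0);
}
```

Keeping the unsafe cases in a table makes it cheap to add a new combination whenever a new isolation rule is introduced.
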
## Risks And Mitigations

- Risk: namespace isolation does not isolate node-level CPU, memory, Docker socket, hostPath, or containerd load.
  - Mitigation: resource requests/limits, separate dev workdirs, no production path mounts, and bounded Code Queue concurrency.
- Risk: dev Code Queue accidentally writes production task tables.
  - Mitigation: separate dev DB, startup DB fingerprint checks, and health output showing DB identity.
- Risk: dev frontend appears to be prod or proxies to prod backend-core.
  - Mitigation: visible DEV marker, `CORE_INTERNAL_URL` hardwired to dev service, and proxy target health checks.
- Risk: deploy command accidentally reads local manifest instead of GitHub environment ref.
  - Mitigation: `--env` mode must read remote ref only and report the ref/blob used.
- Risk: D601 k3s control plane failure affects both dev and production k3s workloads.
  - Mitigation: accept this in phase 1; consider a separate physical/node-level dev cluster only after namespace isolation proves insufficient.
- Risk: branch `deploy/prod` is initially unprotected.
  - Mitigation: even before branch protection, production deploy should still require canonical main server credentials and should report the ref used for audit.

## Suggested Implementation Order

1. Phase 0 and Phase 1: establish the GitHub environment branches as desired state and add dry-run planning.
2. Phase 2 and Phase 3: create dev namespace, dev DB, backend-core-dev, and frontend-dev.
3. Phase 4 and Phase 5: add dev Code Queue control and execution components.
4. Phase 6: make `deploy apply --env dev` deploy the full first dev stack by commit id.
5. Phase 7: migrate production deploy to `deploy/prod`.
6. Phase 8: harden operator and LLM safety checks.

The first milestone is complete when `deploy apply --env dev` can deploy backend-core, frontend, code-queue-mgr, and Code Queue read/write/scheduler into `unidesk-dev` from commit ids declared in `origin/deploy/dev:deploy.json`, and repeated dev redeploys do not change production main server status or production Code Queue state.