docs: record actual runner cicd debugging

2026-05-24 08:10:11 +00:00
parent a83923cac4
commit e54493d3de
1 changed file with 18 additions and 0 deletions
@@ -8,6 +8,24 @@ This document defines the stable split between CI artifact producers, artifact c

 D601 CI/CD must target native k3s only. Docker Desktop Kubernetes has been disabled and must not be reintroduced; the incident and governance plan are tracked in [GitHub issue #138](https://github.com/pikasTech/unidesk/issues/138), with recovery context in [GitHub issue #118](https://github.com/pikasTech/unidesk/issues/118). CI producer, Tekton, deploy, artifact-registry and manual recovery scripts must not rely on default kubeconfig. They must export `KUBECONFIG=/etc/rancher/k3s/k3s.yaml`, verify node `d601`, and fail fast if the actual target context/server/nodes indicate `docker-desktop`, `desktop-control-plane`, or `127.0.0.1:11700`. A stale default kubeconfig may be reported as a diagnostic, but it is not a blocker when the explicit D601 kubeconfig passes.

+## Actual Runner Debugging
+
+CI/CD debugging must verify the path from the environment that actually owns the failing step. The master server is a control plane and observability entrypoint; it is not automatically the same network, filesystem, credential or kubeconfig perspective as a Code Queue runner, Tekton task, D601 host process, registry helper or k3s workload. When a rollout, artifact lookup, credential mount, worktree, or registry check behaves differently from the commander's view, the next diagnostic step is to enter the real runner container or pod and reproduce the official CLI path there.
+
+The standard method is:
+
+1. Identify the active execution owner from scheduler/task metadata, not from a convenient local clone.
+2. Enter that exact container or pod through the documented maintenance path, then run the repo-owned CLI from its worktree.
+3. Prove the target control plane explicitly before any mutation: export `KUBECONFIG=/etc/rancher/k3s/k3s.yaml`, then confirm the target namespace, current context, node `d601`, and service endpoints.
+4. Run at least one artifact producer action and one CD consumer action when validating a full CI/CD path; checking only build, only registry, or only live health is not enough.
+5. Record bounded job status, duration, blocker fields and final live identity. Full logs belong in job artifacts or issue comments, not in long-lived reference docs.
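
The preflight in step 3 can be sketched as a pure check over what the runner actually observes. This is a minimal illustration, not the repo-owned CLI: the `ClusterView` type and `preflight_blockers` function are hypothetical names, and gathering the observed values (for example from `kubectl config current-context` and `kubectl get nodes`) is assumed to happen elsewhere. The required kubeconfig path, node name and forbidden markers are taken from the D601 rule above.

```python
from dataclasses import dataclass
from typing import List

# Values taken from the D601 rule in this document.
REQUIRED_KUBECONFIG = "/etc/rancher/k3s/k3s.yaml"
REQUIRED_NODE = "d601"
FORBIDDEN_MARKERS = ("docker-desktop", "desktop-control-plane", "127.0.0.1:11700")


@dataclass
class ClusterView:
    """Hypothetical snapshot of what the actual runner sees."""
    kubeconfig: str
    context: str
    server: str
    nodes: List[str]


def preflight_blockers(view: ClusterView) -> List[str]:
    """Return blocker strings; an empty list means mutation may proceed."""
    blockers = []
    if view.kubeconfig != REQUIRED_KUBECONFIG:
        blockers.append(
            f"KUBECONFIG is {view.kubeconfig!r}, expected {REQUIRED_KUBECONFIG!r}"
        )
    if REQUIRED_NODE not in view.nodes:
        blockers.append(f"node {REQUIRED_NODE!r} not found in {view.nodes}")
    # Fail fast if context, server or node names point at Docker Desktop.
    observed = [view.context, view.server, *view.nodes]
    for marker in FORBIDDEN_MARKERS:
        if any(marker in value for value in observed):
            blockers.append(f"forbidden target marker {marker!r} present")
    return blockers
```

A caller would run this inside the real runner container or pod and refuse to mutate anything while the returned list is non-empty.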
+
+Every CLI used in this lane must preserve progressive disclosure: `submit` returns a job id quickly, while `status`, `logs` and `report` read bounded state. Long Docker builds, registry pushes, `kubectl wait`, rollout status and e2e smoke tests must not occupy the commander's foreground context as a blocking command.
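
The progressive-disclosure contract can be illustrated with a small in-process job store. This is a sketch under stated assumptions, not the project's CLI: `JobStore`, its method names and the 50-line log bound are hypothetical, and a real implementation would persist state outside the process. It shows `submit` returning an id immediately while `status` and `logs` read bounded state.

```python
import threading
import uuid
from typing import Callable, Dict, List


class JobStore:
    """Hypothetical job store: submit returns fast, reads are bounded."""

    def __init__(self, max_log_lines: int = 50) -> None:
        self._lock = threading.Lock()
        self._jobs: Dict[str, dict] = {}
        self._threads: Dict[str, threading.Thread] = {}
        self._max_log_lines = max_log_lines

    def submit(self, work: Callable[[Callable[[str], None]], None]) -> str:
        """Start work in the background and return a job id immediately."""
        job_id = uuid.uuid4().hex[:12]
        with self._lock:
            self._jobs[job_id] = {"state": "running", "logs": []}

        def emit(line: str) -> None:
            with self._lock:
                self._jobs[job_id]["logs"].append(line)

        def run() -> None:
            try:
                work(emit)
                final = "succeeded"
            except Exception as exc:
                emit(f"error: {exc}")
                final = "failed"
            with self._lock:
                self._jobs[job_id]["state"] = final

        thread = threading.Thread(target=run, daemon=True)
        self._threads[job_id] = thread
        thread.start()
        return job_id

    def status(self, job_id: str) -> str:
        with self._lock:
            return self._jobs[job_id]["state"]

    def logs(self, job_id: str) -> List[str]:
        """Return only the last max_log_lines lines, never the full log."""
        with self._lock:
            return list(self._jobs[job_id]["logs"][-self._max_log_lines:])

    def wait(self, job_id: str, timeout: float) -> None:
        """Helper for demos and tests; a commander would poll status instead."""
        self._threads[job_id].join(timeout)
```

The long-running step (a Docker build, a `kubectl wait`, a smoke test) would live inside `work`, so the foreground only ever sees the job id and bounded reads.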
+
+Network identity is part of the evidence. `127.0.0.1:5000` may mean runner-pod loopback to a Node HTTP client, D601 host loopback to Docker, or node-local loopback to k3s/containerd. A fix must preserve the release artifact identity while using the correct verification route for that runtime view; it must not rewrite `deploy.json`, image references or manifests merely to satisfy one observer's loopback.
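
One way to keep artifact identity fixed while varying the verification route is to derive the registry manifest URL from the unchanged image reference plus a per-observer base URL. The sketch below assumes the registry speaks the standard `/v2/<name>/manifests/<reference>` HTTP API; the function names, the example image name and the example base URL are hypothetical, and the parser handles only references that include an explicit registry host.

```python
from typing import Tuple


def split_image_ref(image_ref: str) -> Tuple[str, str, str]:
    """Split 'host[:port]/name[:tag|@digest]' into (registry, name, reference)."""
    registry, _, remainder = image_ref.partition("/")
    if not remainder:
        raise ValueError(f"image reference has no registry host: {image_ref!r}")
    if "@" in remainder:
        name, _, reference = remainder.partition("@")
    elif ":" in remainder.rsplit("/", 1)[-1]:
        name, _, reference = remainder.rpartition(":")
    else:
        name, reference = remainder, "latest"
    return registry, name, reference


def manifest_url(image_ref: str, route_base: str) -> str:
    """Build the manifest URL for this observer without altering image_ref.

    route_base is whatever base URL reaches the registry from the current
    runtime view; the release artifact's own reference is left untouched.
    """
    _registry, name, reference = split_image_ref(image_ref)
    return f"{route_base.rstrip('/')}/v2/{name}/manifests/{reference}"
```

For example, a runner pod and the D601 host could both verify `127.0.0.1:5000/unidesk/backend:abc123` by passing different `route_base` values, while `deploy.json` keeps the original reference.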
+
+Manual hotfixes are allowed only to recover service and to learn which contract was missing. The durable close-out is a repo-owned preflight, desired-state check, CLI behavior, or reference update, followed by the same verification from the actual runner environment. A healthy old live service does not prove CD success; success requires the formal job to pass and the live commit or digest identity to match the desired state.
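
The success rule in the paragraph above can be written as a small predicate. This is an illustrative sketch only: the `CdEvidence` fields and the `"succeeded"` state string are assumed names, not the project's report schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CdEvidence:
    """Hypothetical bounded evidence recorded for one CD run."""
    job_state: str        # final state of the formal job
    desired_digest: str   # commit or digest the release asked for
    live_digest: str      # commit or digest observed on the live service
    service_healthy: bool # live health probe result


def cd_succeeded(evidence: CdEvidence) -> bool:
    """Health alone is not enough: the job must pass and identities must match."""
    return (
        evidence.job_state == "succeeded"
        and bool(evidence.desired_digest)
        and evidence.live_digest == evidence.desired_digest
    )
```

Note that `service_healthy` is deliberately not part of the decision, so a healthy service still running the old digest is reported as not deployed.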
+
 ## Target Shape

 The standard release shape is: