CI/CD Branch Follower

SPEC: PJ2026-01060703 CI/CD branch follower draft-2026-07-03-p0-branch-follower

Entrypoints

bun scripts/cli.ts cicd branch-follower plan
bun scripts/cli.ts cicd branch-follower apply --confirm --wait
bun scripts/cli.ts cicd branch-follower status
bun scripts/cli.ts cicd branch-follower status --live
bun scripts/cli.ts cicd branch-follower run-once --all --dry-run
bun scripts/cli.ts cicd branch-follower run-once --follower <id> --confirm --wait
bun scripts/cli.ts cicd branch-follower events --follower <id>
bun scripts/cli.ts cicd branch-follower logs --follower <id>

apply --confirm --wait is the one-command deploy/update entry for the K8s controller. status is the default intermediate-state query. status --live and local run-once submit a bounded K8s reconcile Job; the Job performs all source, Tekton, Argo and runtime reads inside the cluster and may write only the compact state summary. events and logs are read-only drill-downs for the same Kubernetes-native state. run-once --confirm --wait is the manual one-command trigger and closeout path.

Source Authority

Follower decisions must not read host source worktrees, target dev directories, .worktree/*, local git state, or direct GitHub branch refs.
Controller pods use an emptyDir volume plus the YAML-declared k8s git-mirror cache PVC, sync GitHub refs from inside Kubernetes, clone UniDesk controller source from /cache, then run the CLI with the mounted registry.
Runtime source commits, build contexts, publish inputs and closeout status remain owned by each adapter's k8s git-mirror snapshot and runtime objects.
Trigger adapters communicate through the Kubernetes API with the controller service account. Formal triggering, observation and closeout must not depend on downstream CLI stdout parsing, host worktrees, or operator shell state.
Dirty, stale, or missing-dependency host worktrees are non-authoritative and must not change observed sha, trigger sha, PipelineRun, GitOps, or status output.
trans or SSH may be used only by the operator CLI as a transport to create/read Kubernetes objects on the target cluster. It must not be part of branch-follower source sync, GitHub communication, status collection, decision making or closeout.

YAML Ownership

config/cicd-branch-followers.yaml owns controller settings and the follower registry: id, adapter, source/target configRefs, command argv, native status object refs, closeout check labels and budgets.

It must not copy runtime/GitOps/Secret details from owning configs:

HWLAB node lanes: config/hwlab-node-lanes.yaml
AgentRun lanes: config/agentrun.yaml
Web sentinel profiles/scenarios/reports/secrets: config/hwlab-web-probe-sentinel/*.yaml

Use configRef summaries in plan/status; do not create a full.md or super Markdown index.

First Followers

hwlab-jd01-v03: follows pikasTech/HWLAB@v0.3, adapter hwlab-node-runtime, native trigger Tekton PipelineRun -> Argo Application closeout -> runtime Deployment sourceCommit readiness.
agentrun-jd01-v02: follows pikasTech/agentrun@v0.2, adapter agentrun-yaml-lane, native trigger build image Job -> GitOps publish Job -> git-mirror flush Job -> Tekton PipelineRun -> Argo Application closeout -> runtime Deployment sourceCommit readiness. The same source commit must use deterministic Job names so a later controller loop can resume or reuse already completed stages.
web-probe-sentinel-master: follows pikasTech/unidesk@master, adapter web-probe-sentinel-cicd, native trigger Tekton PipelineRun -> Argo Application closeout -> runtime Deployment sourceCommit readiness.

These three followers are the initial production set. HWLAB and AgentRun both run on JD01; there is no D601 target in the automatic follower set unless YAML is explicitly changed.
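
The deterministic Job name requirement for AgentRun can be illustrated with a minimal sketch. The naming scheme, the `deterministicJobName` helper and the stage label used here are hypothetical; this document does not specify the controller's actual format. The only property that matters is that the same follower, stage and source commit always yield the same name, so a later loop can find and reuse the object.

```typescript
// Hypothetical naming scheme: "<followerId>-<stage>-<12-char sha>".
// The real controller's format is not specified in this document.
const MAX_NAME_LENGTH = 63; // keeps the name usable as a Kubernetes label value

function sanitize(part: string): string {
  return part
    .toLowerCase()
    .replace(/[^a-z0-9-]+/g, "-") // collapse anything outside [a-z0-9-]
    .replace(/^-+|-+$/g, ""); // trim leading and trailing dashes
}

function deterministicJobName(followerId: string, stage: string, sourceCommit: string): string {
  const sha = sourceCommit.toLowerCase();
  if (!/^[0-9a-f]{12,40}$/.test(sha)) {
    throw new Error(`expected a hex commit sha, got: ${sourceCommit}`);
  }
  // Truncate the prefix, never the sha, so the commit identity always survives.
  const suffix = `-${sha.slice(0, 12)}`;
  const prefix = sanitize(`${followerId}-${stage}`)
    .slice(0, MAX_NAME_LENGTH - suffix.length)
    .replace(/-+$/g, "");
  return `${prefix}${suffix}`;
}

console.log(
  deterministicJobName("agentrun-jd01-v02", "gitops-publish", "0123456789abcdef0123456789abcdef01234567"),
);
```

Because the name is a pure function of its inputs, a retry for the same source commit resolves to the same Job instead of creating a duplicate.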

Reuse And Mirror Contract

The controller must preserve the runtime reuse capabilities that already exist in the runtime lanes:

runtime reuse: if both code identity and env identity are unchanged for a microservice, skip rebuild and rollout for that service;
env reuse: if code changed but env identity is unchanged, reuse the previous environment image and publish only the changed service artifact;
git mirror: source sync, immutable source snapshot creation and GitOps flush are generic branch-follower stages, not adapter-local afterthoughts.
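
As a rough illustration of the runtime reuse and env reuse rules above, the per-service decision could be sketched as follows. The type names, the `decideReuse` helper and the `full-build` fallback for a changed env identity are assumptions; this document defines only the two reuse cases.

```typescript
// Hypothetical types; the identity strings would come from the runtime lanes.
type ReuseDecision = "runtime-reuse" | "env-reuse" | "full-build";

interface ServiceIdentity {
  codeIdentity: string;
  envIdentity: string;
}

function decideReuse(previous: ServiceIdentity | undefined, next: ServiceIdentity): ReuseDecision {
  if (!previous) return "full-build"; // nothing to reuse yet (assumption)
  const codeSame = previous.codeIdentity === next.codeIdentity;
  const envSame = previous.envIdentity === next.envIdentity;
  if (codeSame && envSame) return "runtime-reuse"; // skip rebuild and rollout
  if (envSame) return "env-reuse"; // reuse env image, publish only the changed artifact
  return "full-build"; // env identity changed: not covered by either reuse rule (assumption)
}

console.log(decideReuse({ codeIdentity: "c1", envIdentity: "e1" }, { codeIdentity: "c2", envIdentity: "e1" }));
```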

Adapters should expose reuse evidence through compact native state. HWLAB uses the plan-artifacts task event summary (affectedServices, buildServices, reusedServices, artifactProvenanceAudit). AgentRun publishes deterministic image/GitOps/git-mirror stage names and source-commit labels so a later loop can resume closeout without rebuilding completed stages. Sentinel keeps the same source/CI/Argo/runtime contract but has no GitOps branch flush gate.

The normal convergence budget is 120 seconds per source change. A follower may report ClosingOut while waiting for Argo/runtime readiness, but it must not report Noop when the source sha matches while required native gates such as git-mirror flush are still incomplete.
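
The Noop rule can be sketched as a small guard. The `NativeGate` shape and the helper names are hypothetical, and returning ClosingOut for every incomplete-gate case is a simplifying assumption; the 120-second value is the budget stated above.

```typescript
// Hypothetical gate summary; real gates would be derived from native objects.
interface NativeGate {
  name: string;
  required: boolean;
  complete: boolean;
}

const CONVERGENCE_BUDGET_SECONDS = 120; // normal budget per source change

// Called only when the observed source sha already matches the target sha.
function phaseForMatchingSha(gates: NativeGate[]): "Noop" | "ClosingOut" {
  const incomplete = gates.filter((g) => g.required && !g.complete);
  return incomplete.length === 0 ? "Noop" : "ClosingOut";
}

function isOverBudget(elapsedSeconds: number): boolean {
  return elapsedSeconds > CONVERGENCE_BUDGET_SECONDS;
}

console.log(phaseForMatchingSha([{ name: "git-mirror-flush", required: true, complete: false }]));
```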

Status Contract

Default status output must show follower id, phase, adapter, source branch + observed sha, target sha, last triggered sha, last succeeded sha, in-flight job/PipelineRun, budget source, timing summary and next drill-down commands.

Stage timing must be queryable through normal CLI output, not only raw JSON. status and run-once print a bounded STAGE TIMINGS table with total, status-read, git-mirror, Kubernetes Job, PipelineRun, TaskRun, Argo, runtime and closeout rows when available. followers[].timings remains available in --raw/JSON for machine consumers.

timings.totalSeconds is the authoritative end-to-end wall-clock measurement for a triggered run: measure from timings.startedAt until timings.finishedAt, or until query time while closeout is still running. Do not compute total by summing stage rows, because stage rows can overlap, omit external waiting, or be reported by different native objects.
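
A minimal sketch of that measurement, assuming `startedAt` and `finishedAt` are stored as ISO 8601 strings (the storage format is an assumption) and using a hypothetical `totalSeconds` helper:

```typescript
// Field names follow the timings.* fields named in this document; the
// ISO 8601 string representation is an assumption.
interface Timings {
  startedAt?: string;
  finishedAt?: string;
}

function totalSeconds(timings: Timings, now: Date = new Date()): number | undefined {
  if (!timings.startedAt) return undefined; // no stored start: total is unknown
  const start = Date.parse(timings.startedAt);
  if (Number.isNaN(start)) return undefined;
  // While closeout is still running, measure until query time.
  const end = timings.finishedAt ? Date.parse(timings.finishedAt) : now.getTime();
  if (Number.isNaN(end) || end < start) return undefined;
  return Math.round((end - start) / 1000);
}

console.log(totalSeconds({ startedAt: "2026-07-03T00:00:00Z", finishedAt: "2026-07-03T00:01:30Z" })); // 90
```

The total is a single wall-clock difference and never a sum over stage rows. Returning `undefined` when `startedAt` is absent also keeps a reused object's wait duration from being reported as a total.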

Do not backfill, infer, or migrate old branch-follower state when historical timing, stage timing, or other observability fields are missing or known to be unreliable. Compatibility starts with future state written by the current controller; old missing data must render as -/unknown in CLI output instead of being recovered from unrelated native objects.
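
One way to honor the no-backfill rule at render time is sketched below. The `renderSeconds` helper and the `s` suffix are hypothetical; the point is only that a missing value prints as `-` rather than being reconstructed.

```typescript
// Hypothetical formatter for a timing cell in CLI output.
function renderSeconds(value: number | undefined | null): string {
  // Missing or non-finite values are shown as "-" and never inferred.
  return typeof value === "number" && Number.isFinite(value) ? `${value}s` : "-";
}

console.log(renderSeconds(42), renderSeconds(undefined));
```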

If a deterministic Kubernetes Job or PipelineRun is reused and there is no already-stored timings.startedAt, the reused object's current wait/check duration is only a stage observation; it must not be promoted to timings.totalSeconds.

When run-once --confirm --wait resumes a source change that is already ClosingOut, the CLI may wait for native closeout and report a closeout stage duration. That closeout-only wait is not the end-to-end total unless the stored state already contains a valid timings.startedAt.

State machine phases are Observed, Noop, PendingTrigger, Triggering, ClosingOut, Succeeded, Failed, Superseded, Blocked, and Skipped.
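
For machine consumers, the phase list above could be captured as a closed union type. This is a sketch; the `PHASES` constant and the `isPhase` guard are illustrative names, not identifiers confirmed by this document.

```typescript
// The ten phases named above, in the order they are listed.
const PHASES = [
  "Observed",
  "Noop",
  "PendingTrigger",
  "Triggering",
  "ClosingOut",
  "Succeeded",
  "Failed",
  "Superseded",
  "Blocked",
  "Skipped",
] as const;

type Phase = (typeof PHASES)[number];

// Type guard for values read back from stored state or JSON output.
function isPhase(value: string): value is Phase {
  return (PHASES as readonly string[]).includes(value);
}

console.log(isPhase("ClosingOut"), isPhase("Done"));
```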

Status and decision inputs are Kubernetes-native:

source: k8s git-mirror cache ref and immutable snapshot ref;
CI: Tekton PipelineRun.status.conditions;
CI drill-down: compact TaskRun timings and plan-artifact reuse summary when available;
git mirror: source snapshot readiness plus GitOps pendingFlush/githubInSync when the follower owns a GitOps branch;
deployment: Argo Application.status.sync and Application.status.health;
runtime: selected Deployment/StatefulSet readiness plus source commit labels, annotations or env.
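
The inputs listed above can be combined into a single closeout check, sketched here under stated assumptions. The `NativeObservation` shape, the gate names and the `pendingGates` helper are hypothetical. The compared values (a `Succeeded` condition with status `True`, sync status `Synced`, health status `Healthy`) are the standard Tekton and Argo CD status values.

```typescript
interface Condition {
  type: string;
  status: string; // "True" | "False" | "Unknown"
}

// Hypothetical compact observation assembled from structured API reads.
interface NativeObservation {
  pipelineRunConditions: Condition[]; // Tekton PipelineRun.status.conditions
  argoSyncStatus?: string; // Argo Application.status.sync.status
  argoHealthStatus?: string; // Argo Application.status.health.status
  readyReplicas?: number; // Deployment.status.readyReplicas
  desiredReplicas?: number; // Deployment.spec.replicas
  runtimeSourceCommit?: string; // from a label, annotation or env
}

// Returns the gates still blocking closeout; an empty list means closeout is done.
function pendingGates(obs: NativeObservation, targetSha: string): string[] {
  const pending: string[] = [];
  const succeeded = obs.pipelineRunConditions.find((c) => c.type === "Succeeded");
  if (succeeded?.status !== "True") pending.push("pipelinerun");
  if (obs.argoSyncStatus !== "Synced") pending.push("argo-sync");
  if (obs.argoHealthStatus !== "Healthy") pending.push("argo-health");
  const desired = obs.desiredReplicas ?? 1; // spec.replicas defaults to 1
  if ((obs.readyReplicas ?? 0) < desired) pending.push("runtime-ready");
  if (obs.runtimeSourceCommit !== targetSha) pending.push("runtime-source-commit");
  return pending;
}

console.log(pendingGates({ pipelineRunConditions: [] }, "abc123"));
```

Every input is a structured status field, so nothing in the check depends on human-readable CLI output.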

The branch follower must not parse downstream CLI stdout/stderr, kubectl human tables, argo text, tkn text, or curl output to infer observed sha, target sha, readiness or closeout. kubectl -o json may be used inside the controller/Job as a structured Kubernetes API transport only.

The controller automatic loop submits trigger work without a blocking wait; later loops close out via the native state objects above. Failed state must not dedupe a source commit forever: retries may reuse deterministic native objects for the same source commit, and a new compact observation should be able to move the follower back into triggering or closeout.

State ConfigMaps must stay bounded and human-queryable. Store compact summaries, stage refs, conditions, short messages, and drill-down object names; do not store full API payloads or long log dumps. Cleanup is an explicit operator operation for stale/broken state and must not be required for normal convergence.
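
A small sketch of keeping stored messages short before they are written to state. The 200-character bound and the `compactMessage` helper are hypothetical; the document requires only that summaries stay bounded.

```typescript
const MAX_MESSAGE_CHARS = 200; // hypothetical bound, not taken from the controller

function compactMessage(message: string, max: number = MAX_MESSAGE_CHARS): string {
  // Collapse newlines and runs of whitespace so the message stays on one line.
  const singleLine = message.replace(/\s+/g, " ").trim();
  if (singleLine.length <= max) return singleLine;
  return `${singleLine.slice(0, max - 3)}...`;
}

console.log(compactMessage("PipelineRun failed:\n  step build exited with code 1"));
```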

run-once --dry-run is read-only for deployment: it may refresh the state ConfigMap with current native observations, but it must not trigger adapters.
