fix: make CI install observable
This commit is contained in:
@@ -9,7 +9,7 @@ UniDesk CI is hosted on the D601 native k3s cluster with Tekton Pipelines and Te
|
||||
- UniDesk CI namespace: `unidesk-ci`.
|
||||
- Manifests: `src/components/microservices/k3sctl-adapter/k3s/ci/`.
|
||||
- Artifact catalog: root `CI.json`, which is CI artifact catalog only. It describes build inputs, image naming and summary fields; runtime topology, rollout target, ports, namespaces and desired service commits remain in `config.json`, service manifests and `deploy.json`.
|
||||
- CLI entry: `bun scripts/cli.ts ci install|status|run|publish-backend-core|publish-user-service|run-dev-e2e|logs`.
|
||||
- CLI entry: `bun scripts/cli.ts ci install|install-status|status|run|publish-backend-core|publish-user-service|run-dev-e2e|logs`.
|
||||
- Dev namespace e2e runner: `bun scripts/cli.ts ci run-dev-e2e`; authoritative runner path, manifest contract and safety boundary are in `docs/reference/dev-ci-runner.md`.
|
||||
- Rust backend-core check/build boundary: CI may run `UNIDESK_D601_RUST_CHECK=1 bun scripts/cli.ts check --full --rust` on D601; the master server must not compile Rust for backend-core iteration. The authoritative dev environment rule is `docs/reference/dev-environment.md`.
|
||||
|
||||
@@ -27,7 +27,9 @@ Each commit CI run performs:
|
||||
|
||||
CI/CD bootstrap, repair and upgrade actions are infrastructure operations. They are manually tested and may be promoted directly to production when the infrastructure itself is the target; do not add CI jobs whose purpose is to prove that CI/CD can bootstrap or repair itself.
|
||||
|
||||
`ci install` also prewarms the D601 k3s containerd runtime with the Tekton entrypoint/workingdir helper images, `oven/bun:1-debian`, `alpine/git:2.45.2` and `unidesk-code-queue:dev`. Missing images are pulled through the node-local provider-gateway WS egress proxy and then imported into native k3s containerd with digests preserved, so PipelineRun pods do not hang on external registry pulls. Sustained pull throughput below 1 MB/s is treated as a provider/main-server network or proxy degradation first, not as a Dockerfile or application failure.
|
||||
`ci install` is fire-and-forget by default: it creates a `.state/jobs` job and immediately returns `job.id`, stdout/stderr paths and `ci install-status <jobId|latest>`. Use `--wait` only for explicit synchronous debugging. `ci install-status` must show bounded log tails and `ci.install.progress` stages for prewarm, Tekton install, manifest upload, `kubectl apply` and final status, so a stalled install is diagnosable without waiting on a silent foreground command.
|
||||
|
||||
`ci install` also prewarms the D601 k3s containerd runtime with the Tekton entrypoint/workingdir helper images, `oven/bun:1-debian`, `alpine/git:2.45.2` and `unidesk-code-queue:dev`. Missing images are pulled through the node-local provider-gateway WS egress proxy and then imported into native k3s containerd with digests preserved, so PipelineRun pods do not hang on external registry pulls. Sustained pull throughput below 1 MB/s is treated as a provider/main-server network or proxy degradation first, not as a Dockerfile or application failure. When Tekton is already installed and only UniDesk CI manifests/triggers need refreshing, use `ci install --skip-prewarm --skip-tekton-install`; manifest apply still reports byte counts and per-manifest `kubectl-apply` progress through `install-status`.
|
||||
|
||||
Git clone and dependency downloads inside the repo check task use `d601-provider-egress-proxy.unidesk.svc.cluster.local:18789`; the NO_PROXY list keeps the in-cluster read service and D601 TCP egress gateway on the cluster network.
|
||||
|
||||
|
||||
@@ -90,7 +90,7 @@ CI/CD、GitOps、rollout、artifact 发布、PR 合并后的 runtime lane 滚动
|
||||
- `gh pr merge` 的 already-merged 终态是 guarded merge 的幂等成功例外:当目标 PR 已经处于 `merged` 状态时,命令返回 `ok=true`、`alreadyMerged=true`、`pullRequest.merged=true` 和 merge commit 摘要,不再把并发 monitor、GitHub UI 或人工合并后的 `closed` 状态误报为 validation-failed。
|
||||
- `gh pr list` 与 `gh issue list` 一样接受单个位置参数 `owner/repo` 作为 `--repo owner/repo` 兼容别名;位置 repo 与显式 `--repo` 冲突时会结构化失败,输出里的 `repo` 始终反映真实请求目标。`--number N --repo owner/repo` 是单 PR/comment 数字目标命令的位置参数兼容别名,适用于 `view/read/files/diff/preflight/closeout/edit/update/comment create/comment delete/close/reopen/merge`,成功输出必须带 `standardSyntaxHint`;comment delete 中的 `--number` 表示 commentId,不是 PR number;`list/create` 不能静默忽略 `--number`。
|
||||
- PR dry-run/probe 的最小手动序列是:`bun scripts/cli.ts gh auth status --repo pikasTech/unidesk` 只读检查 token 来源、GitHub REST egress、repo 可见性和 issue read;`bun scripts/cli.ts gh pr create --repo pikasTech/unidesk --title <title> --body-stdin --base master --head <head> --dry-run <<'EOF' ... EOF` 检查创建计划;`bun scripts/cli.ts gh pr list --repo pikasTech/unidesk --state open --limit 5 --json number,title,state,url,head,base`、`bun scripts/cli.ts gh pr files <number> --repo pikasTech/unidesk --limit 30`、`bun scripts/cli.ts gh pr view <number> --repo pikasTech/unidesk --json body,title,state,stateDetail,closed,closedAt,merged,mergedAt,mergeCommit,head,base,headRefName,baseRefName,mergeable,mergeStateStatus,statusCheckRollup` 和 `bun scripts/cli.ts gh pr preflight <number> --repo pikasTech/unidesk` 做只读 PR 观察、文件统计和收口元数据检查;`bun scripts/cli.ts gh pr edit <number> --repo pikasTech/unidesk --title <title> --body-stdin --dry-run <<'EOF' ... EOF` 检查低噪声 PR 标题/正文编辑计划;`bun scripts/cli.ts gh pr comment create <number> --repo pikasTech/unidesk --body-stdin --dry-run <<'EOF' ... EOF` 检查评论计划;`bun scripts/cli.ts gh pr merge <number> --repo pikasTech/unidesk --dry-run` 检查 guarded merge plan,真实 merge 只能在任务边界明确允许且 preflight ready 后执行。Code Queue runner 可用 `bun scripts/code-queue-pr-preflight-example.ts --repo pikasTech/unidesk --base master --head <head> --comment-pr <number>` 一次性跑只读 auth status 与 PR create/comment dry-run;该脚本不得输出 token 值,也不会创建、评论或 merge PR。
|
||||
- `ci install|status|run|publish-backend-core|publish-user-service|run-dev-e2e|logs` 管理 D601 原生 k3s 上的 Tekton CI。`run` 手动创建每 commit 检查和 Code Queue 只读性能门禁;`publish-backend-core` 与 `publish-user-service` 从 pushed Git commit 构建并发布 `127.0.0.1:5000/unidesk/<service>:<commit>` commit-pinned artifacts,输出 `artifactSummary`(含 `serviceId`、`sourceCommit`、`sourceRepo`、`dockerfile`、`imageRef`、`tag`、`digest`、`digestRef`),但不部署生产;`run-dev-e2e` 的 Git 控制 runner、短 launcher、host fetch 边界、临时 smoke namespace 和 no-CD 规则只在 `docs/reference/dev-ci-runner.md` 定义;Tekton CI 通用规则见 `docs/reference/ci.md`。
|
||||
- `ci install|install-status|status|run|publish-backend-core|publish-user-service|run-dev-e2e|logs` 管理 D601 原生 k3s 上的 Tekton CI。`install` 默认创建 `.state/jobs` 异步 job 并立即返回,`install-status <jobId|latest>` 读取阶段化 progress 和 bounded log tail;只有现场同步调试才显式加 `--wait`。`run` 手动创建每 commit 检查和 Code Queue 只读性能门禁;`publish-backend-core` 与 `publish-user-service` 从 pushed Git commit 构建并发布 `127.0.0.1:5000/unidesk/<service>:<commit>` commit-pinned artifacts,输出 `artifactSummary`(含 `serviceId`、`sourceCommit`、`sourceRepo`、`dockerfile`、`imageRef`、`tag`、`digest`、`digestRef`),但不部署生产;`run-dev-e2e` 的 Git 控制 runner、短 launcher、host fetch 边界、临时 smoke namespace 和 no-CD 规则只在 `docs/reference/dev-ci-runner.md` 定义;Tekton CI 通用规则见 `docs/reference/ci.md`。
|
||||
- `schedule list|get|runs|run|retry-run|delete|upsert-pgdata-backup` 管理 backend-core 定时任务和运行历史。`schedule list`、`schedule get`、`schedule runs --limit N` 和 `schedule runs <scheduleId> --limit N` 是只读观察入口;`schedule run`、`schedule retry-run`、`schedule delete` 和 `schedule upsert-pgdata-backup` 会触发运行或写入配置,生产恢复时必须有明确授权。`schedule runs --limit N` 是全局历史视图,返回 `scope=global` 和 `scheduleId=null`;`schedule runs <scheduleId> --limit N` 是指定 schedule 历史视图,返回 `scope=schedule` 和对应 `scheduleId`。CLI 必须拒绝 `schedule runs 50` 这类纯数字位置参数,并提示使用 `schedule runs --limit 50`,避免把空数组误判成“没有历史 run”。`schedule run <id> --wait-ms N` 触发同一 schedule,并且即使 wait 超时也必须返回 `newRunId` 和 `observeCommand`;`schedule retry-run <failedRunId>` 只接受 failed run,从原 run 反查 `scheduleId` 后重触发同一 schedule,并输出 `originalRunId`、`scheduleId`、`newRunId` 和 `observeCommand`。当 backend-core 目标容器缺失或只观察到 verify-only 容器时,schedule/microservice 命令必须以非零退出并返回 `failureKind=target-stack-not-running`、`runnerDisposition=infra-blocked`、`readOnlyCommands` 和 `authorizationRequiredForRecovery`,不得把 Docker 的 `No such container` 当成成功的空历史。
|
||||
- `codex deploy <commitId>` 是旧 Code Queue 兼容部署入口,已禁用以防止维护通道直连 D601 部署 Code Queue;当前 dev 自动化只做 `ci run-dev-e2e` smoke,不提供 Code Queue CD,详细规则见 `docs/reference/codex-deploy.md`。
|
||||
- `agentrun get|describe|events|logs|result|ack|cancel|dispatch|create|apply|steer|send` 是当前指挥官新任务和 AgentRun session 控制入口。UniDesk CLI 是 render-only client:客户端保留 k8s 风格命令解析、human 表格、生命周期摘要、下一步命令、分页、`-o json|yaml` 稳定客户端 schema 和错误展示;AgentRun 服务端只提供稳定 RESTful API、鉴权和业务事实,不承载 UniDesk CLI 渲染。日常查看用 `get tasks --queue commander`、`describe task/<taskId>`、`events run/<runId>`、`logs session/<sessionId>`、`result run/<runId> --command <commandId>`;日常写入用 `create task --aipod Artificer --prompt-stdin`、`apply -f -`、`dispatch task/<taskId>`、`steer/send session/<sessionId>`、`ack/cancel task|session/<id>`。兼容 group `queue|runs|commands|runner|sessions|aipod-specs` 也走同一 direct HTTP transport,`--raw` 只披露直连 AgentRun REST envelope。
|
||||
|
||||
Reference in New Issue
Block a user