From 08da0805bbfab629a84f7fd28c304edba6ea0011 Mon Sep 17 00:00:00 2001
From: Codex
Date: Thu, 4 Jun 2026 14:44:02 +0000
Subject: [PATCH] fix: make hwlab v02 cd latest-only

---
 docs/reference/cli.md              |   8 +-
 docs/reference/g14.md              |   2 +-
 scripts/hwlab-g14-contract-test.ts |  29 ++++-
 scripts/src/hwlab-g14.ts           | 185 +++++++++++------------------
 4 files changed, 103 insertions(+), 121 deletions(-)

diff --git a/docs/reference/cli.md b/docs/reference/cli.md
index 269d657e..27e0cf75 100644
--- a/docs/reference/cli.md
+++ b/docs/reference/cli.md
@@ -47,13 +47,13 @@ CI/CD, GitOps, rollout, artifact publishing, DEV/PROD rollout after PR merge, P
 - `artifact-registry plan|render|status|health|install|deploy-backend-core|deploy-service` manages the declaration, installation, read-only checks, and pull-only artifact CD of the D601 host-managed CNCF Distribution registry. The registry is pinned to the D601 loopback address `127.0.0.1:5000`, is managed by systemd + Docker Compose, and sits outside the native k3s failure domain. `deploy-service` only pulls commit-pinned images that CI has already published, retags/recreates them or imports them into native k3s, and performs live commit verification; it never builds runtime source. `deploy-backend-core` is a deprecated compatibility name; the standard backend-core prod CD entry point is `deploy apply --env prod --service backend-core`. See `docs/reference/artifact-registry.md` for the long-term rules.
 - `commander contract|plan --dry-run|smoke --dry-run|approval request --dry-run|prompt-lint --kind gpt55-pr` is the microservice skeleton entry point managed directly by the host Codex commander. The commands currently return `phase=source-contract`, the service/API/state/bridge/prompt/trace/#20/#46/ClaudeQQ approval boundaries, the .state/commander/ state model, the daemon-free dev smoke contract, the dry-run plan, and an auxiliary lint for GPT-5.5 PR prompt boundaries; they do not connect to the live bridge, inject prompts, or send ClaudeQQ messages. `approval request --dry-run` generates a plain-text Chinese ClaudeQQ approval draft of at most 200 characters, a `notification-path-unavailable` blocker, and the only command that may be used after authorization, `bun scripts/cli.ts microservice proxy claudeqq /api/push/text --method POST --body-json '' --raw`; it must not suggest using the local ClaudeQQ skill, PowerShell, or a local server. `prompt-lint` supports `--prompt-file` and `--stdin`, outputs `ok`, `missingClauses`, `riskLevel`, and `suggestedPatchSnippet`, and does not echo the full prompt; it is an auxiliary commander check, not a business PR gate, and does not change the default behavior of `codex submit`. `plan`, `smoke`, and `approval request` must be given `--dry-run`; without it they return `error=dry-run-required`. See `docs/reference/host-codex-commander.md` for the long-term rules.
 - `hwlab g14 monitor-prs [--lane g14|v02] [--once] [--dry-run] [--interval-seconds N] [--max-cycles N] [--timeout-seconds N]` is the current one-line entry point for HWLAB G14 PR -> CI/CD -> DEV rollout. A normal invocation creates an async job under `.state/jobs/` and immediately returns `job.id`, `statusCommand`, and the stdout/stderr paths. On each cycle the background worker uses UniDesk `gh pr list/preflight/merge` to monitor open PRs against base=`G14` in `pikasTech/HWLAB`, merges them when ready, and then uses UniDesk `trans G14:k3s` to watch `hwlab-g14-ci-poll-`, Argo `hwlab-g14-dev`, and DEV `/health/live` until DEV is `Synced/Healthy` and the Deployments/StatefulSets are ready; historical `Completed` smoke/debug pods are not treated as rollout blockers. After each successful DEV rollout, the worker locates or creates the Beijing-date daily briefing issue under #7 ("指挥简报索引", the commander briefing index) and appends the CI/CD duration, key CI/CD metrics, a semantic release changelog, an automatic diff summary, the PipelineRun, the GitOps revision, and a DEV verification summary. The key metrics come from G14 Tekton TaskRun results and always include `lazy build reused: x/y`, the reused services, the rebuilt services, and each service's own duration/status/backend, so the effectiveness of the lazy build mechanism can be tracked. The semantic changelog is extracted preferentially from PR body sections such as `## 修改`, `## 变更`, or `## Changelog`; the diff summary is kept only as file and statistics evidence and does not replace the changelog. A rollout can also be recorded manually with `hwlab g14 record-rollout --pr --source-commit `, which likewise collects TaskRun metrics per PipelineRun. G14 state pointers are separated by purpose: long-running monitoring writes only `.state/hwlab-g14/latest-monitor-job.json`, `--once` writes `latest-once-job.json`, `--dry-run` writes `latest-dry-run-job.json`, and `--once --dry-run` writes `latest-once-dry-run-job.json`, so a one-off closure run never overwrites the continuous monitoring entry. `--once --dry-run` performs only a single monitoring cycle and a merge plan; it does not write to GitHub or wait for rollout. This command must not use the native `gh` CLI or hand-built GitHub requests; if the fields or behavior of the UniDesk `gh` subcommands are insufficient, improve `scripts/src/gh.ts` first and then use it.
-- `hwlab g14 monitor-prs --lane v02` is the PR -> CI -> CD automation entry point for HWLAB `v0.2`. It monitors only open PRs against base=`v0.2`: each cycle first uses UniDesk `gh pr preflight` to read GitHub CI/checks, mergeability, and conflict status; it posts a waiting comment on the PR while checks are pending and a blocking comment when the PR is blocked or conflicted; when the PR is ready, it first confirms that the v0.2 lane has no running PipelineRun, then merges with UniDesk `gh pr merge`, and afterwards runs the controlled `control-plane trigger-current --lane v02 --confirm --wait`, polls the targeted `control-plane status --lane v02 --source-commit `, and runs `git-mirror flush --confirm --wait` when needed. Whether CD succeeds, fails, or times out, it appends a semantic status to the original PR with `gh pr comment create --body-file`; the body always includes the start/end times, total duration, conflict status, CI/preflight conclusion, source commit, PipelineRun, targetValidation, Argo/webAssets, and git mirror pendingFlush/githubInSync. Comment dedupe state is written to `.state/hwlab-g14/v02-pr-comment-signatures.json`, so the same status signature never produces duplicate comments. The v0.2 monitor pointers are `.state/hwlab-g14/latest-v02-monitor-job.json`, `latest-v02-once-job.json`, `latest-v02-dry-run-job.json`, and `latest-v02-once-dry-run-job.json`, and never overwrite the default G14 monitor pointers. `--lane v02 --once --dry-run` only produces a single-cycle preflight/merge/CD/comment plan; it does not write to GitHub or trigger CD.
+- `hwlab g14 monitor-prs --lane v02` is the PR -> CI -> CD automation entry point for HWLAB `v0.2`. It monitors only open PRs against base=`v0.2`: each cycle first uses UniDesk `gh pr preflight` to read GitHub CI/checks, mergeability, and conflict status; it posts a waiting comment on the PR while checks are pending and a blocking comment when the PR is blocked or conflicted; when the PR is ready, it merges directly with UniDesk `gh pr merge`, and running PipelineRuns for other commits do not block the merge or the start of CI. After the merge it runs the controlled `control-plane trigger-current --lane v02 --confirm --wait`, polls the targeted `control-plane status --lane v02 --source-commit `, and runs `git-mirror flush --confirm --wait` when needed. v0.2 CD is latest-only: old PipelineRuns are neither cancelled nor waited for, but promotion must re-confirm the source head before writing to `v0.2-gitops`, and a stale commit may only close as superseded/no-op; it must never roll back the runtime. Whether CD succeeds, is superseded, fails, or times out, it appends a semantic status to the original PR with `gh pr comment create --body-file`; the body always includes the start/end times, total duration, conflict status, CI/preflight conclusion, source commit, PipelineRun, targetValidation, Argo/webAssets, and git mirror pendingFlush/githubInSync. Comment dedupe state is written to `.state/hwlab-g14/v02-pr-comment-signatures.json`, so the same status signature never produces duplicate comments. The v0.2 monitor pointers are `.state/hwlab-g14/latest-v02-monitor-job.json`, `latest-v02-once-job.json`, `latest-v02-dry-run-job.json`, and `latest-v02-once-dry-run-job.json`, and never overwrite the default G14 monitor pointers. `--lane v02 --once --dry-run` only produces a single-cycle preflight/merge/CD/comment plan; it does not write to GitHub or trigger CD.
 - `agentrun v01 control-plane status|trigger-current|refresh [--dry-run|--confirm]` is the controlled Tekton/Argo entry point for AgentRun `v0.1` on G14 k3s. `status` is a read-only summary of the pinned source worktree commit, the matching commit-pinned PipelineRun, the latest GitOps state, the Argo Application, the `agentrun-v01` manager source commit, `planArtifacts.summary`, the env image result, and a git mirror summary, and reports whether manager/Argo/GitOps are aligned with the current source commit. The default output is a compact commander view: `summary` gives the source, PipelineRun, Argo, manager image, git mirror, and the `aligned` verdict; `timings` gives `sourceMs`, `runtimeMs`, `gitMirrorMs`, and `totalMs`; remote stdout/stderr tails are omitted by default, although the necessary tail is still expanded on failure; use `--full` for the complete tail and `--raw` for the raw git mirror cache. After aggregating the source, `status` reads the runtime and git mirror in parallel and emits `agentrun.control-plane.status.progress` JSON events to stderr covering started/succeeded/failed and elapsedMs for `source`, `runtime`, and `git-mirror`, so that status aggregations taking longer than 10 s still show visible progress. `trigger-current` first fast-forwards `G14:/root/agentrun-v01` to `origin/v0.1`, checks whether `localV01` in the `devops-infra` mirror equals the target source commit, runs a controlled mirror sync first if needed, and then creates the `agentrun-v01-ci-` PipelineRun. A confirmed trigger only submits the CI/CD work and returns the follow-up `status` command; it does not wait for the full PipelineRun. It refuses to re-trigger while a PipelineRun with the same name is running or has succeeded, and allows only recreation from a failed state or first-time creation. `refresh` only performs a hard refresh on `argocd/agentrun-g14-v01`; it is the controlled sync entry point for when GitOps promotion has completed but Argo is still on an old revision, and it never patches runtime workloads directly. The source of truth for the AgentRun runtime and SPEC remains the AgentRun repository; UniDesk only maintains the controlled operations entry points.
 - `agentrun v01 git-mirror status|sync|flush [--dry-run|--confirm]` is the controlled maintenance entry point for AgentRun `v0.1`'s use of the `devops-infra` git mirror/relay. By default, `status` returns the read/write URLs, `localV01`, `githubV01`, `localGitops`, `githubGitops`, `pendingFlush`, `githubInSync`, and an exact full-SHA shallow fetch summary, without expanding the full cache stdout; use `--full` when the probe tail is needed and `--raw` for the raw cache output. `sync` creates a manual Job that fetches the GitHub `v0.1` and `v0.1-gitops` refs into `/cache/pikasTech/agentrun.git`; `flush` fast-forward pushes the local `v0.1-gitops` back to GitHub. Confirmed `sync`/`flush` create an async job under `.state/jobs/` by default and immediately return `job.id`, `statusCommand`, and the log paths; add `--wait` explicitly only for live synchronous debugging. This entry point shares the `devops-infra` service and cache PVC with the HWLAB v0.2 mirror, but the repo paths, refs, status files, and CLI commands are independent of each other.
 - `hwlab g14 control-plane status|apply --lane v02 [--dry-run|--confirm]` is the controlled Tekton/Argo control-plane maintenance entry point for the HWLAB `v0.2` additive lane. The source commit comes only from `refs/remotes/origin/v0.2` in the G14-dedicated bare repo `/root/hwlab-v02-cicd.git`; `/root/hwlab-v02` is observed only as the workspace for manual development and short-connection source tools, and a dirty/stale state there must be reported as an isolated warning, never as a CI/CD blocker. The entry point targets branch `v0.2`, namespace `hwlab-ci`, and Argo application `hwlab-g14-v02`. By default, `status` is a read-only summary, for the latest source head, of the pipeline, RBAC/ServiceAccount, Argo, the PipelineRun for the current commit, a TaskRun condition summary for that PipelineRun, a summary of recent PipelineRuns, active PipelineRuns, the cleanup state of the legacy v02 CronJob, commit alignment, and the Cloud Web static asset and API live probes on 19666/19667. Once later commits have advanced the branch, use `status --lane v02 --pipeline-run hwlab-v02-ci-poll-` to re-check a completed run, or `status --lane v02 --source-commit ` when the full source SHA is known and you do not want to depend on the latest head. Targeted `status` outputs `statusTarget.mode` and `targetValidation` and checks only the evidence for the specified PipelineRun/source commit. `targetValidation.state=passed` means the target has a succeeded PipelineRun, Argo `Synced/Healthy`, passing 19666/19667 probes, and a flushed Git mirror, and that the runtime source commits of the run's `planArtifacts.rolloutServices` are aligned; `planArtifacts.reusedServices` are shown as runtime/provenance evidence but must not be required to equal the target source commit. `targetValidation.state=superseded` means the target succeeded and its runtime has since been replaced by a later successful PipelineRun on the same branch; in that state `falseGreenGuard` should be marked superseded/not-applicable. Neither state may mark a historical run as failed merely because `origin/v0.2` advanced later; without targeting arguments, the default still strictly checks alignment with the latest source head. The `performance` field of the TaskRun summary flags build TaskRuns longer than 120 s as slow and longer than 180 s as a critical warning, to expose regressions in env reuse / git mirror hit rates, but it is not a blocking gate. CI/CD performance acceptance should look at `planArtifacts.summary`, `taskRuns.performance.warningCount`, and the PipelineRun duration together: follow-up commits that only touch the CLI/docs or need no runtime rebuild should consistently show `build=0 reuse=` with no build TaskRun warnings, and the first introduction of, or switch to, an env image may build only the necessary env image once. `webAssets` must directly report the `readonly-rpc` removal, the key sidebar/workspace/event panel CSS, whether `/app.js` is readable and its byte size, and the `/health/live` and API revisions; `apiRevision` is the cloud-api service's own revision and may differ from the source commit when only Cloud Web static assets change, so that difference must not be misread as Cloud Web not having been released. By default only the necessary fields are read; the full PipelineRun spec, Tekton inline scripts, large historical objects, and entire CSS/HTML/JS files must not be expanded into the default output. `apply` first fetches `/root/hwlab-v02-cicd.git` automatically and runs a render check from a commit-pinned detached worktree, then server-side applies `tekton-v02/rbac.yaml`, `pipeline.yaml`, `argocd/project.yaml`, and `argocd/application-v02.yaml` through `G14:k3s`; a confirmed apply deletes the legacy v02 CronJob but does not apply runtime-v02 workloads, Secrets, or data migrations.
-- `hwlab g14 control-plane trigger-current --lane v02 [--dry-run|--confirm]` is the standard manual trigger entry point for v02: it first fetches `/root/hwlab-v02-cicd.git` automatically, resolves the current full SHA of `origin/v0.2`, and creates the commit-pinned `hwlab-v02-ci-poll-` PipelineRun. Git reads go through `git-mirror-http.devops-infra.svc.cluster.local`, and GitOps promotion writes go to `git-mirror-write.devops-infra.svc.cluster.local`. Before deleting/creating the PipelineRun, a confirmed trigger first renders the current source commit in a temporary detached worktree on G14 and then server-side applies the v02 Tekton RBAC, Pipeline, and Argo Application, so the cluster never keeps executing an old Pipeline definition after CI/CD scripts or runtime-ready logic have been merged. This render does not require the pinned `/root/hwlab-v02` worktree to be clean and must not be blocked by `.worktree/` or other parallel uncommitted changes. It refuses to re-trigger while a PipelineRun with the same name has succeeded or is running; only when that PipelineRun has failed or does not exist does it delete the old object and recreate it.
+- `hwlab g14 control-plane trigger-current --lane v02 [--dry-run|--confirm]` is the standard manual trigger entry point for v02: it first fetches `/root/hwlab-v02-cicd.git` automatically, resolves the current full SHA of `origin/v0.2`, and creates the commit-pinned `hwlab-v02-ci-poll-` PipelineRun. Git reads go through `git-mirror-http.devops-infra.svc.cluster.local`, and GitOps promotion writes go to `git-mirror-write.devops-infra.svc.cluster.local`. Before creating the PipelineRun, a confirmed trigger first renders the current source commit in a temporary detached worktree on G14 and then server-side applies the v02 Tekton RBAC, Pipeline, and Argo Application, so the cluster never keeps executing an old Pipeline definition after CI/CD scripts or runtime-ready logic have been merged. This render does not require the pinned `/root/hwlab-v02` worktree to be clean and must not be blocked by `.worktree/` or other parallel uncommitted changes. If a PipelineRun with the same name already exists, its existing state is reused by default instead of being deleted and recreated; any retry policy for failed runs must be designed explicitly and must not reinstate the old default delete/create behavior.
   Before creating the PipelineRun, it reads the `devops-infra` mirror refs; if `localV02` does not equal the current source commit, it automatically runs one controlled manual `git-mirror sync` Job and re-checks the ref, and it stops triggering if the re-check fails, which avoids a known Tekton `prepare-source` failure. The services parameter contains only the v02 runtime service matrix; `hwlab-cli` is a fixed-repo short-connection source tool and is not part of the PipelineRun service build.
-  `--dry-run` only reports whether a pre-sync would happen and does not create a Job. A confirmed trigger creates an async job under `.state/jobs/` by default and immediately returns `job.id`, `statusCommand`, and the stdout/stderr paths, so the git mirror pre-sync or PipelineRun creation never blocks for a long time. The `--wait` path must also emit `hwlab.v02.trigger.progress` JSON events to stderr covering `control-plane-refresh`, `git-mirror-pre-sync`, `delete-existing-pipelinerun`, and `create-pipelinerun`, so an async job is never left showing only its launch command with no way to tell where it is stuck. The default JSON must produce bounded summaries of `manifest_b64`, long scripts, and remote stdout/stderr, keeping their length and hash; the final trigger result returns only stage summaries and the key tail, and the full content is disclosed progressively through the job's stdout/stderr files. Add `--wait` explicitly only for live synchronous debugging. The old `rerun-current` is kept only as an input alias. PipelineRun `Completed`, Argo `Synced/Healthy`, and `webAssets.ok=true` only prove that the G14 runtime has been updated; closing out delivery also requires checking `cache.summary.pendingFlush` with `hwlab g14 git-mirror status`, and if it is true, running the controlled `hwlab g14 git-mirror flush --confirm` and polling the job status until `pendingFlush=false`.
+  `--dry-run` only reports whether a pre-sync would happen and does not create a Job. A confirmed trigger creates an async job under `.state/jobs/` by default and immediately returns `job.id`, `statusCommand`, and the stdout/stderr paths, so the git mirror pre-sync or PipelineRun creation never blocks for a long time. The `--wait` path must also emit `hwlab.v02.trigger.progress` JSON events to stderr covering `control-plane-refresh`, `git-mirror-pre-sync`, and `create-pipelinerun`, so an async job is never left showing only its launch command with no way to tell where it is stuck. The default JSON must produce bounded summaries of `manifest_b64`, long scripts, and remote stdout/stderr, keeping their length and hash; the final trigger result returns only stage summaries and the key tail, and the full content is disclosed progressively through the job's stdout/stderr files. Add `--wait` explicitly only for live synchronous debugging. The old `rerun-current` is kept only as an input alias. PipelineRun `Completed`, Argo `Synced/Healthy`, and `webAssets.ok=true` only prove that the G14 runtime has been updated; closing out delivery also requires checking `cache.summary.pendingFlush` with `hwlab g14 git-mirror status`, and if it is true, running the controlled `hwlab g14 git-mirror flush --confirm` and polling the job status until `pendingFlush=false`.
 - `hwlab g14 control-plane runtime-migration --lane v02 [--dry-run|--allow-live-db-read --dry-run|--confirm]` runs only through the repo-owned migration CLI inside the current `deployment/hwlab-cloud-api -c hwlab-cloud-api` in the `hwlab-v02` namespace; it does not read or print Secret values, does not touch PROD, and does not fall back to manual `psql`.
 - `hwlab g14 secret status|ensure --lane v02 --name hwlab-v02-device-pod-api-key --key api-key [--dry-run|--confirm]` is the standard entry point for HWLAB v0.2 runtime SecretRef bootstrap; it ensures that the Kubernetes Secret referenced by `HWLAB_DEVICE_POD_API_KEY=secretRef:hwlab-v02-device-pod-api-key/api-key` in `deploy/deploy.json` exists. `status` only returns whether the secret/key exists and its decoded byte length; `ensure --dry-run` only reports whether it would create or keep the Secret; `ensure --confirm` generates a random value on the G14 k3s side and server-side applies the Secret. The command never reads, prints, or returns the secret plaintext, and offers no manual value injection, fallback session token, or temporary lease path.
 - `hwlab g14 control-plane cleanup-runs --lane v02|g14|all [--min-age-minutes N] [--limit N] [--dry-run|--confirm]` is the workspace retention entry point for completed PipelineRuns; a real cleanup deletes only completed PipelineRuns so that Tekton/local-path reclaims their temporary PVCs, and it never touches registry storage, business PVCs, Secrets, runtime workloads, or GitOps desired state.
@@ -100,7 +100,7 @@ CI/CD, GitOps, rollout, artifact publishing, DEV/PROD rollout after PR merge, P
 - `codex interrupt|cancel ` requests an interrupt through the Code Queue private proxy. For running/judging tasks it asks D601 to stop the current agent run, and cancelling queued/retry_wait tasks must also go through the same proxy path as the WebUI; it returns a bounded task summary and follow-up query commands. Any action that needs to touch an active run still belongs to the D601 execution plane.
 - Code Queue multi-queue lanes are managed by the `codex` command namespace: `queues [--full|--all] [--limit N] [--page N|--offset N]` lists them, `queue create ` creates one, `queue merge --into ` merges, and `move --queue ` moves a task. By default these queue management entry points are backed by the main server's `code-queue-mgr`, which manages PostgreSQL directly, and they are still reached through the stable `code-queue` user-service proxy path. By default `codex queues` returns only active/nonempty/unread/runnable queue summaries, activity, commanderConcurrency, global counts, and execution diagnostics; `--full` or `--all` only switches to one page of the full queue-row view, which is still subject to `--limit`/`--page`/`--offset` pagination, and the deprecated full array is no longer included by default. The stable machine-readable path for both the summary and full views is `.data.queues.items[]`, and global metadata lives at `.data.queues.commanderConcurrency`, `.data.queues.activity`, `.data.queues.counts`, `.data.queues.executionDiagnostics`, `.data.queues.activeTaskIds`, and `.data.queues.queuedTaskIds`; use the raw command in the output when the full upstream response is needed. `commanderConcurrency.activeRunnerCount` / `activity.effectiveActiveTaskCount` are the effective active counts for commander concurrency decisions; `schedulerLocalActiveQueueCount`/`activeQueueIds` only describe local scheduler active-run slots and cannot override the database running count or the heartbeat-fresh runner count. The old top-level full-array semantics are documented only as deprecated compatibility information and are no longer the primary shape of `.data.queues`. Tasks within one queue run serially; different queues run in parallel. Moving is allowed only for `queued`/`retry_wait` tasks that the scheduler has not yet claimed, which must have `startedAt=null`, `currentAttempt=0`, and no active thread/turn; tasks already in `running`/`judging` or carrying a claim marker return 409 and must never be written back to queued by move/merge. A merge moves ownership of the movable tasks and automatically deletes the source queue record, keeping only the merged target queue; if either the source or the target queue has active/claimed tasks, the whole merge returns 409. The merged target queue runs serially in the tasks' original `queueEnteredAt`/`createdAt` order, and once queued/retry_wait tasks have been moved successfully, the D601 scheduler advances them by polling.
 - All `codex` query and management commands must go through the same backend-core private proxy path as the WebUI, `/api/microservices/code-queue/proxy/...`; the CLI must not call D601 internal Services, the database, pod curl, or the k3sctl scheduler sub-service directly in order to submit, move, interrupt, cancel, or manage queues. If that path fails, fix the CLI/backend/provider tunnel chain first instead of bypassing the control plane.
-- `job list [--limit N] [--include-command]` and `job status [--tail-bytes N]` query the filesystem state under `.state/jobs/` and are the observability entry points for async commands. `job list` returns only the latest 50 summaries by default, plus a lightweight `progress.summary` and follow-up query commands for known async workflows; `job status` returns the structured `progress`, the last 12000 bytes of stdout/stderr, `tailPolicy`, and the full log paths by default. Known workflows should extract the stage, key object names, and next-step commands from the bounded log tail, so nobody has to open the full stdout/stderr by hand just to determine the current stage. The progress for `hwlab_g14_v02_trigger_current` must expose the trigger stage, source commit, and PipelineRun; the progress for `hwlab_g14_v02_pr_monitor` must expose the preflight, merge, source-head, lane-idle, cd-trigger, cd-status, git-mirror-flush, and pr-comment stages, together with PR, source commit, PipelineRun, and targetValidation/pendingFlush summaries; the progress for `hwlab_g14_git_mirror_sync|flush` and `agentrun_v01_git_mirror_sync|flush` must expose the sync/flush state, Job name, pendingFlush, and fetch/push/total/SSH timing, and give the mirror status command for the corresponding repo.
+- `job list [--limit N] [--include-command]` and `job status [--tail-bytes N]` query the filesystem state under `.state/jobs/` and are the observability entry points for async commands. `job list` returns only the latest 50 summaries by default, plus a lightweight `progress.summary` and follow-up query commands for known async workflows; `job status` returns the structured `progress`, the last 12000 bytes of stdout/stderr, `tailPolicy`, and the full log paths by default. Known workflows should extract the stage, key object names, and next-step commands from the bounded log tail, so nobody has to open the full stdout/stderr by hand just to determine the current stage. The progress for `hwlab_g14_v02_trigger_current` must expose the trigger stage, source commit, and PipelineRun; the progress for `hwlab_g14_v02_pr_monitor` must expose the preflight, merge, source-head, cd-trigger, cd-status, git-mirror-flush, and pr-comment stages, together with PR, source commit, PipelineRun, and targetValidation/pendingFlush summaries; the progress for `hwlab_g14_git_mirror_sync|flush` and `agentrun_v01_git_mirror_sync|flush` must expose the sync/flush state, Job name, pendingFlush, and fetch/push/total/SSH timing, and give the mirror status command for the corresponding repo.
 - `debug health`, `debug dispatch`, and `debug task` exercise the real internal core, WebSocket, database, provider, system metrics, Docker status, and Host SSH maintenance bridge flows; they are for development debugging only and are not written into the formal acceptance steps in `TEST.md`.
 - `e2e run [--only pattern[,pattern...]] [--skip pattern[,pattern...]]` uses the public production frontend/dev frontend/provider ingress URLs derived from publicHost and, over the Docker internal network, verifies the core API, PostgreSQL, provider self-connection, system metric curves, Docker status snapshots, provider.upgrade prechecks, and the Playwright frontend pages; it is the automated E2E gate before delivery. By default the CLI prints a summary of check statuses and writes full diagnostics to `resultPath`; for day-to-day iteration, prefer `--only` / `--skip` to run the minimal necessary set.
diff --git a/docs/reference/g14.md b/docs/reference/g14.md
index 1fc5dcc7..0ca2d02e 100644
--- a/docs/reference/g14.md
+++ b/docs/reference/g14.md
@@ -78,7 +78,7 @@ bun scripts/cli.ts hwlab g14 control-plane status --lane v02 --pipeline-run hwla
 bun scripts/cli.ts hwlab g14 control-plane status --lane v02 --source-commit
 ```
 
-Targeted status must expose `statusTarget.mode` and `targetValidation`. `targetValidation.state=passed` means the requested PipelineRun/source commit reached a succeeded PipelineRun, Argo `Synced/Healthy`, public web/API probes, flushed Git mirror, and matching runtime source commits for the services listed in that run's `planArtifacts.rolloutServices`; services listed in `planArtifacts.reusedServices` remain visible as runtime/provenance evidence but must not be forced to the target source commit. `targetValidation.state=superseded` means the requested PipelineRun succeeded and was later replaced in runtime by a newer succeeded `v0.2` PipelineRun; this is valid closure evidence for the requested run when the newer commit is on the same branch lineage. In both states, `commitAlignment.staleReasons` may still mention later `origin/v0.2` or CI/CD source head movement; that is parallel-head context, not a failure of the requested run. `falseGreenGuard` is a current-runtime guard and should report not-applicable/superseded for such historical targets instead of turning later runtime movement into a false failure. Default status without a target remains strict for the latest source head.
+Targeted status must expose `statusTarget.mode` and `targetValidation`. `targetValidation.state=passed` means the requested PipelineRun/source commit reached a succeeded PipelineRun, Argo `Synced/Healthy`, public web/API probes, flushed Git mirror, and matching runtime source commits for the services listed in that run's `planArtifacts.rolloutServices`; services listed in `planArtifacts.reusedServices` remain visible as runtime/provenance evidence but must not be forced to the target source commit. `targetValidation.state=superseded` means the requested PipelineRun succeeded but no longer owns runtime: either it was replaced by a newer succeeded `v0.2` PipelineRun, or latest-only promotion observed that `origin/v0.2` had advanced before GitOps/runtime writeback and closed the historical run as no-op. This is valid closure evidence for the requested run when the newer commit is on the same branch lineage. In both states, `commitAlignment.staleReasons` may still mention later `origin/v0.2` or CI/CD source head movement; that is parallel-head context, not a failure of the requested run. `falseGreenGuard` is a current-runtime guard and should report not-applicable/superseded for such historical targets instead of turning later runtime movement into a false failure. Default status without a target remains strict for the latest source head.
 
 For HWLAB user-feedback, CLI, Cloud Web, AgentRun, device-pod, public API, or runtime workflow issues, source-level validation is not enough to close the issue. Unit tests, contract tests, `git diff --check`, targeted build checks, PR merge metadata, and source commit rollout evidence are supporting evidence only. The issue may be closed only after the affected user entry or original entry has been exercised against the target runtime. For CLI issues, that means running the relevant `hwlab-cli` or UniDesk-controlled CLI command from the G14 `v0.2` workspace or approved execution plane against the intended lane/URL/namespace and proving the observed behavior, not just proving the helper code compiles.
 For Cloud Web or public API issues, use the public endpoint or a bounded API/asset smoke that reaches the deployed runtime. For AgentRun or device-pod issues, capture the trace/session/thread/run/job/device evidence that proves the specific continuation or hardware workflow reached the live backend.
diff --git a/scripts/hwlab-g14-contract-test.ts b/scripts/hwlab-g14-contract-test.ts
index 730e5dd0..2ffc51d9 100644
--- a/scripts/hwlab-g14-contract-test.ts
+++ b/scripts/hwlab-g14-contract-test.ts
@@ -1,4 +1,4 @@
-import { gitMirrorFlushJobManifest, gitMirrorStatusSummary, gitMirrorSyncJobManifest, gitMirrorV02SyncRequirement, hwlabG14Help, hwlabG14MonitorStateFileName, parseGitMirrorStatusRefs, parsePipelineTaskRunMetrics, rolloutRecordBody, semanticChangelogBullets, v02CommitAlignment, v02ControlPlaneRenderScript, v02FalseGreenGuard, v02PipelineServiceIds, v02PrAutomationCommentBody, v02TaskRunPerformanceSummary } from "./src/hwlab-g14";
+import { gitMirrorFlushJobManifest, gitMirrorStatusSummary, gitMirrorSyncJobManifest, gitMirrorV02SyncRequirement, hwlabG14Help, hwlabG14MonitorStateFileName, parseGitMirrorStatusRefs, parsePipelineTaskRunMetrics, rolloutRecordBody, semanticChangelogBullets, v02CommitAlignment, v02ControlPlaneRenderScript, v02FalseGreenGuard, v02LatestOnlyTargetValidation, v02PipelineServiceIds, v02PrAutomationCommentBody, v02TaskRunPerformanceSummary } from "./src/hwlab-g14";
 
 function assertCondition(condition: unknown, message: string, detail: unknown = {}): void {
   if (!condition) throw new Error(`${message}: ${JSON.stringify(detail)}`);
@@ -40,8 +40,9 @@ assertCondition(
 );
 assertCondition(
   hwlabHelpUsage.some((line) => line.includes("monitor-prs --lane v02"))
-    && JSON.stringify(hwlabG14Help()).includes("v02-pr-comment-signatures.json"),
-  "v0.2 PR monitor help must expose the auto CI/CD lane and dedupe comment state",
+    && JSON.stringify(hwlabG14Help()).includes("v02-pr-comment-signatures.json")
+    && JSON.stringify(hwlabG14Help()).includes("latest-only"),
+  "v0.2 PR monitor help must expose the auto CI/CD lane, latest-only CD, and dedupe comment state",
   hwlabG14Help(),
 );
 
@@ -182,6 +183,28 @@ assertCondition(
   staleSuccessAlignment,
 );
 
+const latestOnlySuperseded = v02LatestOnlyTargetValidation({
+  targetMode: "source-commit",
+  sourceCommit: "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa",
+  pipelineRun: { exists: true, status: "True", pipelineRun: "hwlab-v02-ci-poll-aaaaaaaaaaaa" },
+  commitAlignment: { staleReasons: ["origin-head-mismatch"], originHead: "bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb" },
+  targetValidation: {
+    ok: false,
+    state: "failed",
+    failures: [{ reason: "runtime-service-source-mismatch", serviceId: "hwlab-cloud-api" }],
+  },
+});
+assertCondition(
+  latestOnlySuperseded.ok === true
+    && latestOnlySuperseded.state === "superseded"
+    && latestOnlySuperseded.latestOnlySuperseded === true
+    && Array.isArray(latestOnlySuperseded.failures)
+    && latestOnlySuperseded.failures.length === 0
+    && JSON.stringify(latestOnlySuperseded.supersededFailures).includes("runtime-service-source-mismatch"),
+  "v0.2 latest-only target validation must close a succeeded stale source commit as superseded instead of failing runtime mismatch",
+  latestOnlySuperseded,
+);
+
 const falseGreenPassed = v02FalseGreenGuard({
   sourceCommit: "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa",
   pipelineRun: { exists: true, status: "True" },
diff --git a/scripts/src/hwlab-g14.ts b/scripts/src/hwlab-g14.ts
index a32a68ca..7af20bff 100644
--- a/scripts/src/hwlab-g14.ts
+++ b/scripts/src/hwlab-g14.ts
@@ -1433,6 +1433,36 @@ function v02TargetValidation(input: {
   };
 }
 
+export function v02LatestOnlyTargetValidation(input: {
+  targetMode: string;
+  sourceCommit: string | null;
+  pipelineRun: Record | null;
+  commitAlignment: Record;
+  targetValidation: Record;
+}): Record {
+  if (input.targetMode === "latest-source-head") return input.targetValidation;
+  if (input.sourceCommit === null) return input.targetValidation;
+  if (input.pipelineRun === null || input.pipelineRun.status !== "True") return input.targetValidation;
+  const validation = record(input.targetValidation);
+  if (validation.ok === true || validation.state === "passed") return input.targetValidation;
+  const staleReasons = stringArray(input.commitAlignment.staleReasons);
+  const sourceHeadAdvancedReasons = staleReasons.filter((reason) => reason === "origin-head-mismatch" || reason === "cicd-source-repo-stale");
+  if (sourceHeadAdvancedReasons.length === 0) return input.targetValidation;
+  const failures = Array.isArray(validation.failures) ? validation.failures : [];
+  return {
+    ...validation,
+    ok: true,
+    state: "superseded",
+    superseded: true,
+    latestOnlySuperseded: true,
+    latestOnlyReasons: sourceHeadAdvancedReasons,
+    originalState: validation.state ?? null,
+    summary: `target ${input.targetMode} completed for ${shortSha(input.sourceCommit)} and was superseded by a newer v0.2 source head before GitOps/runtime writeback`,
+    supersededFailures: failures.slice(0, 10),
+    failures: [],
+  };
+}
+
 export function v02CommitAlignment(input: {
   expectedSourceHead: string | null;
   sourceHeads: Record;
@@ -2038,16 +2068,21 @@ function createV02PipelineRun(sourceCommit: string, timeoutSeconds: number): Com
     `manifest_b64=${shellQuote(manifestB64)}`,
     `manifest_path=/tmp/${pipelineRun}.json`,
     "printf '%s' \"$manifest_b64\" | base64 -d > \"$manifest_path\"",
-    "kubectl create -f \"$manifest_path\"",
+    "if kubectl create -f \"$manifest_path\"; then",
+    " :",
+    "else",
+    " code=$?",
+    ` if kubectl get pipelinerun -n ${shellQuote(CI_NAMESPACE)} ${shellQuote(pipelineRun)} >/dev/null 2>&1; then`,
+    ` printf 'PipelineRun %s already exists; reusing existing object\\n' ${shellQuote(pipelineRun)} >&2`,
+    " else",
+    " exit \"$code\"",
+    " fi",
+    "fi",
     `kubectl get pipelinerun -n ${shellQuote(CI_NAMESPACE)} ${shellQuote(pipelineRun)} -o jsonpath='{.metadata.name}{\"\\n\"}{.metadata.labels.hwlab\\.pikastech\\.local/source-commit}{\"\\n\"}{.status.conditions[0].status}{\"\\n\"}{.status.conditions[0].reason}{\"\\n\"}'`,
   ].join("\n");
   return g14K3s(["script", "--", script], timeoutSeconds * 1000);
 }
 
-function deleteV02PipelineRun(pipelineRun: string): CommandJsonResult {
-  return g14K3s(["kubectl", "delete", "pipelinerun", "-n", CI_NAMESPACE, pipelineRun, "--ignore-not-found=true"], 60_000);
-}
-
 function v02ControlPlaneStatus(target: V02ControlPlaneStatusTarget = {}): Record {
   const targetMode: V02StatusTargetMode = target.mode ?? (target.pipelineRun !== undefined && target.pipelineRun !== null ? "pipeline-run" : target.sourceCommit !== undefined ? "source-commit" : "latest-source-head");
@@ -2141,7 +2176,7 @@ function v02ControlPlaneStatus(target: V02ControlPlaneStatusTarget = {}): Record
     runtimeWorkloads,
     webAssets,
   });
-  const targetValidation = v02TargetValidation({
+  const targetValidationBase = v02TargetValidation({
     targetMode,
     sourceCommit,
     pipelineRun: pipelineRunInfo,
@@ -2155,6 +2190,13 @@ function v02ControlPlaneStatus(target: V02ControlPlaneStatusTarget = {}): Record
     gitMirror,
     recentPipelineRuns,
   });
+  const targetValidation = v02LatestOnlyTargetValidation({
+    targetMode,
+    sourceCommit,
+    pipelineRun: pipelineRunInfo,
+    commitAlignment,
+    targetValidation: targetValidationBase,
+  });
   const falseGreenGuard = targetValidation.state === "superseded"
     ? {
       ok: null,
@@ -2310,16 +2352,20 @@ function runV02ControlPlane(options: G14ControlPlaneOptions): Record> {
-  const started = Date.now();
-  let lastStatus: Record = {};
-  let activeRuns: Record[] = [];
-  while (Date.now() - started < timeoutSeconds * 1000) {
-    lastStatus = v02ControlPlaneStatus();
-    activeRuns = activeV02PipelineRuns(lastStatus);
-    printEvent("v02.cd.lane-idle", { activeCount: activeRuns.length, activeRuns: activeRuns.slice(0, 5) });
-    printV02PrMonitorProgress({ stage: "lane-idle", status: activeRuns.length === 0 ? "succeeded" : "running", activeCount: activeRuns.length });
-    if (activeRuns.length === 0) {
-      return {
-        ok: true,
-        waitedSeconds: Math.round((Date.now() - started) / 1000),
-        status: summarizeV02CdStatus(lastStatus),
-        activeRuns,
-      };
-    }
-    await sleep(30_000);
-  }
-  return {
-    ok: false,
-    phase: "active-run-timeout",
-    timeoutSeconds,
-    waitedSeconds: Math.round((Date.now() - started) / 1000),
-    status: summarizeV02CdStatus(lastStatus),
-    activeRuns,
-  };
-}
-
 async function runV02PrAutoCd(pr: OpenPullRequest, preflight: Record, merge: CommandJsonResult, options: G14MonitorOptions, startedAt: string): Promise> {
   const mergeRaceState = isCommandSuccess(merge) ? null : mergeCommandRaceState(merge);
   if (!isCommandSuccess(merge) && mergeRaceState !== "merged") {
@@ -4340,7 +4352,7 @@ async function runV02PrAutoCd(pr: OpenPullRequest, preflight: Record 0 && !v02CdPassed(before)) {
-    printV02PrMonitorProgress({ stage: "lane-idle", status: "running", pr: pr.number, sourceCommit, pipelineRun, activeCount: activeRuns.length });
-    const comment = commentV02PullRequest({
-      pr,
-      phase: "cd-active-run",
-      state: "cd-blocked",
-      startedAt,
-      observedAt: new Date().toISOString(),
-      elapsedSeconds: durationSeconds(startedAt, new Date().toISOString()),
-      preflight,
-      merge: commandData(merge),
-      sourceCommit,
-      pipelineRun,
-      cd: beforeSummary,
-      dryRun: false,
-      message: "PR merged, but v0.2 already has a running PipelineRun; the worker will wait for the lane to go idle and then trigger CD for the current merge commit.",
-    });
-    if (record(comment).ok !== true) return { ok: false, phase: "pr-comment", pr, sourceCommit, pipelineRun, activeRuns, before: beforeSummary, comment };
-    const idle = await waitForV02LaneIdle(options.timeoutSeconds);
-    if (record(idle).ok !== true) {
-      const timeoutComment = commentV02PullRequest({
-        pr,
-        phase: "cd-active-run-timeout",
-        state: "cd-timeout",
-        startedAt,
-        observedAt: new Date().toISOString(),
-        elapsedSeconds: durationSeconds(startedAt, new Date().toISOString()),
-        preflight,
-        merge: commandData(merge),
-        sourceCommit,
-        pipelineRun,
-        cd: record(idle.status),
-        dryRun: false,
-        message: "PR merged, but the v0.2 lane has been occupied by an existing PipelineRun for a long time and CD has not yet been triggered for the current merge commit; this comment preserves the active run state for follow-up troubleshooting.",
-      });
-      return { ok: false, phase: "cd-active-run-timeout", pr, sourceCommit, pipelineRun, activeRuns, before: beforeSummary, idle, comment, timeoutComment };
-    }
+  if (activeRuns.length > 0) {
+    printEvent("v02.cd.active-runs-observed", { pr: pr.number, sourceCommit, pipelineRun, activeCount: activeRuns.length, activeRuns: activeRuns.slice(0, 5), latestOnlyPolicy: "do-not-wait-or-cancel-old-runs" });
   }
   const trigger = v02CdPassed(before) ? { ok: true, skipped: true, reason: "source-commit-already-deployed" } : triggerV02Current(Math.min(options.timeoutSeconds, 600));
   printEvent("v02.cd.trigger", { pr: pr.number, sourceCommit, pipelineRun, ok: record(trigger).ok, skipped: record(trigger).skipped ?? false, degradedReason: record(trigger).degradedReason ?? null });
-  printV02PrMonitorProgress({ stage: "cd-trigger", status: record(trigger).ok === true || record(trigger).degradedReason === "refuse-active-or-successful-pipelinerun" ? "succeeded" : "failed", pr: pr.number, sourceCommit, pipelineRun, skipped: record(trigger).skipped ?? false, degradedReason: record(trigger).degradedReason ?? null });
-  if (record(trigger).ok !== true && record(trigger).degradedReason !== "refuse-active-or-successful-pipelinerun") {
+  printV02PrMonitorProgress({ stage: "cd-trigger", status: record(trigger).ok === true ? "succeeded" : "failed", pr: pr.number, sourceCommit, pipelineRun, activeCount: activeRuns.length, skipped: record(trigger).skipped ?? false, degradedReason: record(trigger).degradedReason ?? null });
+  if (record(trigger).ok !== true) {
     const comment = commentV02PullRequest({
       pr,
       phase: "cd-trigger",
@@ -4431,10 +4408,11 @@ async function runV02PrAutoCd(pr: OpenPullRequest, preflight: Record 0) {
-      const cd = summarizeV02CdStatus(currentStatus);
-      const comment = commentV02PullRequest({
-        pr,
-        phase: "cd-active-before-merge",
-        state: "cd-blocked",
-        startedAt,
-        observedAt: new Date().toISOString(),
-        elapsedSeconds: durationSeconds(startedAt, new Date().toISOString()),
-        preflight,
-        cd,
-        dryRun: options.dryRun,
-        message: "PR passed CI / mergeability, but the v0.2 lane currently has a running PipelineRun; to keep a one-to-one mapping between PR merge commits and CD targets, this cycle does not merge and will continue automatically once the lane is idle.",
-      });
-      observations.push({ pullRequest: pr, preflight, activeRuns, cd, comment });
-      printV02PrMonitorProgress({ stage: "pr-comment", status: record(comment).ok === true ? "succeeded" : "failed", pr: pr.number, activeCount: activeRuns.length });
-      if (record(comment).ok !== true) return { ok: false, cycle, lane: "v02", phase: "pr-comment", pullRequest: pr, preflight, comment, observations };
-      continue;
-    }
     const merge = mergePullRequest(pr.number, options.dryRun);
     printEvent("v02.pr.merge", { cycle, number: pr.number, dryRun: options.dryRun, ok: isCommandSuccess(merge) });
     printV02PrMonitorProgress({ stage: "merge", status: isCommandSuccess(merge) ? "succeeded" : "running", pr: pr.number, dryRun: options.dryRun });
@@ -4678,7 +4637,7 @@ export function hwlabG14Help(): Record {
       "bun scripts/cli.ts hwlab g14 tools-image build --name ci-node-tools --tag node22-alpine-bun-v1 --confirm",
       "bun scripts/cli.ts job status --tail-bytes 30000",
     ],
-    description: "G14 HWLAB PR monitor, DEV rollout command, bounded v0.2 control-plane bootstrap/cleanup/runtime-migration helper, v0.2 runtime SecretRef bootstrap, devops-infra git mirror maintenance, and controlled CI tools image build/status entry. The public monitor starts a fire-and-forget job. Default monitor lane is base=G14; --lane v02 monitors base=v0.2 PRs, waits for GitHub preflight/CI readiness, automatically merges ready PRs, triggers v0.2 CD, flushes the git mirror when needed, and posts deduplicated PR comments for pending, blocked/conflict, success, failure, or timeout states. confirmed control-plane trigger-current and git-mirror sync/flush also return async jobs by default, with --wait reserved for explicit synchronous debugging. control-plane status/apply/cleanup-runs/cleanup-released-pvs/runtime-migration uses UniDesk G14:k3s routes for v0.2 Tekton/Argo control resources, runtime migration, and completed CI workspace retention only. secret status/ensure is the standard v0.2 runtime SecretRef bootstrap path; it never reads or prints secret values. git-mirror status/apply/sync/flush is the manual devops-infra mirror/relay control path and does not install a CronJob.",
+    description: "G14 HWLAB PR monitor, DEV rollout command, bounded v0.2 control-plane bootstrap/cleanup/runtime-migration helper, v0.2 runtime SecretRef bootstrap, devops-infra git mirror maintenance, and controlled CI tools image build/status entry. The public monitor starts a fire-and-forget job. Default monitor lane is base=G14; --lane v02 monitors base=v0.2 PRs, waits for GitHub preflight/CI readiness, automatically merges ready PRs without waiting for other active v0.2 PipelineRuns, triggers v0.2 CD with latest-only GitOps writeback, flushes the git mirror when needed, and posts deduplicated PR comments for pending, blocked/conflict, success, superseded, failure, or timeout states. confirmed control-plane trigger-current and git-mirror sync/flush also return async jobs by default, with --wait reserved for explicit synchronous debugging. control-plane status/apply/cleanup-runs/cleanup-released-pvs/runtime-migration uses UniDesk G14:k3s routes for v0.2 Tekton/Argo control resources, runtime migration, and completed CI workspace retention only.
secret status/ensure is the standard v0.2 runtime SecretRef bootstrap path; it never reads or prints secret values. git-mirror status/apply/sync/flush is the manual devops-infra mirror/relay control path and does not install a CronJob.", defaults: { repo: HWLAB_REPO, base: G14_SOURCE_BRANCH,