docs: clarify hwlab v02 cli rollout closeout

This commit is contained in:
Codex
2026-05-31 14:08:42 +00:00
parent 1af11b1f0c
commit 1761f319e4
5 changed files with 151 additions and 16 deletions
+5 -5
View File
@@ -43,15 +43,15 @@ CI/CD、GitOps、rollout、artifact 发布、PR 合并后的 DEV/PROD 滚动、P
- `artifact-registry plan|render|status|health|install|deploy-backend-core|deploy-service` 管理 D601 host-managed CNCF Distribution registry 的声明、安装、只读检查和 pull-only artifact CD。该 registry 固定为 D601 loopback `127.0.0.1:5000`,由 systemd + Docker Compose 管理,位于 native k3s 故障域外;`deploy-service` 只拉取 CI 已发布的 commit-pinned 镜像、retag/recreate 或导入 native k3s,并做 live commit 验证,不构建 runtime source。`deploy-backend-core` 是 deprecated 兼容名,标准 backend-core prod CD 入口是 `deploy apply --env prod --service backend-core`。长期规则见 `docs/reference/artifact-registry.md`
- `commander contract|plan --dry-run|smoke --dry-run|approval request --dry-run|prompt-lint --kind gpt55-pr` 是 host Codex 指挥官直管微服务 skeleton 入口。当前命令返回 `phase=source-contract`、service/API/state/bridge/prompt/trace/#20/#46/ClaudeQQ 审批边界、.state/commander/ 状态模型、dev 无 daemon smoke contract、dry-run 计划和 GPT-5.5 PR prompt 边界辅助 lint,不接 live bridge、不注入 prompt、不发送 ClaudeQQ。`approval request --dry-run` 会生成 200 字以内中文纯文本 ClaudeQQ 审批草案、`notification-path-unavailable` blocker 和授权后唯一可用的 `bun scripts/cli.ts microservice proxy claudeqq /api/push/text --method POST --body-json '<payload>' --raw` 命令;不得提示使用本机 ClaudeQQ skill、powershell 或本地 server。`prompt-lint` 支持 `--prompt-file``--stdin`,输出 `ok``missingClauses``riskLevel``suggestedPatchSnippet` 且不回显完整 prompt;它是 commander 辅助检查,不是业务 PR 门禁,也不改变 `codex submit` 默认行为。`plan``smoke``approval request` 必须带 `--dry-run`;缺少时返回 `error=dry-run-required`。长期规则见 `docs/reference/host-codex-commander.md`
- `hwlab g14 monitor-prs [--once] [--dry-run] [--interval-seconds N] [--max-cycles N] [--timeout-seconds N]` 是当前 HWLAB G14 PR -> CI/CD -> DEV rollout 的一行式入口。普通调用创建 `.state/jobs/` 异步 job 并立刻返回 `job.id``statusCommand` 和 stdout/stderr 路径;后台 worker 每轮通过 UniDesk `gh pr list/preflight/merge` 监控 `pikasTech/HWLAB` base=`G14` 的 open PRready 时合并,然后通过 UniDesk `ssh G14:k3s` 观察 `hwlab-g14-ci-poll-<short>`、Argo `hwlab-g14-dev` 和 DEV `/health/live`,直到 DEV `Synced/Healthy` 且 Deployment/StatefulSet ready;历史 `Completed` smoke/debug pod 不作为 rollout blocker。每次成功 DEV rollout 后,worker 会定位或创建 #7“指挥简报索引”中的北京日期每日简报 issue,并追加 CI/CD 耗时、CI/CD 关键指标、语义化上线 changelog、自动 diff 摘要、PipelineRun、GitOps revision 和 DEV 验证摘要;关键指标来自 G14 Tekton TaskRun results,固定包含 `lazy build reused: x/y`、reused services、rebuild services 和每个 service 的独立耗时/状态/backend,用于观察 lazy build 机制效果。语义化 changelog 优先从 PR body 的 `## 修改`/`## 变更`/`## Changelog` 等段落提取,diff 摘要只作为文件和统计证据保留,不替代 changelog。也可用 `hwlab g14 record-rollout --pr <number> --source-commit <sha>` 手动补记,手动补记同样会按 PipelineRun 采集 TaskRun 指标。状态指针按用途分离:长期监控只写 `.state/hwlab-g14/latest-monitor-job.json``--once``latest-once-job.json``--dry-run``latest-dry-run-job.json``--once --dry-run``latest-once-dry-run-job.json`,避免一次性收口覆盖持续监控入口。`--once --dry-run` 只做单轮监控和 merge plan,不写 GitHub、不等待 rollout。该命令禁止使用原生 `gh` 或手拼 GitHub 请求;如果 UniDesk `gh` 子命令字段或行为不够,必须先改进 `scripts/src/gh.ts` 后再使用。
- `hwlab g14 control-plane status|apply --lane v02 [--dry-run|--confirm]` 是 HWLAB `v0.2` 加法 lane 的受控 Tekton/Argo 控制面维护入口,只面向 G14 `/root/hwlab-v02`、branch `v0.2`、namespace `hwlab-ci` 和 Argo application `hwlab-g14-v02``status` 只读汇总 pipeline、RBAC/ServiceAccount、Argo、当前 commit PipelineRun、当前 PipelineRun 的 TaskRun 条件摘要、最近 PipelineRun 摘要、活跃 PipelineRun、遗留 v02 CronJob 清理状态,以及 19666/19667 的 Cloud Web 静态资源和 API live 探针。`webAssets` 必须直接给出 `readonly-rpc` 删除、sidebar/workspace/event panel 关键 CSS、`/health/live` API revisionCloud Web 静态资源变更时允许 `apiRevision` 与 source commit 不同,但不得把这种差异误判成 19666 未发布。默认只读取必要字段,禁止把完整 PipelineRun spec、Tekton 内联脚本、历史大对象或整份 CSS/HTML 展开到默认输出;`apply` 先在 G14 workspace 快进并执行 render check,再经 `G14:k3s` server-side apply `tekton-v02/rbac.yaml``pipeline.yaml``argocd/project.yaml``argocd/application-v02.yaml`confirmed apply 会删除遗留 v02 CronJob,但不会应用 runtime-v02 workload、Secret 或数据迁移。
- `hwlab g14 control-plane status|apply --lane v02 [--dry-run|--confirm]` 是 HWLAB `v0.2` 加法 lane 的受控 Tekton/Argo 控制面维护入口,只面向 G14 `/root/hwlab-v02`、branch `v0.2`、namespace `hwlab-ci` 和 Argo application `hwlab-g14-v02``status` 只读汇总 pipeline、RBAC/ServiceAccount、Argo、当前 commit PipelineRun、当前 PipelineRun 的 TaskRun 条件摘要、最近 PipelineRun 摘要、活跃 PipelineRun、遗留 v02 CronJob 清理状态,以及 19666/19667 的 Cloud Web 静态资源和 API live 探针。`webAssets` 必须直接给出 `readonly-rpc` 删除、sidebar/workspace/event panel 关键 CSS、`/app.js` 是否可读取和字节数、`/health/live` API revision`apiRevision` 是 cloud-api 服务自身 revisionCloud Web 静态资源变更时允许与 source commit 不同,不能把这种差异误判成 Cloud Web 未发布。默认只读取必要字段,禁止把完整 PipelineRun spec、Tekton 内联脚本、历史大对象或整份 CSS/HTML/JS 展开到默认输出;`apply` 先在 G14 workspace 快进并执行 render check,再经 `G14:k3s` server-side apply `tekton-v02/rbac.yaml``pipeline.yaml``argocd/project.yaml``argocd/application-v02.yaml`confirmed apply 会删除遗留 v02 CronJob,但不会应用 runtime-v02 workload、Secret 或数据迁移。
- `hwlab g14 control-plane trigger-current --lane v02 [--dry-run|--confirm]` 是 v02 标准手动触发入口:解析当前 `origin/v0.2` full SHA,创建 commit-pinned `hwlab-v02-ci-poll-<short12>` PipelineRun;读 Git 走 `git-mirror-http.devops-infra.svc.cluster.local`GitOps promotion 写 `git-mirror-write.devops-infra.svc.cluster.local`confirmed trigger 在删除/创建 PipelineRun 前会先按当前 source commit 在 G14 临时 detached worktree 中 render,再 server-side apply v02 Tekton RBAC、Pipeline 与 Argo Application,避免 CI/CD 脚本或 runtime-ready 逻辑已合并但集群仍执行旧 Pipeline 定义;该 render 不要求固定 `/root/hwlab-v02` 工作树 clean,也不得因 `.worktree/` 或其他并行未提交修改阻塞;同名 PipelineRun 成功或运行中时拒绝重复触发,失败或不存在时才删除旧对象并重新创建。
创建 PipelineRun 前会读取 `devops-infra` mirror refs,若 `localV02` 未等于当前 source commit,则自动执行一次受控 manual `git-mirror sync` Job 并复核 ref,复核失败时停止触发,避免 Tekton `prepare-source` 已知失败;services 参数只包含 v02 runtime service matrix`hwlab-cli` 是固定 repo 短连接源码工具,不进入 PipelineRun service build。
`--dry-run` 只报告是否会 pre-sync,不创建 Jobconfirmed trigger 默认创建 `.state/jobs/` 异步 job 并立刻返回 `job.id``statusCommand`、stdout/stderr 路径,避免 git mirror pre-sync 或 PipelineRun 创建期间长时间阻塞;`--wait` 路径也必须向 stderr 输出 `hwlab.v02.trigger.progress` JSON 事件,覆盖 `control-plane-refresh``git-mirror-pre-sync``delete-existing-pipelinerun``create-pipelinerun`,避免异步 job 长时间只有启动命令而无法判断卡点;默认 JSON 必须对 `manifest_b64`、长脚本和远端 stdout/stderr 做有界摘要,保留长度与 hash,最终 trigger 结果只返回阶段摘要和关键 tail,完整内容通过 job stdout/stderr 文件渐进披露;只有现场同步调试才显式加 `--wait`;旧 `rerun-current` 只作为输入别名保留。
`--dry-run` 只报告是否会 pre-sync,不创建 Jobconfirmed trigger 默认创建 `.state/jobs/` 异步 job 并立刻返回 `job.id``statusCommand`、stdout/stderr 路径,避免 git mirror pre-sync 或 PipelineRun 创建期间长时间阻塞;`--wait` 路径也必须向 stderr 输出 `hwlab.v02.trigger.progress` JSON 事件,覆盖 `control-plane-refresh``git-mirror-pre-sync``delete-existing-pipelinerun``create-pipelinerun`,避免异步 job 长时间只有启动命令而无法判断卡点;默认 JSON 必须对 `manifest_b64`、长脚本和远端 stdout/stderr 做有界摘要,保留长度与 hash,最终 trigger 结果只返回阶段摘要和关键 tail,完整内容通过 job stdout/stderr 文件渐进披露;只有现场同步调试才显式加 `--wait`;旧 `rerun-current` 只作为输入别名保留。PipelineRun `Completed`、Argo `Synced/Healthy``webAssets.ok=true` 只证明 G14 runtime 已更新;交付收口还必须用 `hwlab g14 git-mirror status` 查看 `cache.summary.pendingFlush`,若为 true,继续执行受控 `hwlab g14 git-mirror flush --confirm` 并用 job status 轮询到 `pendingFlush=false`
- `hwlab g14 control-plane runtime-migration --lane v02 [--dry-run|--allow-live-db-read --dry-run|--confirm]` 只通过 `hwlab-v02` namespace 当前 `deployment/hwlab-cloud-api -c hwlab-cloud-api` 内 repo-owned migration CLI 执行;不读取或打印 Secret 值、不触碰 PROD、不绕到手工 `psql`
- `hwlab g14 control-plane cleanup-runs --lane v02|g14|all [--min-age-minutes N] [--limit N] [--dry-run|--confirm]` 是完成态 PipelineRun 工作区 retention 入口;真实清理只删除已完成 PipelineRun,让 Tekton/local-path 回收临时 PVC,不触碰 registry storage、业务 PVC、Secret、runtime workload 或 GitOps desired state。
- `hwlab g14 control-plane cleanup-released-pvs --lane all [--limit N] [--dry-run|--confirm]` 是 local-path 未自动回收后的补充 retention 入口;只列并删除 `Released``local-path``Delete``claimNamespace=hwlab-ci` 且 claim 名称形如 Tekton 临时 `pvc-*` 的 PV。
- `hwlab g14 git-mirror status|apply|sync|flush [--dry-run|--confirm]``devops-infra` git mirror/relay 的受控维护入口:`apply` 渲染并 server-side apply `devops-infra/git-mirror.yaml`,同时删除遗留 `git-mirror-hwlab-sync` CronJob`sync` 创建一次性 manual Job,把 GitHub allowlist refs 拉入本地 mirror`flush` 创建一次性 manual Job,把本地 `v0.2-gitops` 快进推回 GitHub。
`status` 返回 read/write URL、last sync/write/flush、本地 ref、GitHub staging ref 和 pending flush 状态confirmed `sync``flush` 默认创建 `.state/jobs/` 异步 job 并立刻返回可查询状态,只有现场同步调试才显式加 `--wait`mirror 不设置 CronJob。
`status` 返回 read/write URL、last sync/write/flush、本地 ref、GitHub staging ref 和 pending flush 状态,并在 `cache.summary` 给出 `localV02``localGitops``githubGitops``pendingFlush``flushNeeded``githubInSync` 和下一条受控 `flushCommand`confirmed `sync``flush` 默认创建 `.state/jobs/` 异步 job 并立刻返回可查询状态,只有现场同步调试才显式加 `--wait`mirror 不设置 CronJob。
- `hwlab g14 tools-image status|build --name ci-node-tools --tag <tag> [--dockerfile deploy/ci/hwlab-ci-node-tools.Dockerfile] [--dry-run|--confirm]` 是 G14 固定 HWLAB CI tools image 的受控 host build/push 入口;构建和 push 只发生在 G14 host 与本地 registry,不在 master server 构建,也不把 `apk add`/runtime install 塞进 Tekton PipelineRun。
- `ssh gh:/owner/repo ...` 把 GitHub issue/PR 映射成只读/受控写入的虚拟文本目录,适合日报、PR 正文和 issue 正文的小补丁维护:`ssh gh:/pikasTech/HWLAB ls` 展示 `pr/``issue/``ssh gh:/pikasTech/HWLAB/pr ls [--limit N] [--full]``ssh gh:/pikasTech/HWLAB/issue ls [--limit N] [--full]` 展示条目状态、楼层数、正文长度和标题,`ssh gh:/pikasTech/HWLAB/pr/507 ls` 展示单个 PR 的一楼正文文件,`ssh gh:/pikasTech/HWLAB/505/1 cat|rg|patch-apply` 兼容旧式 issue/PR number route。`patch-apply` 使用 UniDesk 默认 apply-patch v2 的虚拟文件 executor,把正文一楼映射为 `body.md`,写回仍走 `bun scripts/cli.ts gh issue/pr update` 的 guard/concurrency 规则;`rm` 对正文一楼结构化拒绝,避免误删 issue/PR 正文。大正文读取必须展开 UniDesk gh dump 文件,否则 `cat/rg/patch-apply` 会误读为空,这是 `gh:` 虚拟文件接口的 P0 可见性契约。
- `hwlab cd status|audit|preflight|apply --env dev [--dry-run]` 是旧 D601 HWLAB DEV CD 指挥侧 wrapper,仅用于显式 legacy 诊断和迁移对照。默认通过 UniDesk provider `host.ssh` 进入 D601,再调用 HWLAB repo-owned `scripts/dev-cd-apply.mjs`,不内嵌发布 kubectl 逻辑:`status` 汇总固定 CD mirror、Git clean/main/origin-main、`deploy/deploy.json`/artifact catalog/report、D601 native k3s guard 和 CD Lease lock,并用 `scripts/dev-cd-apply.mjs --status --skip-live-verify` 取得 target/promotion 摘要;`audit` 在 k3s/CD 恢复后做只读健康审计,返回有界 JSON 的 blocker 分类、D601 guard/node、SecretRef 存在性、registry 可达性、Lease phase/holder/staleness、deploy.json 与 artifact/workload image 收敛、current Deployment image/revision/rollout、16666/16667 public health commit/readiness 和 DB/runtime durability 摘要;`preflight` 进一步检查必需 SecretRef 对象/键存在性并运行 HWLAB `scripts/dev-cd-apply.mjs --dry-run --skip-live-verify` 受控事务摘要。完整远端 stdout/stderr 写入 D601 `~/.state/unidesk-hwlab-cd/<run-id>/` 和本地 `.state/hwlab-cd/<run-id>/` task dumpstdout 只返回有界摘要。默认 HWLAB CD repo 是 `/home/ubuntu/hwlab_cd``/home/ubuntu/hwlab` runner 历史目录不得作为发布真相。wrapper 强制 `KUBECONFIG=/etc/rancher/k3s/k3s.yaml` 并只以这个显式目标作为 gate;显式目标出现 `docker-desktop``desktop-control-plane``127.0.0.1:11700` 信号会结构化拒绝,audit/preflight/apply --dry-run 都必须观察到 node `d601`。真实 apply 只暴露 `scripts/dev-cd-apply.mjs --apply --confirm-dev --confirmed-non-production --write-report` 命令形状并标注 host-commander-only,本 runner 不执行 live apply、rollout、Lease mutation 或 DEV deploy apply。长期规则见 `docs/reference/hwlab.md`
@@ -88,7 +88,7 @@ CI/CD、GitOps、rollout、artifact 发布、PR 合并后的 DEV/PROD 滚动、P
- `codex interrupt|cancel <taskId>` 通过 Code Queue 私有代理请求中断;running/judging 任务会请求 D601 当前 agent run 停止,queued/retry_wait 任务的取消也必须保持与 WebUI 相同代理路径,返回有界 task 摘要和后续查询命令。任何需要接触 active run 的动作仍属于 D601 执行面。
- Code Queue 多队列 lane 由 `codex` 命令命名空间管理:`queues [--full|--all] [--limit N] [--page N|--offset N]` 列表、`queue create <queueId>` 创建、`queue merge <sourceQueueId> --into <targetQueueId>` 合并、`move <taskId> --queue <queueId>` 迁移;这些队列管理入口默认由主 server `code-queue-mgr` 直管 PostgreSQL,仍通过稳定 `code-queue` 用户服务代理路径访问。`codex queues` 默认只返回 active/nonempty/unread/runnable queue 摘要、activity、commanderConcurrency、全局 counts 和 execution diagnostics`--full``--all` 只切换为完整队列行视图的一页,仍受 `--limit`/`--page`/`--offset` 分页约束,不再默认携带 deprecated full array。summary 和 full 的稳定机读路径都是 `.data.queues.items[]`,全局元数据固定在 `.data.queues.commanderConcurrency``.data.queues.activity``.data.queues.counts``.data.queues.executionDiagnostics``.data.queues.activeTaskIds``.data.queues.queuedTaskIds`;需要完整 upstream 时使用输出中的 raw command。`commanderConcurrency.activeRunnerCount` / `activity.effectiveActiveTaskCount` 是指挥官并发判断的有效活跃数,`schedulerLocalActiveQueueCount`/`activeQueueIds` 只描述本地 scheduler active-run slots,不能覆盖数据库 running 计数或 heartbeat-fresh runner 计数。旧 full 顶层数组语义已作为 deprecated 兼容信息记录,不再作为 `.data.queues` 主形态。同一个 queue 内部串行执行,不同 queue 之间并行执行。迁移只允许尚未被 scheduler claim 的 `queued`/`retry_wait` 任务,必须满足 `startedAt=null``currentAttempt=0` 且没有 active thread/turn;已进入 `running`/`judging` 或已有 claim 标记的任务返回 409,不得被 move/merge 回写成 queued。合并会移动可迁移任务归属并自动删除源 queue 记录,只保留合并后的目标 queue;若 source 或 target queue 存在 active/claimed 任务,合并整体返回 409。合并后的目标 queue 按任务原 `queueEnteredAt`/`createdAt` 时间顺序串行,成功迁移 queued/retry_wait 任务后由 D601 scheduler 轮询推进。
- 所有 `codex` 查询和管理命令必须走与 WebUI 相同的 backend-core 私有代理路径 `/api/microservices/code-queue/proxy/...`;CLI 不得为了提交、移动、中断、取消或队列管理直接调用 D601 内部 Service、数据库、pod curl 或 k3sctl scheduler 子服务。若该路径失败,应先修复 CLI/backend/provider tunnel 链路,而不是绕过控制面。
- `job list [--limit N] [--include-command]``job status <jobId|latest> [--tail-bytes N]` 查询 `.state/jobs/` 文件系统状态,是异步命令的可观测入口。`job list` 默认只返回最新 50 条摘要,并为已知异步工作流返回轻量 `progress.summary` 与后续查询命令;`job status` 默认返回结构化 `progress`、stdout/stderr 末尾 12000 字节、`tailPolicy` 与完整日志路径。已知工作流应从有界日志尾部抽取阶段、关键对象名和下一步命令,避免为了判断当前阶段而手工打开完整 stdout/stderr。
- `job list [--limit N] [--include-command]``job status <jobId|latest> [--tail-bytes N]` 查询 `.state/jobs/` 文件系统状态,是异步命令的可观测入口。`job list` 默认只返回最新 50 条摘要,并为已知异步工作流返回轻量 `progress.summary` 与后续查询命令;`job status` 默认返回结构化 `progress`、stdout/stderr 末尾 12000 字节、`tailPolicy` 与完整日志路径。已知工作流应从有界日志尾部抽取阶段、关键对象名和下一步命令,避免为了判断当前阶段而手工打开完整 stdout/stderr。`hwlab_g14_v02_trigger_current` 的 progress 必须暴露 trigger 阶段、source commit 和 PipelineRun`hwlab_g14_git_mirror_sync|flush` 的 progress 必须暴露 sync/flush 状态、Job 名、pendingFlush 与 fetch/push/total/SSH timing。
- `debug health``debug dispatch``debug task` 走真实内部 core、WebSocket、数据库、provider、系统指标、Docker 状态和 Host SSH 维护桥流程,只用于开发调试,不写入 `TEST.md` 的正式验收步骤。
- `e2e run [--only pattern[,pattern...]] [--skip pattern[,pattern...]]` 使用 publicHost 派生的公开 production frontend/dev frontend/provider ingress URL,并通过 Docker 内网验证 core API、PostgreSQL、provider self-connection、系统指标曲线、Docker 状态快照、provider.upgrade 预检和 Playwright 前端页面,是交付前的自动化 E2E 门禁;CLI 默认输出 check 状态摘要,完整诊断写入 `resultPath`,日常迭代应优先用 `--only` / `--skip` 跑最小必要集合。
@@ -159,7 +159,7 @@ CLI 会在 Host route、workspace route、k3s 控制面脚本和 pod route 的 s
`tran` 不做本地 provider/plane 串行锁;本地目录锁不是 G14 原生 k3s/Tekton/GitOps 的业务协调机制,stale lock 会阻塞所有后续短查询。以后不要在 `tran` wrapper 里恢复本地锁。业务并发、发布互斥和 rollout 协调必须交给 k8s/Tekton/Argo/Lease 等原生运行面机制;若 provider session allocator 需要限流,应在服务端实现带 TTL 的队列或 lease,而不是在客户端加目录锁。
非交互 `tran`/`ssh` 有最外层运行时硬超时,默认和最大值都是 60 秒;`UNIDESK_TRAN_RUNTIME_TIMEOUT_SECONDS``UNIDESK_TRAN_RUNTIME_TIMEOUT_MS``UNIDESK_SSH_RUNTIME_TIMEOUT_MS` 只能把超时调小,不能调大超过 60 秒。到点后 wrapper、backend-core broker 或 frontend websocket 路径会主动断开并在 stderr 输出 `UNIDESK_TRAN_TIMEOUT_HINT``UNIDESK_SSH_RUNTIME_TIMEOUT`,提示改用短查询加轮询。长时间 CI/CD、Tekton/Argo 观察、trace/result、日志 tail、构建下载和硬件任务都必须按 submit-and-poll/短查询语义拆成多次 `tran` 调用;不得让单个 `tran` 挂着等待最终完成。
非交互 `tran`/`ssh` 有最外层运行时硬超时,默认和最大值都是 60 秒;`UNIDESK_TRAN_RUNTIME_TIMEOUT_SECONDS``UNIDESK_TRAN_RUNTIME_TIMEOUT_MS``UNIDESK_SSH_RUNTIME_TIMEOUT_MS` 只能把超时调小,不能调大超过 60 秒。到点后 wrapper、backend-core broker 或 frontend websocket 路径会主动断开并在 stderr 输出 `UNIDESK_TRAN_TIMEOUT_HINT``UNIDESK_SSH_RUNTIME_TIMEOUT`,提示改用短查询加轮询。长时间 CI/CD、Tekton/Argo 观察、trace/result、日志 tail、构建下载和硬件任务都必须按 submit-and-poll/短查询语义拆成多次 `tran` 调用;不得让单个 `tran` 挂着等待最终完成。本规则的跟踪 issue 是 [pikasTech/unidesk#187](https://github.com/pikasTech/unidesk/issues/187)。
长任务的标准形态是:用一次短 `tran <route> script` 启动目标侧 job、后台进程、PipelineRun 或受控命令,并把 job id、日志路径、状态文件或 Kubernetes 对象名写到目标侧;后续用多次短 `tran` 查询 `status``logs --tail``kubectl get/describe`、trace result 或工具自带 job-status。不要为了观察 Docker build、镜像 push、Keil 下载、串口抓取或 Code Agent turn,把 `tran` 一直挂到结束;超过 60 秒的断开只说明调用方式需要改成轮询,不应立即归因成 provider session、CI/CD 或硬件失败。
+4
View File
@@ -67,6 +67,10 @@ The `v0.2` CI/CD integration must be additive: add a `v0.2` source watcher, GitO
The `devops-infra` git mirror remains manual and CLI-controlled, not CronJob-driven. The standard `v0.2` CI/CD trigger is `bun scripts/cli.ts hwlab g14 control-plane trigger-current --lane v02 --confirm`; this command must check the mirror's `localV02` ref before creating the PipelineRun, run one bounded manual `git-mirror sync` Job when the mirror is stale, and only continue after the mirror ref matches the current source commit. Use `hwlab g14 git-mirror sync --confirm` directly only for explicit mirror maintenance or diagnosis.
After a `v0.2` PipelineRun completes, treat runtime rollout and remote GitOps persistence as two separate checks. `hwlab g14 control-plane status --lane v02` is the runtime check: it must show the expected source commit, PipelineRun completed, Argo `Synced/Healthy`, public 19666/19667 probes passing, and Cloud Web asset probes such as `/app.js` readable. `hwlab g14 git-mirror status` is the persistence check: `cache.summary.pendingFlush` must be false and `cache.summary.githubInSync` true before declaring GitOps fully flushed back to GitHub. If runtime is healthy but `pendingFlush=true`, run `bun scripts/cli.ts hwlab g14 git-mirror flush --confirm` and poll the returned job with `bun scripts/cli.ts job status <jobId> --tail-bytes 12000`; do not replace this with raw `kubectl`, native `git push`, or a long SSH wait.
`/health/live` revision is owned by `hwlab-cloud-api`; it can legitimately differ from the source commit for a Cloud Web-only change. Do not call that difference a failed Cloud Web rollout when `webAssets.checks.htmlOk`, `webAssets.checks.appJsOk`, CSS probes, Argo health, and `hwlab-cloud-web` Deployment readiness have passed. For Cloud Web behavior changes, the public JS asset probe or a bounded browser/DOM check is stronger evidence than cloud-api `apiRevision`.
Do not turn `v0.2` expansion governance into a stack of broad compatibility gates. The stable control points are branch, GitOps branch, namespace, runtime path, Argo Application, FRP ports and generated-output ownership. Legacy DEV/D601/main preflights that block the `v0.2` lane should be removed from that lane, not patched with fallback or legacy modes. Naming, RBAC scope, cleanup policy, resource quota and rollback order are design decisions or runbook entries unless they protect a concrete high-value risk that cannot be enforced by the fixed boundaries above.
## Node-Local VPN Proxy
+9 -1
View File
@@ -1,4 +1,4 @@
import { gitMirrorFlushJobManifest, gitMirrorSyncJobManifest, gitMirrorV02SyncRequirement, hwlabG14MonitorStateFileName, parseGitMirrorStatusRefs, parsePipelineTaskRunMetrics, rolloutRecordBody, semanticChangelogBullets, v02ControlPlaneRenderScript, v02PipelineServiceIds } from "./src/hwlab-g14";
import { gitMirrorFlushJobManifest, gitMirrorStatusSummary, gitMirrorSyncJobManifest, gitMirrorV02SyncRequirement, hwlabG14MonitorStateFileName, parseGitMirrorStatusRefs, parsePipelineTaskRunMetrics, rolloutRecordBody, semanticChangelogBullets, v02ControlPlaneRenderScript, v02PipelineServiceIds } from "./src/hwlab-g14";
function assertCondition(condition: unknown, message: string, detail: unknown = {}): void {
if (!condition) throw new Error(`${message}: ${JSON.stringify(detail)}`);
@@ -61,6 +61,13 @@ assertCondition(
gitMirrorV02SyncRequirement("aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa", gitMirrorStatusRaw).required === true,
"trigger-current must sync mirror before creating PipelineRun when local v0.2 is stale",
);
const gitMirrorSummary = gitMirrorStatusSummary(gitMirrorStatusRaw);
assertCondition(
gitMirrorSummary.flushNeeded === true && gitMirrorSummary.flushCommand === "bun scripts/cli.ts hwlab g14 git-mirror flush --confirm",
"git mirror status summary must expose pending flush and the exact controlled flush command",
gitMirrorSummary,
);
assertCondition(gitMirrorSummary.githubInSync === false, "git mirror status summary must expose GitHub GitOps drift", gitMirrorSummary);
const renderScript = v02ControlPlaneRenderScript("aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa");
assertCondition(
renderScript.includes("git worktree add --detach") && renderScript.includes("/tmp/hwlab-v02-control-plane-source-aaaaaaaaaaaa"),
@@ -184,6 +191,7 @@ console.log(JSON.stringify({
"git mirror sync is a manual devops-infra Job, not a CronJob",
"git mirror flush is a manual devops-infra Job, not a CronJob",
"trigger-current can decide whether v0.2 git mirror pre-sync is required",
"git mirror status exposes pending flush and controlled flush command",
"v0.2 control-plane render uses a detached temp worktree",
"v0.2 PipelineRun service matrix excludes hwlab-cli",
"rollout brief includes natural-language changelog before automatic diff summary",
+60 -2
View File
@@ -744,6 +744,10 @@ function v02WebAssetsProbeScript(): string {
"css=$(fetch_url \"$base/styles.css\" 2>/dev/null)",
"css_code=$?",
"printf 'cssOk\\t%s\\n' \"$css_code\"",
"app_js=$(fetch_url \"$base/app.js\" 2>/dev/null)",
"app_js_code=$?",
"printf 'appJsOk\\t%s\\n' \"$app_js_code\"",
"printf 'appJsBytes\\t%s\\n' \"$(printf '%s' \"$app_js\" | wc -c | tr -d ' ')\"",
"printf 'sidebarFitCss\\t%s\\n' \"$(printf '%s' \"$css\" | grep -Eq 'grid-template-rows:[[:space:]]*auto[[:space:]]+auto[[:space:]]+auto[[:space:]]+auto[[:space:]]+minmax\\(0,[[:space:]]*1fr\\)'; printf '%s' \"$?\")\"",
"printf 'workspaceFitCss\\t%s\\n' \"$(printf '%s' \"$css\" | grep -Eq 'grid-template-rows:[[:space:]]*auto[[:space:]]+minmax\\(0,[[:space:]]*1fr\\)'; printf '%s' \"$?\")\"",
"printf 'eventPanelFitCss\\t%s\\n' \"$(printf '%s' \"$css\" | grep -Eq 'grid-template-rows:[[:space:]]*auto[[:space:]]+minmax\\(132px,[[:space:]]*1fr\\)'; printf '%s' \"$?\")\"",
@@ -812,16 +816,18 @@ function v02WebAssetsFromText(
}
const htmlOk = fields.htmlOk === "0";
const cssOk = fields.cssOk === "0";
const appJsOk = fields.appJsOk === "0";
const apiHealthOk = fields.apiHealthOk === "0";
const readonlyNoteAbsent = fields.readonlyNote === "1";
const sidebarFitCss = fields.sidebarFitCss === "0";
const workspaceFitCss = fields.workspaceFitCss === "0";
const eventPanelFitCss = fields.eventPanelFitCss === "0";
const apiRevision = fields.apiRevision || null;
const webChecksPass = htmlOk && cssOk && readonlyNoteAbsent && sidebarFitCss && workspaceFitCss && eventPanelFitCss && apiHealthOk;
const webChecksPass = htmlOk && cssOk && appJsOk && readonlyNoteAbsent && sidebarFitCss && workspaceFitCss && eventPanelFitCss && apiHealthOk;
const failedChecks = Object.entries({
htmlOk,
cssOk,
appJsOk,
readonlyNoteAbsent,
sidebarFitCss,
workspaceFitCss,
@@ -838,6 +844,7 @@ function v02WebAssetsFromText(
checks: {
htmlOk,
cssOk,
appJsOk,
readonlyNoteAbsent,
sidebarFitCss,
workspaceFitCss,
@@ -847,12 +854,16 @@ function v02WebAssetsFromText(
probeExitCodes: {
html: numericField(fields.htmlOk),
css: numericField(fields.cssOk),
appJs: numericField(fields.appJsOk),
readonlyNoteGrep: numericField(fields.readonlyNote),
sidebarFitCssGrep: numericField(fields.sidebarFitCss),
workspaceFitCssGrep: numericField(fields.workspaceFitCss),
eventPanelFitCssGrep: numericField(fields.eventPanelFitCss),
apiHealth: numericField(fields.apiHealthOk),
},
assetBytes: {
appJs: numericField(fields.appJsBytes),
},
apiRevision,
note: webAssetsRevisionNote(apiRevision, sourceCommit, activePipelineRuns),
exitCode,
@@ -870,7 +881,7 @@ function webAssetsRevisionNote(apiRevision: string | null, sourceCommit: string
if (activeItems.length > 0) {
return `runtime apiRevision=${apiRevision} differs from sourceCommit=${sourceCommit}; active v02 PipelineRun observed, wait for rollout before judging runtime revision`;
}
return `runtime revision lag: cloud-api apiRevision=${apiRevision} differs from sourceCommit=${sourceCommit}; inspect recentPipelineRuns, argo, and runtime-ready state`;
return `cloud-api apiRevision=${apiRevision} is service-scoped and differs from sourceCommit=${sourceCommit}; when web asset probes pass, this is not proof that Cloud Web is stale`;
}
function numericField(value: string | undefined): number | null {
@@ -1716,6 +1727,52 @@ export function parseGitMirrorStatusRefs(raw: string): { refs: Record<string, st
}
}
function parseLabeledJsonLine(raw: string, label: string): Record<string, unknown> {
const line = raw.split(/\r?\n/u).find((item) => item.startsWith(`${label}=`));
if (line === undefined) return {};
const value = line.slice(label.length + 1).trim();
if (value.length === 0) return {};
try {
return record(JSON.parse(value) as unknown);
} catch {
return {};
}
}
export function gitMirrorStatusSummary(raw: string): Record<string, unknown> {
const refs = parseGitMirrorStatusRefs(raw);
const lastSync = parseLabeledJsonLine(raw, "lastSync");
const lastWrite = parseLabeledJsonLine(raw, "lastWrite");
const lastFlush = parseLabeledJsonLine(raw, "lastFlush");
const localGitops = refs.refs.localGitops ?? null;
const githubGitops = refs.refs.githubGitops ?? null;
return {
localV02: refs.refs.localV02 ?? null,
localG14: refs.refs.localG14 ?? null,
localGitops,
githubGitops,
pendingFlush: refs.pendingFlush,
lastSync: {
status: stringOrNull(lastSync.status) ?? null,
at: stringOrNull(lastSync.publishedAt) ?? stringOrNull(lastSync.syncedAt) ?? null,
pendingFlush: typeof lastSync.pendingFlush === "boolean" ? lastSync.pendingFlush : null,
},
lastWrite: {
status: stringOrNull(lastWrite.status) ?? null,
at: stringOrNull(lastWrite.writtenAt) ?? null,
pendingFlush: typeof lastWrite.pendingFlush === "boolean" ? lastWrite.pendingFlush : null,
},
lastFlush: {
status: stringOrNull(lastFlush.status) ?? null,
at: stringOrNull(lastFlush.flushedAt) ?? null,
pendingFlush: typeof lastFlush.pendingFlush === "boolean" ? lastFlush.pendingFlush : null,
},
flushNeeded: refs.pendingFlush === true,
flushCommand: refs.pendingFlush === true ? "bun scripts/cli.ts hwlab g14 git-mirror flush --confirm" : null,
githubInSync: Boolean(localGitops && githubGitops && localGitops === githubGitops),
};
}
export function gitMirrorV02SyncRequirement(sourceCommit: string, rawStatus: string): Record<string, unknown> {
const parsed = parseGitMirrorStatusRefs(rawStatus);
const localV02 = parsed.refs.localV02 ?? null;
@@ -1937,6 +1994,7 @@ function runGitMirrorStatus(): Record<string, unknown> {
},
cache: {
ok: shellSectionOk(cache),
summary: gitMirrorStatusSummary(String(cache?.stdout ?? "").trim()),
raw: String(cache?.stdout ?? "").trim(),
stderr: shellSectionOk(cache) ? "" : bundleStderr,
},
+73 -8
View File
@@ -25,7 +25,7 @@ export interface JobRecord {
}
export interface JobProgressSummary {
kind: "hwlab-v02-trigger" | "generic";
kind: "hwlab-v02-trigger" | "hwlab-git-mirror" | "generic";
stage: string | null;
stageStatus: string | null;
sourceCommit: string | null;
@@ -196,11 +196,13 @@ export function jobWithTail(job: JobRecord, maxBytes = 12000): JobRecord & {
function summarizeJobProgress(job: JobRecord, maxBytes = 96_000, tails?: { stdoutTail: string; stderrTail: string }): JobProgressSummary {
const knownWorkflow = job.name === "hwlab_g14_v02_trigger_current";
if (!knownWorkflow && tails === undefined) return genericJobProgress(job);
const gitMirrorWorkflow = job.name === "hwlab_g14_git_mirror_sync" || job.name === "hwlab_g14_git_mirror_flush";
if (!knownWorkflow && !gitMirrorWorkflow && tails === undefined) return genericJobProgress(job);
const nowMs = Date.now();
const progressTailBytes = Math.max(4096, Math.floor(maxBytes));
const stderrTail = tails?.stderrTail ?? tailFile(job.stderrFile, progressTailBytes);
const stdoutTail = tails?.stdoutTail ?? tailFile(job.stdoutFile, progressTailBytes);
if (gitMirrorWorkflow) return summarizeGitMirrorJobProgress(job, stdoutTail, stderrTail, nowMs);
const events = parseJsonLineEvents(stderrTail, "hwlab.v02.trigger.progress");
const lastEvent = events.at(-1) ?? {};
const stage = stringField(lastEvent.stage);
@@ -266,6 +268,57 @@ function summarizeJobProgress(job: JobRecord, maxBytes = 96_000, tails?: { stdou
};
}
function summarizeGitMirrorJobProgress(job: JobRecord, stdoutTail: string, stderrTail: string, nowMs = Date.now()): JobProgressSummary {
const action = job.name.endsWith("_flush") ? "flush" : job.name.endsWith("_sync") ? "sync" : "unknown";
const elapsedSeconds = jobElapsedSeconds(job, nowMs);
const timings = extractGitMirrorTimings(`${stderrTail}\n${stdoutTail}`);
const status =
firstMatch(stdoutTail, new RegExp(`"event"\\s*:\\s*"git-mirror-${action}"[\\s\\S]{0,240}?"status"\\s*:\\s*"([^"]+)"`, "u")) ??
firstMatch(stdoutTail.replaceAll('\\"', '"').replaceAll("\\n", "\n"), new RegExp(`"event"\\s*:\\s*"git-mirror-${action}"[\\s\\S]{0,240}?"status"\\s*:\\s*"([^"]+)"`, "u"));
const pendingFlush = firstMatch(stdoutTail.replaceAll('\\"', '"'), /"pendingFlush"\s*:\s*(true|false)/u);
const jobName = firstMatch(stdoutTail, /"jobName"\s*:\s*"([^"]+)"/u);
const warnings = jobProgressWarnings({
job,
eventsObserved: 0,
elapsedSeconds,
stage: action,
stageStatus: job.status === "running" ? "running" : status,
stageElapsedSeconds: job.status === "running" ? elapsedSeconds : null,
lastEventAgeSeconds: null,
});
const timingSummary = Object.entries(timings)
.filter(([key]) => key.startsWith(`gitMirror.${action}.`) || key === "sshRuntimeMaxMs")
.map(([key, value]) => `${key}=${value}ms`);
return {
kind: "hwlab-git-mirror",
stage: action,
stageStatus: status ?? (job.status === "running" ? "running" : null),
sourceCommit: null,
pipelineRun: null,
pipelineCreated: null,
elapsedSeconds,
stageElapsedSeconds: job.status === "running" ? elapsedSeconds : null,
lastEventAt: null,
lastEventAgeSeconds: null,
eventsObserved: 0,
slow: warnings.length > 0,
warnings,
timings,
summary: [
job.status,
`git-mirror-${action}${status ? `:${status}` : ""}`,
jobName ? `job=${jobName}` : null,
pendingFlush !== null ? `pendingFlush=${pendingFlush}` : null,
elapsedSeconds !== null ? `elapsed=${elapsedSeconds}s` : null,
...timingSummary,
warnings.length > 0 ? "visibility-warning" : null,
].filter(Boolean).join(" "),
nextCommand: job.status === "running"
? `bun scripts/cli.ts job status ${job.id} --tail-bytes 12000`
: "bun scripts/cli.ts hwlab g14 git-mirror status",
};
}
function genericJobProgress(job: JobRecord): JobProgressSummary {
const nowMs = Date.now();
const elapsedSeconds = jobElapsedSeconds(job, nowMs);
@@ -401,10 +454,18 @@ function extractGitMirrorTimingJsonLines(text: string, timings: Record<string, n
const parsed = JSON.parse(trimmed) as unknown;
if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) continue;
const event = parsed as Record<string, unknown>;
if (event.event !== "git-mirror-sync-timing") continue;
const operation = event.event === "git-mirror-sync-timing"
? "sync"
: event.event === "git-mirror-flush-timing"
? "flush"
: null;
if (operation === null) continue;
const phase = stringField(event.phase);
const durationMs = typeof event.durationMs === "number" && Number.isFinite(event.durationMs) ? event.durationMs : null;
if (phase !== null && durationMs !== null) timings[`gitMirror.${phase}Ms`] = Math.round(durationMs);
if (phase !== null && durationMs !== null) {
timings[`gitMirror.${operation}.${phase}Ms`] = Math.round(durationMs);
if (operation === "sync") timings[`gitMirror.${phase}Ms`] = Math.round(durationMs);
}
} catch {
// Ignore non-JSON lines; raw logs remain available through job status tails.
}
@@ -412,11 +473,15 @@ function extractGitMirrorTimingJsonLines(text: string, timings: Record<string, n
}
function extractGitMirrorTimingRegex(text: string, timings: Record<string, number>): void {
const pattern = /"event"\s*:\s*"git-mirror-sync-timing"[\s\S]{0,240}?"phase"\s*:\s*"([^"]+)"[\s\S]{0,240}?"durationMs"\s*:\s*(\d+)/gu;
const pattern = /"event"\s*:\s*"git-mirror-(sync|flush)-timing"[\s\S]{0,240}?"phase"\s*:\s*"([^"]+)"[\s\S]{0,240}?"durationMs"\s*:\s*(\d+)/gu;
for (const match of text.matchAll(pattern)) {
const phase = match[1];
const durationMs = Number(match[2]);
if (phase !== undefined && Number.isFinite(durationMs)) timings[`gitMirror.${phase}Ms`] = Math.round(durationMs);
const operation = match[1];
const phase = match[2];
const durationMs = Number(match[3]);
if (operation !== undefined && phase !== undefined && Number.isFinite(durationMs)) {
timings[`gitMirror.${operation}.${phase}Ms`] = Math.round(durationMs);
if (operation === "sync") timings[`gitMirror.${phase}Ms`] = Math.round(durationMs);
}
}
}