From bfdfd109231b5b997c948ea338dc6cb0e2051cb5 Mon Sep 17 00:00:00 2001
From: Codex
Date: Sun, 24 May 2026 05:59:29 +0000
Subject: [PATCH] fix: harden hwlab cd cache

---
 docs/reference/hwlab.md                |   6 +-
 scripts/src/hwlab-cd-remote-runner.cjs | 625 ++++++++++++++++++++++++-
 scripts/src/hwlab-cd.ts                | 346 ++++++++++++--
 3 files changed, 932 insertions(+), 45 deletions(-)

diff --git a/docs/reference/hwlab.md b/docs/reference/hwlab.md
index a3a8c769..fcbfacd3 100644
--- a/docs/reference/hwlab.md
+++ b/docs/reference/hwlab.md
@@ -51,9 +51,11 @@ bun scripts/cli.ts hwlab cd preflight --env dev
 bun scripts/cli.ts hwlab cd apply --env dev --dry-run
 ```
 
-The wrapper's job is to consolidate the HWLAB DEV rollout inspection/preparation actions that host commander commonly uses into a single entry point. By default it reaches D601 through the provider `host.ssh` capability of the UniDesk main-server frontend/backend-core, then runs a bounded check script on D601. It only invokes HWLAB repo-owned controlled scripts; it does not reimplement the release flow inside UniDesk or assemble ad hoc `kubectl apply` commands:
+The wrapper's job is to consolidate the HWLAB DEV rollout inspection/preparation actions that host commander commonly uses into a single entry point. By default it reaches D601 through the provider `host.ssh` capability of the UniDesk main-server frontend/backend-core. The frontend transport first uploads the remote runner to `/tmp/unidesk-hwlab-cd/` in chunks, using several short `host.ssh` commands. It then runs a bounded check script on D601, so that the provider-gateway's short-command length limit does not become a single point of failure for the audit entry point. Stdout returned through frontend/backend-core may carry only a short JSON summary. The full result must be written to D601 `~/.state/unidesk-hwlab-cd//result.full.json`, so the CLI is not left blind when the backend's JSON safety summary truncates stdout. It only invokes HWLAB repo-owned controlled scripts; it does not reimplement the release flow inside UniDesk or assemble ad hoc `kubectl apply` commands:
 
-- The default HWLAB CD repo is the fixed clean mirror `/home/ubuntu/hwlab_cd` on D601; an equally clean clone can also be specified explicitly with `--hwlab-repo`. The wrapper must check `git status --short --branch`, the origin remote, that the current branch is `main`, the local `origin/main`, `FETCH_HEAD`, and worktree permissions. Any dirty worktree, wrong remote, non-main branch, HEAD that has not caught up with the local `origin/main`, or permission anomaly returns a structured blocker. `/home/ubuntu/hwlab` is the runner's history directory and must not be used as release truth.
+- The default HWLAB CD repo is a one-off clone, created fresh on every run at D601 `~/.state/unidesk-hwlab-cd//ephemeral-repo/HWLAB`. The ephemeral repo is copied only from the dedicated full bare cache `~/.cache/unidesk/hwlab-cd/git-cache/HWLAB.git`. It must be cloned with `--no-hardlinks` so that it gets an independent `.git` object store, and it must not depend on the cache object store at runtime through `--shared` or alternates. The cache itself may serve only as a dedicated Git acceleration source for HWLAB CD. It must not host a runner workspace, deployment copies, hand-edited code, report trails, or anything else.
+- The dedicated cache is owned by the current `host.ssh` execution user, and `~/.cache/unidesk/hwlab-cd` must keep owner-only permissions. An owner mismatch, a non-writable cache, or a failure to tighten permissions returns a structured blocker. The cache may be initialized by a one-time copy from an existing local HWLAB clone used as a seed. The seed only lowers the cost of the first egress and must not be treated as release truth. The cache remote always prefers `git@github.com:pikasTech/HWLAB.git`; GitHub egress goes through the D601 provider egress proxy `127.0.0.1:18789` and an SSH `ProxyCommand`. The cache must hold the full bare-repo history rather than a depth=1 shallow cache. The wrapper needs to resolve the promotion commit referenced by `refs/heads/main:deploy/deploy.json`, and a shallow cache cannot prove the CD desired state is complete even when its main HEAD matches. If the GitHub refresh fails but the local full cache resolves both the main HEAD and the deploy promotion commit, the run may continue as a stale-cache diagnostic. In that case it must surface a `hwlab-cache-refresh` blocker and must not masquerade as a release-truth PASS.
+- The ephemeral repo is a temporary release readout created by the current `host.ssh` execution user. This prevents the long-lived `/home/ubuntu/hwlab_cd` from becoming a CD single point of failure once root or another runner owner has polluted it. `--hwlab-repo` or `UNIDESK_HWLAB_REPO` can still point explicitly at an equally clean clone for manual diagnosis; that clone must be owned by the execution user and be readable and writable by it. A clone owned by root or another user triggers Git dubious-ownership, non-writable `FETCH_HEAD`, and `.git/worktrees` permission blockers. Fix owner pollution by rebuilding the mirror or explicitly correcting its permissions; do not use `git config --global safe.directory` to mask polluted release truth. The wrapper must check `git status --short --branch`, the origin remote, that the current branch is `main`, the local `origin/main`, `FETCH_HEAD`, and worktree permissions. Any dirty worktree, wrong remote, non-main branch, HEAD that has not caught up with the local `origin/main`, or permission anomaly returns a structured blocker. `/home/ubuntu/hwlab` is the runner's history directory and must not be used as release truth.
 - `deploy/deploy.json` is the only desired state. The wrapper treats `deploy/artifact-catalog.dev.json`, `deploy/k8s/base/workloads.yaml`, and `reports/dev-gate/dev-artifacts.json` only as derived/evidence readouts. `status`/`preflight` must show whether the target commit/ref, deploy.json, artifact catalog, workloads, and live workload images share the same source and have converged, without introducing a second desired state.
 - `status` is a read-only summary of the HWLAB repo path, Git clean/main/origin-main, desired-state convergence, the D601 native k3s guard, and `Lease/hwlab-dev/hwlab-dev-cd-lock`. It also calls HWLAB `scripts/dev-cd-apply.mjs --status --skip-live-verify` to obtain the repo-owned target/promotion/deploy.json/artifact summary. This runner does not perform 16666/16667 live verification.
 - `audit` is a read-only health audit for use after DEV CD recovery; it is not an acceptance gate or a report generator. On top of the controlled `status` path it adds read-only `kubectl get`/HTTP health probes and emits a bounded JSON summary. That summary classifies `control-plane-unavailable`, `docker-desktop-context-risk`, `second-control-plane-risk`, `workspace-unavailable`, `dirty-worktree`, `secret-missing`, `registry-unavailable`, `lease-held`, `lease-stale-candidate`, `artifact-missing`, `artifact-mismatch`, `runtime-job-blocked`, `rollout-unhealthy`, `public-tunnel-unhealthy`, and `db-runtime-durability-risk`. audit shows only whether Secret objects/keys exist, never their values. It checks read-only whether the Lease is stale and never releases or breaks it. It fetches the commit/readiness summary from 16666/16667 `/health/live` read-only, and does not treat it as M3 DEV-LIVE acceptance.
diff --git a/scripts/src/hwlab-cd-remote-runner.cjs b/scripts/src/hwlab-cd-remote-runner.cjs
index 4fb409f2..29d926e7 100644
--- a/scripts/src/hwlab-cd-remote-runner.cjs
+++ b/scripts/src/hwlab-cd-remote-runner.cjs
@@ -8,8 +8,14 @@ const path = require("node:path");
 const namespace = "hwlab-dev";
 const lockName = "hwlab-dev-cd-lock";
 const nativeKubeconfig = "/etc/rancher/k3s/k3s.yaml";
-const defaultRepo = "/home/ubuntu/hwlab_cd";
+const legacyDefaultRepo = "/home/ubuntu/hwlab_cd";
 const rejectedRepo = "/home/ubuntu/hwlab";
+const defaultRemoteCandidates = ["git@github.com:pikasTech/HWLAB.git"];
+const dedicatedCacheRoot = path.join(os.homedir(), ".cache", "unidesk", "hwlab-cd");
+const dedicatedCacheRepo = path.join(os.homedir(), ".cache", "unidesk", "hwlab-cd", "git-cache", "HWLAB.git");
+const defaultEgressProxy = "http://127.0.0.1:18789";
+const defaultNoProxy = "localhost,127.0.0.1,::1,host.docker.internal,d601-provider-egress-proxy,d601-provider-egress-proxy.unidesk,d601-provider-egress-proxy.unidesk.svc,d601-provider-egress-proxy.unidesk.svc.cluster.local";
+const seedCacheRepos = ["/home/ubuntu/hwlab", "/home/ubuntu/workspace/hwlab", "/tmp/hwlab-dev-cd-35bbbee-host"];
 const requiredNodeName = "d601";
 const tailChars = 1400;
 const parseCaptureLimitBytes = 1024 * 1024;
@@ -169,25 +175,279 @@ function accessCheck(targetPath, mode) {
   }
 }
 
-function resolveRepo(provided) {
-  const rawPath = provided || process.env.UNIDESK_HWLAB_REPO || defaultRepo;
+function repoRecord(rawPath, source, extra = {}) {
   const absolutePath = path.resolve(rawPath);
   const rejected = absolutePath === rejectedRepo;
   const devCdApply = path.join(absolutePath, "scripts/dev-cd-apply.mjs");
   const deployJson = path.join(absolutePath, "deploy/deploy.json");
   return {
     status: !rejected && fs.existsSync(absolutePath) && fs.existsSync(devCdApply) && fs.existsSync(deployJson) ? "selected" : "blocked",
-    source: provided ? "option" : process.env.UNIDESK_HWLAB_REPO ? "env:UNIDESK_HWLAB_REPO" : "default:d601-clean-mirror",
+    source,
     path: absolutePath,
-    defaultPath: defaultRepo,
+    defaultPath: legacyDefaultRepo,
+    defaultMode: "ephemeral-clone",
     rejected,
     rejectionReason: rejected ? "runner-history-directory-is-not-hwlab-cd-release-truth" : null,
     exists: fs.existsSync(absolutePath),
     hasDevCdApply: fs.existsSync(devCdApply),
     hasDeployJson: fs.existsSync(deployJson),
+    ...extra,
   };
 }
 
+function remoteCandidates() {
+  const fromEnv = process.env.UNIDESK_HWLAB_REMOTE_URL;
+  if (typeof fromEnv === "string" && fromEnv.trim().length > 0) return [fromEnv.trim()];
+  return defaultRemoteCandidates;
+}
+
+function seedRepoCandidates() {
+  const fromEnv = process.env.UNIDESK_HWLAB_CACHE_SEED_REPOS;
+  const configured = typeof fromEnv === "string" && fromEnv.trim().length > 0
+    ? fromEnv.split(":").map((item) => item.trim()).filter(Boolean)
+    : seedCacheRepos;
+  return [...new Set(configured.map((item) => path.resolve(item)))];
+}
+
+function egressEnv() {
+  const proxy = process.env.UNIDESK_HWLAB_EGRESS_PROXY || defaultEgressProxy;
+  const noProxy = process.env.NO_PROXY || process.env.no_proxy || defaultNoProxy;
+  const proxyHostPort = proxy.replace(/^https?:\/\//u, "");
+  const gitSshCommand = process.env.GIT_SSH_COMMAND || [
+    "ssh",
+    "-o", "BatchMode=yes",
+    "-o", "StrictHostKeyChecking=accept-new",
+    "-o", `"ProxyCommand=nc -X connect -x ${proxyHostPort} %h %p"`,
+  ].join(" ");
+  return {
+    ...process.env,
+    HTTP_PROXY: proxy,
+    HTTPS_PROXY: proxy,
+    ALL_PROXY: proxy,
+    http_proxy: proxy,
+    https_proxy: proxy,
+    all_proxy: proxy,
+    NO_PROXY: noProxy,
+    no_proxy: noProxy,
+    GIT_SSH_COMMAND: gitSshCommand,
+  };
+}
+
+async function egressProxyHealth(dumpDir, timeoutMs) {
+  const proxy = process.env.UNIDESK_HWLAB_EGRESS_PROXY || defaultEgressProxy;
+  const healthUrl = `${proxy.replace(/\/$/u, "")}/__unidesk/egress-proxy/health`;
+  const command = await runCaptured(["curl", "-fsS", "--max-time", "3", healthUrl], os.homedir(), dumpDir, "egress-proxy-health", { env: egressEnv(), timeoutMs: Math.min(timeoutMs, 5000) });
+  const body = parseJson(command.stdoutText);
+  return {
+    status: command.ok && asRecord(body)?.connected === true ? "pass" : "blocked",
+    proxy,
+    healthUrl,
+    connected: asRecord(body)?.connected ?? null,
+    command: commandView(command),
+  };
+}
+
+function pathOwner(targetPath) {
+  try {
+    const stat = fs.statSync(targetPath);
+    return { uid: stat.uid, gid: stat.gid, mode: (stat.mode & 0o777).toString(8), isDirectory: stat.isDirectory() };
+  } catch (error) {
+    return { uid: null, gid: null, mode: null, isDirectory: false, error: error instanceof Error ? error.message : String(error) };
+  }
+}
+
+function cacheRepoRecord(cachePath) {
+  const absolutePath = path.resolve(cachePath);
+  const gitPath = path.join(absolutePath, ".git");
+  const bareHead = path.join(absolutePath, "HEAD");
+  const bareObjects = path.join(absolutePath, "objects");
+  const shallowPath = path.join(absolutePath, "shallow");
+  const exists = fs.existsSync(gitPath) || (fs.existsSync(bareHead) && fs.existsSync(bareObjects));
+  const readable = accessCheck(absolutePath, fs.constants.R_OK | fs.constants.X_OK);
+  const writable = accessCheck(absolutePath, fs.constants.R_OK | fs.constants.W_OK | fs.constants.X_OK);
+  const owner = pathOwner(absolutePath);
+  return { path: absolutePath, exists, readable: readable.ok, writable: writable.ok, owner, shallow: fs.existsSync(shallowPath), error: readable.error || writable.error || null };
+}
+
+async function deployJsonCommitFromCache(dumpDir, timeoutMs) {
+  const command = await runCaptured(["git", "--git-dir", dedicatedCacheRepo, "show", "refs/heads/main:deploy/deploy.json"], os.homedir(), dumpDir, "git-dedicated-cache-deploy-json", { timeoutMs: Math.min(timeoutMs, 5000) });
+  if (!command.ok) return { commitId: null, command: commandView(command) };
+  const parsed = parseJson(command.stdoutText);
+  return { commitId: stringValue(asRecord(parsed)?.commitId), command: commandView(command) };
+}
+
+async function cacheCommitExists(commitId, dumpDir, timeoutMs, idPrefix = "git-dedicated-cache-commit") {
+  if (!commitId) return { exists: false, command: null };
+  const command = await runCaptured(["git", "--git-dir", dedicatedCacheRepo, "rev-parse", "--verify", `${commitId}^{commit}`], os.homedir(), dumpDir, idPrefix, { timeoutMs: Math.min(timeoutMs, 5000) });
+  return { exists: command.ok, resolvedCommit: command.stdoutText.trim() || null, command: commandView(command) };
+}
+
+async function refreshDedicatedCache(remoteUrl, dumpDir, timeoutMs) {
+  const env = egressEnv();
+  const parent = path.dirname(dedicatedCacheRepo);
+  await fsp.mkdir(parent, { recursive: true, mode: 0o700 });
+  await fsp.chmod(dedicatedCacheRoot, 0o700).catch(() => undefined);
+  await fsp.chmod(path.dirname(dedicatedCacheRoot), 0o700).catch(() => undefined);
+  const attempts = [];
+  const existing = cacheRepoRecord(dedicatedCacheRepo);
+  if (existing.exists && (!existing.writable || existing.owner.uid !== process.getuid?.())) {
+    attempts.push({ stage: "dedicated-cache-owner", ok: false, cache: existing });
+    return { ok: false, cache: existing, attempts, stage: "dedicated-cache-owner" };
+  }
+  if (existing.exists) {
+    const permissions = await runCaptured(["chmod", "-R", "go-rwx", dedicatedCacheRoot], os.homedir(), dumpDir, "git-dedicated-cache-permissions", { timeoutMs: Math.min(timeoutMs, 5000) });
+    attempts.push({ stage: "cache-permissions", ok: permissions.ok, command: commandView(permissions) });
+    if (!permissions.ok) return { ok: false, cache: cacheRepoRecord(dedicatedCacheRepo), attempts, stage: "cache-permissions" };
+  }
+
+  if (!existing.exists) {
+    for (const seed of seedRepoCandidates().map(cacheRepoRecord)) {
+      if (!seed.exists || !seed.readable) {
+        attempts.push({ stage: "seed-unavailable", ok: false, seed });
+        continue;
+      }
+      await fsp.rm(dedicatedCacheRepo, { recursive: true, force: true });
+      const cloneSeed = await runCaptured(["git", "clone", "--bare", seed.path, dedicatedCacheRepo], parent, dumpDir, "git-dedicated-cache-seed-clone", { timeoutMs: Math.min(timeoutMs, 15000) });
+      attempts.push({ stage: "seed-clone", ok: cloneSeed.ok, seed, command: commandView(cloneSeed) });
+      if (cloneSeed.ok) break;
+    }
+    if (!fs.existsSync(dedicatedCacheRepo)) {
+      const init = await runCaptured(["git", "init", "--bare", dedicatedCacheRepo], parent, dumpDir, "git-dedicated-cache-init", { timeoutMs: Math.min(timeoutMs, 5000) });
+      attempts.push({ stage: "bare-init", ok: init.ok, command: commandView(init) });
+      if (!init.ok) return { ok: false, cache: cacheRepoRecord(dedicatedCacheRepo), attempts, stage: "bare-init" };
+    }
+    const permissions = await runCaptured(["chmod", "-R", "go-rwx", dedicatedCacheRoot], os.homedir(), dumpDir, "git-dedicated-cache-permissions", { timeoutMs: Math.min(timeoutMs, 5000) });
+    attempts.push({ stage: "cache-permissions", ok: permissions.ok, command: commandView(permissions) });
+    if (!permissions.ok) return { ok: false, cache: cacheRepoRecord(dedicatedCacheRepo), attempts, stage: "cache-permissions" };
+  }
+
+  const remoteSet = await runCaptured(["git", "--git-dir", dedicatedCacheRepo, "remote", "set-url", "origin", remoteUrl], os.homedir(), dumpDir, "git-dedicated-cache-remote-set-url", { timeoutMs: Math.min(timeoutMs, 5000) });
+  const remoteAdd = remoteSet.ok ? null : await runCaptured(["git", "--git-dir", dedicatedCacheRepo, "remote", "add", "origin", remoteUrl], os.homedir(), dumpDir, "git-dedicated-cache-remote-add", { timeoutMs: Math.min(timeoutMs, 5000) });
+  attempts.push({ stage: "remote-set-url", ok: remoteSet.ok || remoteAdd?.ok === true, command: commandView(remoteSet), fallback: remoteAdd ? commandView(remoteAdd) : null });
+  if (!remoteSet.ok && remoteAdd?.ok !== true) return { ok: false, cache: cacheRepoRecord(dedicatedCacheRepo), attempts, stage: "remote-set-url" };
+
+  const remoteHead = await runCaptured(["git", "--git-dir", dedicatedCacheRepo, "ls-remote", "--heads", "origin", "main"], os.homedir(), dumpDir, "git-dedicated-cache-ls-remote-main", { env, timeoutMs: Math.min(timeoutMs, 15000) });
+  const remoteMainCommit = remoteHead.stdoutText.trim().split(/\s+/u)[0] || null;
+  attempts.push({ stage: "ls-remote-main", ok: remoteHead.ok && Boolean(remoteMainCommit), remoteMainCommit, command: commandView(remoteHead) });
+  const localHead = await runCaptured(["git", "--git-dir", dedicatedCacheRepo, "rev-parse", "refs/heads/main"], os.homedir(), dumpDir, "git-dedicated-cache-local-main", { timeoutMs: Math.min(timeoutMs, 5000) });
+  const localMainCommit = localHead.stdoutText.trim() || null;
+  attempts.push({ stage: "local-main", ok: localHead.ok && Boolean(localMainCommit), localMainCommit, command: commandView(localHead) });
+  const deployJson = localHead.ok ? await deployJsonCommitFromCache(dumpDir, timeoutMs) : { commitId: null, command: null };
+  attempts.push({ stage: "deploy-json-commit", ok: Boolean(deployJson.commitId), commitId: deployJson.commitId, command: deployJson.command });
+  const deployCommit = await cacheCommitExists(deployJson.commitId, dumpDir, timeoutMs, "git-dedicated-cache-deploy-commit");
+  attempts.push({ stage: "deploy-commit-present", ok: deployCommit.exists, commitId: deployJson.commitId, resolvedCommit: deployCommit.resolvedCommit, command: deployCommit.command });
+  const cacheRecordBeforeFetch = cacheRepoRecord(dedicatedCacheRepo);
+  if (remoteHead.ok && remoteMainCommit && localHead.ok && localMainCommit === remoteMainCommit && deployCommit.exists && cacheRecordBeforeFetch.shallow !== true) {
+    return { ok: true, cache: cacheRecordBeforeFetch, attempts, stage: "ready", refreshStatus: "current-full", remoteMainCommit, localMainCommit, deployCommitId: deployJson.commitId };
+  }
+
+  const fetchArgs = ["git", "--git-dir", dedicatedCacheRepo, "fetch", "--prune", "--tags", "origin", "+refs/heads/*:refs/heads/*"];
+  if (cacheRecordBeforeFetch.shallow === true) fetchArgs.splice(4, 0, "--unshallow");
+  const fetch = await runCaptured(fetchArgs, os.homedir(), dumpDir, "git-dedicated-cache-fetch-full", { env, timeoutMs: Math.min(timeoutMs, 55000) });
+  attempts.push({ stage: "fetch-main", ok: fetch.ok, command: commandView(fetch) });
+  if (!fetch.ok) {
+    if (localHead.ok && localMainCommit && deployCommit.exists) {
+      return { ok: true, cache: cacheRepoRecord(dedicatedCacheRepo), attempts, stage: "stale-cache", refreshStatus: "blocked", remoteMainCommit, localMainCommit, deployCommitId: deployJson.commitId };
+    }
+    return { ok: false, cache: cacheRepoRecord(dedicatedCacheRepo), attempts, stage: "fetch-full", refreshStatus: "blocked", remoteMainCommit, localMainCommit, deployCommitId: deployJson.commitId };
+  }
+  const fetchedLocalHead = await runCaptured(["git", "--git-dir", dedicatedCacheRepo, "rev-parse", "refs/heads/main"], os.homedir(), dumpDir, "git-dedicated-cache-local-main-after-fetch", { timeoutMs: Math.min(timeoutMs, 5000) });
+  const fetchedLocalMainCommit = fetchedLocalHead.stdoutText.trim() || localMainCommit;
+  attempts.push({ stage: "local-main-after-fetch", ok: fetchedLocalHead.ok && Boolean(fetchedLocalMainCommit), localMainCommit: fetchedLocalMainCommit, command: commandView(fetchedLocalHead) });
+  const fetchedDeployJson = await deployJsonCommitFromCache(dumpDir, timeoutMs);
+  attempts.push({ stage: "deploy-json-commit-after-fetch", ok: Boolean(fetchedDeployJson.commitId), commitId: fetchedDeployJson.commitId, command: fetchedDeployJson.command });
+  const fetchedDeployCommit = await cacheCommitExists(fetchedDeployJson.commitId, dumpDir, timeoutMs, "git-dedicated-cache-deploy-commit-after-fetch");
+  attempts.push({ stage: "deploy-commit-present-after-fetch", ok: fetchedDeployCommit.exists, commitId: fetchedDeployJson.commitId, resolvedCommit: fetchedDeployCommit.resolvedCommit, command: fetchedDeployCommit.command });
+  const permissions = await runCaptured(["chmod", "-R", "go-rwx", dedicatedCacheRoot], os.homedir(), dumpDir, "git-dedicated-cache-permissions-after-fetch", { timeoutMs: Math.min(timeoutMs, 5000) });
+  attempts.push({ stage: "cache-permissions-after-fetch", ok: permissions.ok, command: commandView(permissions) });
+  if (!permissions.ok) return { ok: false, cache: cacheRepoRecord(dedicatedCacheRepo), attempts, stage: "cache-permissions-after-fetch", refreshStatus: "blocked", remoteMainCommit, localMainCommit: fetchedLocalMainCommit, deployCommitId: fetchedDeployJson.commitId };
+  if (!fetchedDeployCommit.exists) return { ok: false, cache: cacheRepoRecord(dedicatedCacheRepo), attempts, stage: "deploy-commit-unresolved", refreshStatus: "blocked", remoteMainCommit, localMainCommit: fetchedLocalMainCommit, deployCommitId: fetchedDeployJson.commitId };
+  return { ok: true, cache: cacheRepoRecord(dedicatedCacheRepo), attempts, stage: "ready", refreshStatus: "refreshed-full", remoteMainCommit, localMainCommit: fetchedLocalMainCommit, deployCommitId: fetchedDeployJson.commitId };
+}
+
+async function materializeFromCache(cachePath, repoPath, remoteUrl, dumpDir, timeoutMs) {
+  await fsp.rm(repoPath, { recursive: true, force: true });
+  const clone = await runCaptured(["git", "clone", "--no-hardlinks", "--no-checkout", cachePath, repoPath], path.dirname(repoPath), dumpDir, "git-cache-clone", { timeoutMs: Math.min(timeoutMs, 15000) });
+  if (!clone.ok) return { ok: false, stage: "cache-clone", commands: [commandView(clone)] };
+  const remote = await runCaptured(["git", "remote", "set-url", "origin", remoteUrl], repoPath, dumpDir, "git-cache-remote-set-url", { timeoutMs: Math.min(timeoutMs, 5000) });
+  if (!remote.ok) return { ok: false, stage: "remote-set-url", commands: [commandView(clone), commandView(remote)] };
+  const checkout = await runCaptured(["git", "checkout", "-B", "main", "origin/main"], repoPath, dumpDir, "git-cache-checkout-main", { timeoutMs: Math.min(timeoutMs, 5000) });
+  const reset = checkout.ok
+    ? await runCaptured(["git", "reset", "--hard", "origin/main"], repoPath, dumpDir, "git-cache-reset-hard", { timeoutMs: Math.min(timeoutMs, 5000) })
+    : null;
+  const clean = checkout.ok && reset?.ok
+    ? await runCaptured(["git", "clean", "-ffdqx"], repoPath, dumpDir, "git-cache-clean", { timeoutMs: Math.min(timeoutMs, 5000) })
+    : null;
+  const upstream = checkout.ok && reset?.ok
+    ? await runCaptured(["git", "branch", "--set-upstream-to=origin/main", "main"], repoPath, dumpDir, "git-cache-upstream", { timeoutMs: Math.min(timeoutMs, 5000) })
+    : null;
+  const fetchHead = checkout.ok && reset?.ok && clean?.ok
+    ? await runCaptured(["bash", "-lc", "p=\"$(git rev-parse --path-format=absolute --git-common-dir)/FETCH_HEAD\"; : > \"$p\"; chmod 600 \"$p\""], repoPath, dumpDir, "git-cache-fetch-head", { timeoutMs: Math.min(timeoutMs, 5000) })
+    : null;
+  const commands = [clone, remote, checkout, reset, clean, upstream, fetchHead].filter(Boolean).map(commandView);
+  return { ok: checkout.ok && reset?.ok === true && clean?.ok === true && fetchHead?.ok === true, stage: "ready", commands };
+}
+
+async function cloneEphemeralRepo(dumpDir, timeoutMs) {
+  const root = path.join(dumpDir, "ephemeral-repo");
+  const attempts = [];
+  await fsp.mkdir(root, { recursive: true, mode: 0o700 });
+  const egress = await egressProxyHealth(dumpDir, Math.min(timeoutMs, 5000));
+  if (egress.status !== "pass") {
+    return repoRecord(path.join(root, "HWLAB"), "ephemeral:git-cache", {
+      ephemeral: true,
+      remoteUrl: null,
+      egressProxy: egress,
+      dedicatedCache: cacheRepoRecord(dedicatedCacheRepo),
+      seedRepos: seedRepoCandidates().map(cacheRepoRecord),
+      cloneAttempts: [],
+      cloneFailed: true,
+      cloneFailureStage: "egress-proxy",
+      retainedForDiagnostics: true,
+    });
+  }
+  for (const remoteUrl of remoteCandidates()) {
+    const repoPath = path.join(root, "HWLAB");
+    const refreshed = await refreshDedicatedCache(remoteUrl, dumpDir, Math.min(timeoutMs, 45000));
+    attempts.push({ remoteUrl, dedicatedCache: refreshed.cache, ok: refreshed.ok, stage: refreshed.stage, refreshAttempts: refreshed.attempts });
+    if (!refreshed.ok) continue;
+    const materialized = await materializeFromCache(dedicatedCacheRepo, repoPath, remoteUrl, dumpDir, Math.min(timeoutMs, 9000));
+    attempts.push({ remoteUrl, dedicatedCache: refreshed.cache, ok: materialized.ok, stage: materialized.stage, commands: materialized.commands });
+    if (materialized.ok) {
+      return repoRecord(repoPath, "ephemeral:dedicated-git-cache", {
+        ephemeral: true,
+        remoteUrl,
+        egressProxy: egress,
+        dedicatedCache: refreshed.cache,
+        cacheRefreshStatus: refreshed.refreshStatus || "current",
+        remoteMainCommit: refreshed.remoteMainCommit || null,
+        cacheMainCommit: refreshed.localMainCommit || null,
+        deployCommitId: refreshed.deployCommitId || null,
+        seedRepos: seedRepoCandidates().map(cacheRepoRecord),
+        cloneAttempts: attempts,
+        retainedForDiagnostics: true,
+      });
+    }
+  }
+  return repoRecord(path.join(root, "HWLAB"), "ephemeral:dedicated-git-cache", {
+    ephemeral: true,
+    remoteUrl: null,
+    egressProxy: egress,
+    dedicatedCache: cacheRepoRecord(dedicatedCacheRepo),
+    seedRepos: seedRepoCandidates().map(cacheRepoRecord),
+    cloneAttempts: attempts,
+    cloneFailed: true,
+    cloneFailureStage: attempts.findLast((attempt) => attempt.stage)?.stage || "cache-unavailable",
+    retainedForDiagnostics: true,
+  });
+}
+
+async function resolveRepo(provided, dumpDir, timeoutMs) {
+  if (provided) return repoRecord(provided, "option", { ephemeral: false });
+  if (process.env.UNIDESK_HWLAB_REPO) return repoRecord(process.env.UNIDESK_HWLAB_REPO, "env:UNIDESK_HWLAB_REPO", { ephemeral: false });
+  return cloneEphemeralRepo(dumpDir, Math.min(timeoutMs, 55000));
+}
+
 async function gitSummary(repoPath, dumpDir, timeoutMs) {
   const [branch, head, originMain, remote, gitDir, gitCommonDir, statusShort, statusPorcelain] = await Promise.all([
     runCaptured(["git", "rev-parse", "--abbrev-ref", "HEAD"], repoPath, dumpDir, "git-branch", { timeoutMs }),
@@ -1113,6 +1373,14 @@ function collectBlockers({ repo, git, guard, lock, secretRefs, controlled, actio
   if (repo.status !== "selected") {
     blockers.push({ scope: "hwlab-repo", summary: repo.rejected ? "Rejected runner history HWLAB directory." : "No eligible HWLAB CD repo with scripts/dev-cd-apply.mjs and deploy/deploy.json was found." });
   }
+  if (repo.cacheRefreshStatus === "blocked") {
+    blockers.push({
+      scope: "hwlab-cache-refresh",
+      summary: "Dedicated HWLAB git cache could not refresh from GitHub through the provider egress proxy; using the last local cache state is not a release-truth PASS.",
+      remoteMainCommit: repo.remoteMainCommit || null,
+      cacheMainCommit: repo.cacheMainCommit || null,
+    });
+  }
   if (git?.blockers) blockers.push(...git.blockers);
   if (guard) {
     if (guard.refusal) blockers.push({ scope: "d601-native-k3s-guard", summary: guard.summary, refusal: true });
@@ -1139,6 +1407,341 @@ function nextSafeCommand(action, blockers) {
   return "host commander or the unique CD runner may decide whether to run node scripts/dev-cd-apply.mjs --apply --confirm-dev --confirmed-non-production --write-report on D601; this wrapper did not apply or rollout";
 }
 
+function compactBlockers(blockers, limit = 12) {
+  return (Array.isArray(blockers) ? blockers : []).slice(0, limit).map((item) => ({
+    scope: item?.scope ?? item?.blockerScope ?? null,
+    type: item?.type ?? null,
+    status: item?.status ?? null,
+    summary: item?.summary ?? item?.reason ?? null,
+  }));
+}
+
+function compactGit(git) {
+  if (!git) return null;
+  return {
+    status: git.status,
+    clean: git.clean,
+    branch: git.branch,
+    onMain: git.onMain,
+    remoteMatches: git.remoteMatches,
+    headMatchesOriginMain: git.headMatchesOriginMain,
+    dirtyCount: git.dirtyCount,
+    blockers: compactBlockers(git.blockers, 8),
+  };
+}
+
+function compactNodeGuard(guard) {
+  if (!guard) return null;
+  return {
+    status: guard.status,
+    refusal: guard.refusal,
+    kubeconfig: guard.kubeconfig,
+    currentContext: guard.currentContext,
+    apiServer: guard.apiServer,
+    nodeNames: guard.nodeNames,
+    requiredNodePresent: guard.requiredNodePresent,
+    refusalSignals: guard.refusalSignals,
+    secondControlPlaneRisk: guard.secondControlPlaneRisk,
+    defaultKubectlStatus: guard.defaultKubectlDiagnostic?.status ?? null,
+    summary: guard.summary,
+  };
+}
+
+function compactSecretPreflight(secretRefs) {
+  if (!secretRefs) return null;
+  return {
+    status: secretRefs.status,
+    valuesRead: secretRefs.safety?.secretValuesRead === true,
+    valuesPrinted: secretRefs.safety?.secretValuesPrinted === true,
+    secretRefs: (secretRefs.secretRefs || []).map((ref) => ({
+      secretName: ref.secretName,
+      secretKey: ref.secretKey,
+      exists: ref.exists === true,
+      keyPresent: ref.keyPresent === true,
+      status: ref.status,
+    })).slice(0, 8),
+    blockers: compactBlockers(secretRefs.blockers, 8),
+  };
+}
+
+function compactDesiredState(desired) {
+  if (!desired) return null;
+  return {
+    status: desired.status,
+    targetCommit: desired.targetCommit,
+    deployJson: desired.deployJson ? {
+      hash: desired.deployJson.hash,
+      commitId: desired.deployJson.commitId,
+      environment: desired.deployJson.environment,
+      namespace: desired.deployJson.namespace,
+      serviceCount: desired.deployJson.serviceCount,
+    } : null,
+    artifactCatalog: desired.artifactCatalog ? {
+      hash: desired.artifactCatalog.hash,
+      commitId: desired.artifactCatalog.commitId,
+      artifactState: desired.artifactCatalog.artifactState,
+      ciPublished: desired.artifactCatalog.ciPublished,
+      registryVerified: desired.artifactCatalog.registryVerified,
+      digestCounts: desired.artifactCatalog.digestCounts,
+    } : null,
+    imageConvergence: desired.imageConvergence ? {
+      status: desired.imageConvergence.status,
+      missing: (desired.imageConvergence.missing || []).slice(0, 5),
+      mismatches: (desired.imageConvergence.mismatches || []).slice(0, 5),
+    } : null,
+  };
+}
+
+function compactControlled(controlled) {
+  if (!controlled) return null;
+  return {
+    status: controlled.status,
+    commandOk: controlled.commandOk,
+    mode: controlled.mode,
+    mutationAttempted: controlled.mutationAttempted,
+    prodTouched: controlled.prodTouched,
+    target: controlled.target ? {
+      ref: controlled.target.ref,
+      promotionCommit: controlled.target.promotionCommit,
+      shortCommitId: controlled.target.shortCommitId,
+      promotionSource: controlled.target.promotionSource,
+      publishRequired: controlled.target.publishRequired,
+      headMatchesTarget: controlled.target.headMatchesTarget,
+      desiredStateStatus: controlled.target.desiredStateCheck?.status ?? null,
+      artifactBoundaryStatus: controlled.target.artifactBoundary?.status ?? null,
+      namespace: controlled.target.namespace,
+    } : null,
+    blockers: compactBlockers(controlled.blockers, 8),
+  };
+}
+
+function compactAudit(audit) {
+  if (!audit) return null;
+  return {
+    ok: audit.ok,
+    status: audit.status,
+    env: audit.env,
+    namespace: audit.namespace,
+    nodeGuard: audit.nodeGuard,
+    secrets: audit.secrets ? {
+      status: audit.secrets.status,
+      valuesRead: audit.secrets.valuesRead,
+      valuesPrinted: audit.secrets.valuesPrinted,
+    } : null,
+    registry: audit.registry ? {
+      status: audit.registry.status,
+      processHttpAccess: audit.registry.processHttpAccess,
+      k3sPullAccess: audit.registry.k3sPullAccess,
+    } : null,
+    lease: audit.lease,
+    desiredState: audit.desiredState ? {
+      status: audit.desiredState.status,
+      targetCommit: audit.desiredState.targetCommit,
+      imageConvergence: audit.desiredState.imageConvergence,
+    } : null,
+    workload: audit.workload ? {
+      status: audit.workload.status,
+      deploymentCount: audit.workload.deploymentCount,
+      currentImageConvergence: audit.workload.currentImageConvergence,
+      podWaiting: (audit.workload.podWaiting || []).slice(0, 5),
+      runtimeJobs: (audit.workload.runtimeJobs || []).slice(0, 5),
+    } : null,
+    publicHealth: audit.publicHealth ? {
+      status: audit.publicHealth.status,
+      expectedCommit: audit.publicHealth.expectedCommit,
+      summary: audit.publicHealth.summary,
+    } : null,
+    durability: audit.durability ? {
+      status: audit.durability.status,
+      unavailableReason: audit.durability.unavailableReason,
+    } : null,
+    controlledDevCd: compactControlled(audit.controlledDevCd),
+    blockerTypes: audit.blockerTypes || [],
+    blockers: compactBlockers(audit.blockers, 12),
+    safety: audit.safety,
+  };
+}
+
+function compactForTransport(payload, fullResultPath) {
+  return {
+    ok: payload.ok,
+    env: payload.env,
+    environment: payload.environment,
+    status: payload.status,
+    error: payload.error ?? null,
+    action: payload.action ?? null,
+    dryRun: payload.dryRun,
+    mutation: payload.mutation,
+    remoteHost: payload.remoteHost,
+    workspace: payload.workspace,
+    kubeconfig: payload.kubeconfig,
+    dumpPath: payload.dumpPath,
+    remoteFullResultPath: fullResultPath,
+    repo: payload.repo ? {
+      status: payload.repo.status,
+      path: payload.repo.path,
+      source: payload.repo.source,
+      rejected: payload.repo.rejected,
+      ephemeral: payload.repo.ephemeral === true,
+      cacheRefreshStatus: payload.repo.cacheRefreshStatus || null,
+    } : null,
+    worktreeGuard: compactGit(payload.worktreeGuard),
+    nodeGuard: compactNodeGuard(payload.nodeGuard),
+    secretPreflight: compactSecretPreflight(payload.secretPreflight),
+    lockState: payload.lockState ? {
+      status: payload.lockState.status,
+      phase: payload.lockState.phase,
+      holder: payload.lockState.ownerTaskId || payload.lockState.holderIdentity || null,
+      held: payload.lockState.held,
+      stale: payload.lockState.stale,
+      retryAfterSeconds: payload.lockState.retryAfterSeconds,
+      lockName: payload.lockState.lockName,
+    } : null,
+    desiredState: payload.desiredState ? {
+      deployJson: payload.desiredState.deployJson,
+      artifactCatalog: payload.desiredState.artifactCatalog,
+      artifactReport: payload.desiredState.artifactReport,
+    } : null,
+    target: payload.target,
+    promotion: payload.promotion ? {
+      source: payload.promotion.source,
+      deployJson: payload.promotion.deployJson,
+      artifactCatalog: payload.promotion.artifactCatalog,
+      artifactReport: payload.promotion.artifactReport,
+    } : null,
+    controlledDevCd: compactControlled(payload.controlledDevCd),
+    audit: compactAudit(payload.audit),
+    blockers: compactBlockers(payload.blockers, 12),
+    blockerTypes: payload.blockerTypes || [...new Set(compactBlockers(payload.blockers, 12).map((item) => item.scope).filter(Boolean))],
+    nextSafeCommand: payload.nextSafeCommand,
+    safety: payload.safety,
+  };
+}
+
+function emitResult(payload, dumpDir, compactStdout = false) {
+  const fullResultPath = path.join(dumpDir, "result.full.json");
+  const compactResultPath = path.join(dumpDir, "result.compact.json");
+  fs.writeFileSync(fullResultPath, `${JSON.stringify(payload, null, 2)}\n`, { mode: 0o600 });
+  if (!compactStdout) {
+    fs.writeFileSync(compactResultPath, `${JSON.stringify(compactForTransport(payload, fullResultPath), null, 2)}\n`, { mode: 0o600 });
+    console.log(JSON.stringify(payload, null, 2));
+    return;
+  }
+  let compact = compactForTransport(payload, fullResultPath);
+  let text = JSON.stringify(compact, null, 2);
+  if (Buffer.byteLength(text, "utf8") > 3600) {
+    compact = {
+      ok: compact.ok,
+      env: compact.env,
+      status: compact.status,
+      error: compact.error,
+      action: compact.action,
+      dryRun: compact.dryRun,
+      mutation: compact.mutation,
+      workspace: compact.workspace,
+      dumpPath: compact.dumpPath,
+      remoteFullResultPath: compact.remoteFullResultPath,
+      repo: compact.repo,
+      worktreeGuard: compact.worktreeGuard,
+      nodeGuard: compact.nodeGuard,
+      lockState: compact.lockState,
+      target: compact.target ? {
+        ref: compact.target.ref,
+        promotionCommit: compact.target.promotionCommit,
+        shortCommitId: compact.target.shortCommitId,
+        promotionSource: compact.target.promotionSource,
+        publishRequired: compact.target.publishRequired,
+      } : null,
+      controlledDevCd: compact.controlledDevCd,
+      audit: compact.audit ? {
+        status: compact.audit.status,
+        blockerTypes: compact.audit.blockerTypes,
+        blockers: compact.audit.blockers,
+        publicHealth: compact.audit.publicHealth,
+        workload: compact.audit.workload,
+        durability: compact.audit.durability,
+      } : null,
+      blockers: compact.blockers,
+      blockerTypes: compact.blockerTypes,
+      nextSafeCommand: compact.nextSafeCommand,
+      safety: compact.safety,
+      outputCompacted: "provider-host-ssh-stdout-limit",
+    };
+    text = JSON.stringify(compact, null, 2);
+  }
+  if (Buffer.byteLength(text, "utf8") > 2400) {
+    compact = {
+      ok: payload.ok,
+      env: payload.env,
+      status: payload.status,
+      error: payload.error ?? null,
+      action: payload.action ?? null,
+      dryRun: payload.dryRun,
+      mutation: payload.mutation,
+      workspace: payload.workspace,
+      dumpPath: payload.dumpPath,
+      remoteFullResultPath: fullResultPath,
+      repo: payload.repo ? {
+        status: payload.repo.status,
+        path: payload.repo.path,
+        source: payload.repo.source,
+        rejected: payload.repo.rejected,
+        ephemeral: payload.repo.ephemeral === true,
+        cacheRefreshStatus: payload.repo.cacheRefreshStatus || null,
+      } : null,
+      worktreeStatus: payload.worktreeGuard?.status ?? null,
+      nodeGuardStatus: payload.nodeGuard?.status ?? payload.audit?.nodeGuard?.status ?? null,
+      lockState: payload.lockState ? {
+        status: payload.lockState.status,
+        phase: payload.lockState.phase,
+        held: payload.lockState.held,
+        holder: payload.lockState.ownerTaskId || payload.lockState.holderIdentity || null,
+        stale: payload.lockState.stale,
+      } : null,
+      target: payload.target ?
{ + ref: payload.target.ref, + promotionCommit: payload.target.promotionCommit, + shortCommitId: payload.target.shortCommitId, + promotionSource: payload.target.promotionSource, + } : payload.audit?.desiredState ? { + targetCommit: payload.audit.desiredState.targetCommit, + } : null, + controlledDevCdStatus: payload.controlledDevCd?.status ?? payload.audit?.controlledDevCd?.status ?? null, + auditStatus: payload.audit?.status ?? null, + blockerTypes: payload.blockerTypes || payload.audit?.blockerTypes || [...new Set(compactBlockers(payload.blockers, 8).map((item) => item.scope).filter(Boolean))], + blockers: compactBlockers(payload.blockers || payload.audit?.blockers, 6), + nextSafeCommand: payload.nextSafeCommand, + outputCompacted: "provider-host-ssh-stdout-limit", + }; + text = JSON.stringify(compact); + } + if (Buffer.byteLength(text, "utf8") > 520) { + const blockerTypes = payload.blockerTypes + || payload.audit?.blockerTypes + || [...new Set(compactBlockers(payload.blockers || payload.audit?.blockers, 8).map((item) => item.scope).filter(Boolean))]; + compact = { + ok: payload.ok, + env: payload.env, + status: payload.status, + error: payload.error ?? null, + action: payload.action ?? null, + remoteFullResultPath: fullResultPath, + repoPath: payload.repo?.path ?? payload.workspace?.path ?? null, + repoEphemeral: payload.repo?.ephemeral === true, + cacheRefreshStatus: payload.repo?.cacheRefreshStatus ?? null, + worktreeStatus: payload.worktreeGuard?.status ?? null, + nodeGuardStatus: payload.nodeGuard?.status ?? payload.audit?.nodeGuard?.status ?? null, + lockStatus: payload.lockState ? `${payload.lockState.status || "unknown"}/${payload.lockState.phase || "unknown"}/${payload.lockState.held === true ? "held" : "free"}` : null, + targetCommit: payload.target?.promotionCommit ?? payload.audit?.desiredState?.targetCommit ?? null, + blockerTypes: Array.isArray(blockerTypes) ? 
blockerTypes.slice(0, 4) : [], + outputCompacted: true, + }; + text = JSON.stringify(compact); + } + fs.writeFileSync(compactResultPath, `${text}\n`, { mode: 0o600 }); + console.log(text); +} + async function main() { const options = decodeOptions(); const timeoutMs = Math.min(Number(options.timeoutMs || 45000), 60000); @@ -1146,8 +1749,8 @@ async function main() { const dumpDir = path.join(os.homedir(), ".state", "unidesk-hwlab-cd", runId); await fsp.mkdir(dumpDir, { recursive: true, mode: 0o700 }); - const repo = resolveRepo(options.repoPath || null); const action = options.action; + const repo = await resolveRepo(options.repoPath || null, dumpDir, Math.min(timeoutMs, 55000)); const base = { ok: false, env: "dev", @@ -1176,7 +1779,7 @@ async function main() { if (repo.status !== "selected") { const blockers = collectBlockers({ repo, action }); - console.log(JSON.stringify({ ...base, status: "blocked", error: "hwlab-repo-not-found", repo, blockers, nextSafeCommand: nextSafeCommand(action, blockers) }, null, 2)); + emitResult({ ...base, status: "blocked", error: "hwlab-repo-not-found", repo, blockers, nextSafeCommand: nextSafeCommand(action, blockers) }, dumpDir, options.compactStdout === true); process.exitCode = 1; return; } @@ -1218,7 +1821,7 @@ async function main() { ]); const durability = runtimeDurabilityAudit(auditDesiredState, publicHealth, secretRefs, workload); const summary = auditSummary({ repo, git, guard, secretRefs, registry, lock, desired: auditDesiredState, workload, publicHealth, durability, controlled, dumpDir }); - console.log(JSON.stringify({ + emitResult({ ...base, ok: summary.ok, status: summary.status, @@ -1235,13 +1838,13 @@ async function main() { blockers: summary.blockers, blockerTypes: summary.blockerTypes, nextSafeCommand: nextSafeCommand(action, summary.blockers), - }, null, 2)); + }, dumpDir, options.compactStdout === true); process.exitCode = summary.ok ? 
0 : 1; return; } const blockers = collectBlockers({ repo, git, guard, lock, secretRefs, controlled, action }); const ok = blockers.length === 0; - console.log(JSON.stringify({ + emitResult({ ...base, ok, status: ok ? action === "status" ? "ready" : "prepared" : guard.refusal ? "refused" : "blocked", @@ -1263,7 +1866,7 @@ async function main() { reportDumpPath: controlled.command?.dump?.stdout || dumpDir, blockers, nextSafeCommand: nextSafeCommand(action, blockers), - }, null, 2)); + }, dumpDir, options.compactStdout === true); process.exitCode = ok ? 0 : 1; } diff --git a/scripts/src/hwlab-cd.ts b/scripts/src/hwlab-cd.ts index 1decb369..6f6aa5e9 100644 --- a/scripts/src/hwlab-cd.ts +++ b/scripts/src/hwlab-cd.ts @@ -3,6 +3,7 @@ import { randomBytes } from "node:crypto"; import { createWriteStream, mkdirSync, readFileSync } from "node:fs"; import { mkdir } from "node:fs/promises"; import { join } from "node:path"; +import { gzipSync } from "node:zlib"; import { readConfig, repoRoot, rootPath, type UniDeskConfig } from "./config"; import { d601NativeKubeconfig } from "./d601-k3s-guard"; @@ -46,6 +47,24 @@ interface FrontendSession { cookie: string; } +interface FrontendHostSshResult { + ok: boolean; + taskId: string | null; + taskStatus: unknown; + stdout: string; + stderr: string; + exitCode: number | null; + error: string; + raw: unknown; +} + +interface RemoteRunnerJob { + pid: number | null; + remoteRunDir: string | null; + remoteCompactResultPath: string | null; + remoteFullResultPath: string | null; +} + interface FetchJsonResult { ok: boolean; status?: number; @@ -54,7 +73,8 @@ interface FetchJsonResult { } const defaultProviderId = "D601"; -const defaultHwlabCdRepoPath = "/home/ubuntu/hwlab_cd"; +const defaultHwlabCdRepoPath = "~/.state/unidesk-hwlab-cd//ephemeral-repo/HWLAB"; +const defaultHwlabCdCachePath = "~/.cache/unidesk/hwlab-cd/git-cache/HWLAB.git"; const rejectedRunnerHistoryRepoPath = "/home/ubuntu/hwlab"; const namespace = "hwlab-dev"; const 
lockName = "hwlab-dev-cd-lock"; @@ -160,8 +180,8 @@ function b64Json(value: unknown): string { return Buffer.from(JSON.stringify(value), "utf8").toString("base64"); } -function buildRemoteCommand(options: HwlabCdOptions, runId: string): string { - const remoteOptions = { +function buildRemoteOptions(options: HwlabCdOptions, runId: string): Record { + return { action: options.action, environment: options.environment, dryRun: options.dryRun, @@ -169,14 +189,113 @@ function buildRemoteCommand(options: HwlabCdOptions, runId: string): string { kubeconfig: options.kubeconfig, timeoutMs: options.timeoutMs, runId, + compactStdout: options.transport !== "local", }; +} + +function buildRemoteInlineCommand(options: HwlabCdOptions, runId: string): string { return [ - `UNIDESK_HWLAB_CD_OPTIONS_B64=${shellQuote(b64Json(remoteOptions))} node <<'UNIDESK_HWLAB_CD_JS'`, + `UNIDESK_HWLAB_CD_OPTIONS_B64=${shellQuote(b64Json(buildRemoteOptions(options, runId)))} node <<'UNIDESK_HWLAB_CD_JS'`, remoteScriptSource, "UNIDESK_HWLAB_CD_JS", ].join("\n"); } +function buildRemoteUploadedCommand(options: HwlabCdOptions, runId: string, remoteScriptPath: string): string { + return `UNIDESK_HWLAB_CD_OPTIONS_B64=${shellQuote(b64Json(buildRemoteOptions(options, runId)))} node ${shellQuote(remoteScriptPath)}`; +} + +function remoteRunnerDir(runId: string): string { + return `$HOME/.state/unidesk-hwlab-cd/${safeRemoteId(runId)}`; +} + +function buildRemoteStartCommand(options: HwlabCdOptions, runId: string, remoteScriptPath: string): string { + const runDir = remoteRunnerDir(runId); + const encodedOptions = shellQuote(b64Json(buildRemoteOptions(options, runId))); + return [ + "umask 077", + `run_dir="${runDir}"`, + `mkdir -p "$run_dir"`, + `rm -f "$run_dir/result.full.json" "$run_dir/result.compact.json" "$run_dir/runner.stdout.txt" "$run_dir/runner.stderr.txt" "$run_dir/runner.exit"`, + `nohup sh -c 'UNIDESK_HWLAB_CD_OPTIONS_B64="$1" node "$2" > "$3" 2> "$4"; printf "%s\\n" "$?" 
> "$5"' sh ${encodedOptions} ${shellQuote(remoteScriptPath)} "$run_dir/runner.stdout.txt" "$run_dir/runner.stderr.txt" "$run_dir/runner.exit" >/dev/null 2>&1 & pid=$!`, + `printf '{"pid":%s,"remoteRunDir":"%s","remoteCompactResultPath":"%s/result.compact.json","remoteFullResultPath":"%s/result.full.json"}\\n' "$pid" "$run_dir" "$run_dir" "$run_dir"`, + ].join("; "); +} + +function buildRemotePollCommand(runId: string): string { + const runDir = remoteRunnerDir(runId); + const pollJs = [ + "const fs=require(\"fs\")", + "const dir=process.argv[1]", + "const compact=`${dir}/result.compact.json`", + "const exitPath=`${dir}/runner.exit`", + "const tail=(p)=>{try{const b=fs.readFileSync(p);return b.slice(Math.max(0,b.length-500)).toString(\"utf8\")}catch{return \"\"}}", + "if(fs.existsSync(compact)&&fs.statSync(compact).size>0){process.stdout.write(fs.readFileSync(compact,\"utf8\"));process.exit(0)}", + "if(fs.existsSync(exitPath)){const raw=fs.readFileSync(exitPath,\"utf8\").trim();console.log(JSON.stringify({ok:false,status:\"blocked\",error:\"remote-runner-exited-without-result\",remoteRunDir:dir,exitCode:Number(raw),stdoutTail:tail(`${dir}/runner.stdout.txt`),stderrTail:tail(`${dir}/runner.stderr.txt`)}));process.exit(0)}", + "console.log(JSON.stringify({ok:false,status:\"running\",remoteRunDir:dir}))", + ].join(";"); + return `run_dir="${runDir}"; node -e ${shellQuote(pollJs)} "$run_dir"`; +} + +function buildRemoteCommand(options: HwlabCdOptions, runId: string): string { + return buildRemoteInlineCommand(options, runId); +} + +function splitFixed(value: string, size: number): string[] { + const chunks: string[] = []; + for (let index = 0; index < value.length; index += size) chunks.push(value.slice(index, index + size)); + return chunks; +} + +function safeRemoteId(value: string): string { + return value.replace(/[^A-Za-z0-9_.-]/gu, "-"); +} + +function compactTail(value: string, maxLength = 1000): string { + return value.slice(Math.max(0, value.length - 
maxLength)); +} + +function uploadCommandShape(providerId = defaultProviderId): string[] { + return ["frontend", "/api/dispatch", providerId, "host.ssh", "upload remote runner in small chunks", "start nohup remote runner", "poll ~/.state/unidesk-hwlab-cd//result.compact.json"]; +} + +function buildUploadPlan(runId: string): { remoteDir: string; remotePath: string; commands: Array<{ id: string; command: string }> } { + const remoteDir = "/tmp/unidesk-hwlab-cd"; + const remotePath = `${remoteDir}/remote-runner-${safeRemoteId(runId)}.cjs`; + const encoded = gzipSync(Buffer.from(remoteScriptSource, "utf8")).toString("base64"); + const b64Path = `${remotePath}.gz.b64`; + const commands = [ + { + id: "init", + command: `umask 077; mkdir -p ${shellQuote(remoteDir)}; rm -f ${shellQuote(remotePath)} ${shellQuote(b64Path)}; : > ${shellQuote(b64Path)}`, + }, + ...splitFixed(encoded, 2400).map((chunk, index) => ({ + id: `chunk-${index + 1}`, + command: `printf %s ${shellQuote(chunk)} >> ${shellQuote(b64Path)}`, + })), + { + id: "finalize", + command: `base64 -d ${shellQuote(b64Path)} | gzip -d > ${shellQuote(remotePath)}; chmod 700 ${shellQuote(remotePath)}; rm -f ${shellQuote(b64Path)}; test -s ${shellQuote(remotePath)}; wc -c ${shellQuote(remotePath)}`, + }, + ]; + return { remoteDir, remotePath, commands }; +} + +function uploadStepView(step: { id: string; command: string }, result: FrontendHostSshResult): Record { + const stepView = { + id: step.id, + ok: result.ok, + taskId: result.taskId, + taskStatus: result.taskStatus, + exitCode: result.exitCode, + commandBytes: step.command.length, + stdoutTail: compactTail(result.stdout, 500), + stderrTail: compactTail(result.stderr, 500), + error: result.error, + }; + return stepView; +} + export function buildHwlabCdRemoteCommandForTest(args: string[]): string { return buildRemoteCommand(parseOptions(args), "test-run-id"); } @@ -325,7 +444,7 @@ function parseRemoteStdout(stdout: string): Record | null { } function
commandShape(providerId = defaultProviderId): string[] { - return ["frontend", "/api/dispatch", providerId, "host.ssh", "UNIDESK_HWLAB_CD_OPTIONS_B64=... node <<'UNIDESK_HWLAB_CD_JS'"]; + return uploadCommandShape(providerId); } async function runLocalTransport(options: HwlabCdOptions, remoteCommand: string, runId: string): Promise> { @@ -370,65 +489,219 @@ async function runLocalTransport(options: HwlabCdOptions, remoteCommand: string, }; } -async function runFrontendTransport(options: HwlabCdOptions, remoteCommand: string, config: UniDeskConfig, runId: string): Promise> { - const host = options.mainServerHost ?? config.network.publicHost; - const dumpDir = rootPath(".state", "hwlab-cd", runId); - mkdirSync(dumpDir, { recursive: true, mode: 0o700 }); - const session = await loginFrontend(host, config); +async function dispatchFrontendHostSsh( + session: FrontendSession, + options: HwlabCdOptions, + command: string, + timeoutMs: number, + source = "cli-hwlab-cd", +): Promise { const dispatch = await frontendJson(session, "/api/dispatch", { method: "POST", body: JSON.stringify({ providerId: options.providerId, command: "host.ssh", - payload: { source: "cli-hwlab-cd", mode: "exec", command: remoteCommand, timeoutMs: options.timeoutMs }, + payload: { source, mode: "exec", command, timeoutMs }, }), }, 12_000); const taskId = String(asRecord(dispatch.body)?.taskId ?? ""); if (!dispatch.ok || taskId.length === 0) { + return { + ok: false, + taskId: taskId || null, + taskStatus: null, + stdout: "", + stderr: "", + exitCode: null, + error: asRecord(dispatch.body)?.error as string ?? dispatch.error ?? "provider dispatch failed", + raw: dispatch, + }; + } + const wait = await waitForFrontendTask(session, taskId, Math.min(timeoutMs + 5000, maxTimeoutMs)); + const task = asRecord((wait as { task?: unknown }).task); + const result = asRecord(task?.result) ?? {}; + const exitCode = typeof result.exitCode === "number" ? 
result.exitCode : null; + const stdout = typeof result.stdout === "string" ? result.stdout : ""; + const stderr = typeof result.stderr === "string" ? result.stderr : ""; + const error = typeof result.error === "string" ? result.error : ""; + return { + ok: task?.status === "succeeded" && (exitCode === null || exitCode === 0), + taskId, + taskStatus: task?.status ?? null, + stdout, + stderr, + exitCode, + error, + raw: task, + }; +} + +async function uploadRemoteRunner(session: FrontendSession, options: HwlabCdOptions, runId: string, dumpDir: string): Promise<{ ok: boolean; path: string; steps: Record[]; error: string }> { + const plan = buildUploadPlan(runId); + const steps: Record[] = []; + for (const step of plan.commands) { + const result = await dispatchFrontendHostSsh(session, options, step.command, Math.min(options.timeoutMs, 20_000), "cli-hwlab-cd-upload"); + steps.push(uploadStepView(step, result)); + if (!result.ok) { + await Bun.write(join(dumpDir, "frontend-upload.json"), `${JSON.stringify({ ok: false, path: plan.remotePath, steps }, null, 2)}\n`); + return { ok: false, path: plan.remotePath, steps, error: result.error || result.stderr || `upload step ${step.id} failed` }; + } + } + await Bun.write(join(dumpDir, "frontend-upload.json"), `${JSON.stringify({ ok: true, path: plan.remotePath, steps }, null, 2)}\n`); + return { ok: true, path: plan.remotePath, steps, error: "" }; +} + +function parseRemoteRunnerJob(stdout: string): RemoteRunnerJob | null { + const parsed = parseRemoteStdout(stdout); + if (parsed === null) return null; + return { + pid: typeof parsed.pid === "number" ? parsed.pid : null, + remoteRunDir: typeof parsed.remoteRunDir === "string" ? parsed.remoteRunDir : null, + remoteCompactResultPath: typeof parsed.remoteCompactResultPath === "string" ? parsed.remoteCompactResultPath : null, + remoteFullResultPath: typeof parsed.remoteFullResultPath === "string" ? 
parsed.remoteFullResultPath : null, + }; +} + +async function startRemoteRunner( + session: FrontendSession, + options: HwlabCdOptions, + runId: string, + remoteScriptPath: string, + dumpDir: string, +): Promise<{ ok: boolean; job: RemoteRunnerJob | null; result: FrontendHostSshResult; dump: string }> { + const result = await dispatchFrontendHostSsh(session, options, buildRemoteStartCommand(options, runId, remoteScriptPath), 8000, "cli-hwlab-cd-start"); + const stdoutDump = join(dumpDir, "frontend-runner-start.stdout.txt"); + const stderrDump = join(dumpDir, "frontend-runner-start.stderr.txt"); + await Bun.write(stdoutDump, result.stdout); + await Bun.write(stderrDump, result.stderr); + const job = result.ok ? parseRemoteRunnerJob(result.stdout) : null; + const dump = join(dumpDir, "frontend-runner-start.json"); + await Bun.write(dump, `${JSON.stringify({ ok: result.ok && job !== null, job, taskId: result.taskId, taskStatus: result.taskStatus, exitCode: result.exitCode, stdoutDump, stderrDump, error: result.error }, null, 2)}\n`); + return { ok: result.ok && job !== null, job, result, dump }; +} + +async function pollRemoteRunnerResult( + session: FrontendSession, + options: HwlabCdOptions, + runId: string, + dumpDir: string, +): Promise<{ parsed: Record | null; stdout: string; stderr: string; result: FrontendHostSshResult | null; polls: Record[]; timedOut: boolean; dump: string }> { + const startedAt = Date.now(); + const polls: Record[] = []; + const pollCommand = buildRemotePollCommand(runId); + let lastResult: FrontendHostSshResult | null = null; + while (Date.now() - startedAt < options.timeoutMs) { + const result = await dispatchFrontendHostSsh(session, options, pollCommand, 6000, "cli-hwlab-cd-poll"); + lastResult = result; + const parsed = result.ok ? parseRemoteStdout(result.stdout) : null; + polls.push({ + ok: result.ok, + taskId: result.taskId, + taskStatus: result.taskStatus, + exitCode: result.exitCode, + parsedStatus: parsed?.status ?? 
null, + parsedError: parsed?.error ?? null, + stdoutTail: compactTail(result.stdout, 500), + stderrTail: compactTail(result.stderr, 500), + error: result.error, + }); + if (parsed !== null && parsed.status !== "running") { + const dump = join(dumpDir, "frontend-runner-poll.json"); + await Bun.write(dump, `${JSON.stringify({ ok: true, polls }, null, 2)}\n`); + return { parsed, stdout: result.stdout, stderr: result.stderr, result, polls, timedOut: false, dump }; + } + await Bun.sleep(1000); + } + const dump = join(dumpDir, "frontend-runner-poll.json"); + await Bun.write(dump, `${JSON.stringify({ ok: false, timedOut: true, polls }, null, 2)}\n`); + return { parsed: null, stdout: lastResult?.stdout ?? "", stderr: lastResult?.stderr ?? "", result: lastResult, polls, timedOut: true, dump }; +} + +async function runFrontendTransport(options: HwlabCdOptions, config: UniDeskConfig, runId: string): Promise> { + const host = options.mainServerHost ?? config.network.publicHost; + const dumpDir = rootPath(".state", "hwlab-cd", runId); + mkdirSync(dumpDir, { recursive: true, mode: 0o700 }); + const session = await loginFrontend(host, config); + const upload = await uploadRemoteRunner(session, options, runId, dumpDir); + if (!upload.ok) { return { ok: false, env: options.environment, status: "blocked", - error: "provider-dispatch-failed", + error: "remote-runner-upload-failed", remote: { host: session.baseUrl, providerId: options.providerId, transport: "frontend", commandShape: commandShape(options.providerId), commandCalls: ["scripts/dev-cd-apply.mjs"], - dispatch, + uploadPath: upload.path, + uploadDump: join(dumpDir, "frontend-upload.json"), }, dumpPath: dumpDir, - nextSafeCommand: "restore UniDesk frontend/backend-core provider dispatch, then rerun bun scripts/cli.ts hwlab cd status --env dev", + upload, + nextSafeCommand: "restore D601 host.ssh command dispatch or upload permissions, then rerun bun scripts/cli.ts hwlab cd status --env dev", }; } - const wait = await 
waitForFrontendTask(session, taskId, Math.min(options.timeoutMs + 5000, maxTimeoutMs)); - const task = asRecord((wait as { task?: unknown }).task); - const result = asRecord(task?.result) ?? {}; - const stdout = typeof result.stdout === "string" ? result.stdout : ""; - const stderr = typeof result.stderr === "string" ? result.stderr : ""; + + const runner = await startRemoteRunner(session, options, runId, upload.path, dumpDir); + if (!runner.ok) { + return { + ok: false, + env: options.environment, + status: "blocked", + error: "remote-runner-start-failed", + remote: { + host: session.baseUrl, + providerId: options.providerId, + transport: "frontend", + taskId: runner.result.taskId, + taskStatus: runner.result.taskStatus, + commandShape: commandShape(options.providerId), + commandCalls: ["scripts/dev-cd-apply.mjs"], + uploadPath: upload.path, + uploadDump: join(dumpDir, "frontend-upload.json"), + runnerStartDump: runner.dump, + exitCode: runner.result.exitCode, + error: runner.result.error, + }, + dumpPath: dumpDir, + nextSafeCommand: "restore D601 host.ssh background job dispatch, then rerun bun scripts/cli.ts hwlab cd status --env dev", + }; + } + + const poll = await pollRemoteRunnerResult(session, options, runId, dumpDir); + const stdout = poll.stdout; + const stderr = poll.stderr; const stdoutDump = join(dumpDir, "frontend-task.stdout.txt"); const stderrDump = join(dumpDir, "frontend-task.stderr.txt"); - Bun.write(stdoutDump, stdout); - Bun.write(stderrDump, stderr); - const parsed = parseRemoteStdout(stdout); + await Bun.write(stdoutDump, stdout); + await Bun.write(stderrDump, stderr); + const parsed = poll.parsed ?? parseRemoteStdout(stdout); if (parsed === null) { return { ok: false, env: options.environment, status: "blocked", - error: "remote-empty-or-invalid-json", + error: poll.timedOut ? 
"remote-runner-timeout" : "remote-empty-or-invalid-json", remote: { host: session.baseUrl, providerId: options.providerId, transport: "frontend", - taskId, - taskStatus: task?.status ?? null, + taskId: poll.result?.taskId ?? null, + taskStatus: poll.result?.taskStatus ?? null, commandShape: commandShape(options.providerId), commandCalls: ["scripts/dev-cd-apply.mjs"], + uploadPath: upload.path, + uploadDump: join(dumpDir, "frontend-upload.json"), + runnerStartDump: runner.dump, + runnerPollDump: poll.dump, + remoteRunDir: runner.job?.remoteRunDir ?? null, + remoteFullResultPath: runner.job?.remoteFullResultPath ?? null, stdoutDump, stderrDump, - exitCode: typeof result.exitCode === "number" ? result.exitCode : null, + exitCode: poll.result?.exitCode ?? null, + error: poll.result?.error ?? "", }, dumpPath: dumpDir, }; @@ -439,13 +712,19 @@ async function runFrontendTransport(options: HwlabCdOptions, remoteCommand: stri host: session.baseUrl, providerId: options.providerId, transport: "frontend", - taskId, - taskStatus: task?.status ?? null, + taskId: poll.result?.taskId ?? null, + taskStatus: poll.result?.taskStatus ?? null, commandShape: commandShape(options.providerId), commandCalls: ["scripts/dev-cd-apply.mjs"], + uploadPath: upload.path, + uploadDump: join(dumpDir, "frontend-upload.json"), + runnerStartDump: runner.dump, + runnerPollDump: poll.dump, + remoteRunDir: runner.job?.remoteRunDir ?? null, + remoteFullResultPath: runner.job?.remoteFullResultPath ?? null, stdoutDump, stderrDump, - exitCode: typeof result.exitCode === "number" ? result.exitCode : null, + exitCode: poll.result?.exitCode ?? null, }, localDumpPath: dumpDir, }; @@ -502,14 +781,17 @@ export function hwlabHelp(): Record { "bun scripts/cli.ts hwlab cd preflight --env dev", "bun scripts/cli.ts hwlab cd apply --env dev --dry-run", ], - description: "Bounded UniDesk wrapper for the HWLAB DEV CD path. 
The default transport dispatches a D601 provider host.ssh command and the D601 side calls HWLAB repo-owned scripts/dev-cd-apply.mjs.", + description: "Bounded UniDesk wrapper for the HWLAB DEV CD path. The default transport uploads a small remote runner to D601 through short host.ssh commands, creates a one-run HWLAB clone owned by the host.ssh user, then calls HWLAB repo-owned scripts/dev-cd-apply.mjs from the D601 side.", boundary: [ `KUBECONFIG is forced to ${d601NativeKubeconfig}`, "docker-desktop, desktop-control-plane, and 127.0.0.1:11700 are refusal signals for the explicit D601 target", - `default HWLAB CD repo is ${defaultHwlabCdRepoPath}; ${rejectedRunnerHistoryRepoPath} is rejected as runner history`, + `default HWLAB CD repo is ${defaultHwlabCdRepoPath}, materialized from dedicated full bare cache ${defaultHwlabCdCachePath}; ${rejectedRunnerHistoryRepoPath} is rejected as runner history`, "deploy/deploy.json remains the authoritative desired-state source", + "the default HWLAB repo is an ephemeral one-run clone under the remote dump directory; --hwlab-repo is reserved for explicitly supplied clean diagnostic clones", + "the dedicated cache is owner-only and HWLAB-CD-only; it must keep full repo history so deploy/deploy.json promotion commits resolve, and one-run repos are copied with independent object stores rather than --shared alternates", "preflight/apply --dry-run check required SecretRef object/key metadata without reading or printing Secret values", "audit/status/preflight/apply --dry-run call only HWLAB scripts/dev-cd-apply.mjs read-only/status paths with --skip-live-verify; no apply, rollout, lock mutation, DEV acceptance live verification, DB write, or secret read is executed", + "frontend transport keeps each host.ssh command below the provider-gateway short-command limit by uploading the remote runner in chunks, starting it as a remote background job, and polling compact result files", "audit adds bounded read-only kubectl get/curl health 
probes for blocker classification; full stdout/stderr stays in temp dump paths, not reports/", "real apply is structured refused and must remain with the host commander or unique CD runner", ], @@ -524,5 +806,5 @@ export async function runHwlabCdCommand(args: string[]): Promise