diff --git a/docs/reference/cli.md b/docs/reference/cli.md index 9c84d30a..840e4ff8 100644 --- a/docs/reference/cli.md +++ b/docs/reference/cli.md @@ -242,7 +242,7 @@ bun scripts/cli.ts ssh D601 glob --root /home/ubuntu/pikapython --pattern '**/*- `:win skills [--scope agents|codex|all] [--limit N]` 是 Windows 用户 skill 发现入口,默认只读取当前 Windows 用户的 `%USERPROFILE%\.agents\skills`,输出 JSON 中包含 `roots`、`counts` 和每个 skill 的 `name`、`path`、`skillFile`、`description`。需要同时检查 `%USERPROFILE%\.codex\skills` 时显式加 `--scope all`;不要为了列 skill 手写 `cmd dir` 或宽泛扫描整个用户目录。 -`D601:k3s` 或 `G14:k3s` 定位到对应 provider 的原生 k3s 控制面;`:k3s::[:container]` 定位到 namespace 下的一个默认 deployment workload;若目标是具体 Pod,workload 段写成 `pod/`,若目标是 Deployment,也可以显式写 `deployment/` 或简写 ``。pod 内 workspace 使用 slash 后缀表达,例如 `D601:k3s:hwlab-dev:hwlab-cloud-api/app` 会定位到 deployment `hwlab-cloud-api` 并在 pod 内先 `cd /app`,`D601:k3s:hwlab-dev:pod/hwlab-cloud-api-abc/workspace/app:api` 会定位到 pod、container 和 `/workspace/app`。`kubectl`、`logs`、`script`、`apply-patch`、旧 `apply-patch-v1` fallback、`exec` 和普通容器命令都是 route 后面的 operation,这样路由子模块和操作子模块可以独立扩展。 +`D601:k3s` 或 `G14:k3s` 定位到对应 provider 的原生 k3s 控制面;`:k3s::[:container]` 定位到 namespace 下的一个默认 deployment workload;若目标是具体 Pod,标准 workload 段写成 `pod:`,旧 `pod/` 只作为兼容输入继续接受;若目标是 Deployment,也可以显式写 `deployment/` 或简写 ``。pod 内 workspace 使用 slash 后缀表达,slash 只用于 pod/container 内文件系统定位,例如 `D601:k3s:hwlab-dev:hwlab-cloud-api/app` 会定位到 deployment `hwlab-cloud-api` 并在 pod 内先 `cd /app`,`D601:k3s:hwlab-dev:pod:hwlab-cloud-api-abc/workspace/app:api` 会定位到 pod、container 和 `/workspace/app`。`kubectl`、`logs`、`script`、`apply-patch`、旧 `apply-patch-v1` fallback、`exec` 和普通容器命令都是 route 后面的 operation,这样路由子模块和操作子模块可以独立扩展。 `k3s` 必须出现在 route 的 plane 段里,禁止使用 `ssh G14 k3s ...` 或 `ssh D601 k3s ...` 这类 post-provider shorthand;正确形态是 `ssh G14:k3s kubectl ...` 或 `ssh D601:k3s kubectl ...`。定位和操作必须保持分离,`kubectl`、`logs`、`script`、`apply-patch`、旧 `apply-patch-v1` fallback、`exec` 等 operation 名也不得放进任何 colon route 段,包括 namespace、workload 或 
container 段;新增分布式目标时按 `{provider}:{plane}:{scope}` 扩展 route,而不是在 operation args 中新增另一套定位语法。 diff --git a/docs/reference/gc.md b/docs/reference/gc.md index fac0a9e0..286b92b2 100644 --- a/docs/reference/gc.md +++ b/docs/reference/gc.md @@ -44,25 +44,29 @@ UniDesk 的磁盘治理入口是 `bun scripts/cli.ts gc ...`。该入口用于 ## HWLAB Registry Retention -G14 HWLAB registry 清理必须显式使用 `--include-hwlab-registry`,默认 `gc remote G14 plan` 不进入 registry。策略必须保守,不能只留 latest。 +G14 HWLAB registry 清理必须显式使用 `--include-hwlab-registry`,默认 `gc remote G14 plan` 不进入 registry。策略必须保守,不能只留 latest,也不能只删除 tag link 后误判已经释放空间。 默认保留规则: | 保留项 | 规则 | |---|---| | 当前 workload 引用 | 保留所有当前 k3s workload 使用的 tag ref 和 digest ref | -| 近期 tag | 保留 `--registry-min-age-hours` 内全部 tag,默认 48 小时,最小 24 小时 | -| 每 repo 最新 tag | 每个业务 repo 至少保留 `--registry-keep-per-repo` 个最新 tag,默认 20,最小 10 | +| digest closure | 从当前 workload、保留 tag、protected/base/非业务 repo 出发,保留 manifest config/layer/manifest-list 的 digest 闭包 | +| 近期 tag | 保留 `--registry-min-age-hours` 内全部 tag,默认 48 小时,可显式设为 0 | +| 每 repo 最新 tag | 每个业务 repo 至少保留 `--registry-keep-per-repo` 个最新 tag,默认 20,最小 1 | | cache/base/protected tag | 保留 cache repo、`latest`、基础镜像 tag 和显式 protected tag | | 非业务 repo | 默认不删除非 `hwlab/hwlab-*` 的 commit-like tag | -删除范围只允许 commit-like tag,并且仅当对应 manifest digest 不再被任何保留 tag 引用时,才通过 registry API 删除 manifest。随后必须缩容 registry pod,运行官方 `registry garbage-collect`,再恢复 registry。`--registry-gc-only` 只用于中断恢复或人工维护窗口收尾:它不删除任何 tag,只运行官方 GC。 +删除范围包括两类:不在保留集内的 commit-like tag,以及 `hwlab/hwlab-*` / `hwlab/cache/hwlab-*` repo 内不在保留 digest closure 的 stale `_manifests/revisions/sha256/`。执行时先在 registry 在线状态下通过 API 删除 manifest,再缩容 registry pod,删除遗留 tag/revision 目录,运行官方 `registry garbage-collect`,最后恢复 registry。`--registry-gc-only` 只用于中断恢复或人工维护窗口收尾:它不删除任何 tag 或 revision,只运行官方 GC。 + +BuildKit cache repo 的历史 `latest` 写入会留下大量 untagged manifest revision;只删除 tag 通常只能释放少量 manifest blob,无法释放旧 cache layer。需要大幅降低 `/var/lib/hwlab/registry` 时,必须以 plan 输出的 
`deleteRevisions`、`protectedDigestClosure` 和 official GC 后的 `diskAfterBytes` 判断是否真正生效。 Registry 执行必须以远端异步 job 完成,并具备以下维护保护: - 暂停 G14/v0.2 branch poller CronJob。 - 等待 hwlab-ci PipelineRun、TaskRun 和 Job 空闲。 - 通过 registry API 删除 manifest 时 registry 必须仍在线。 +- registry 下线后只能删除通过 plan 判定的 tag/revision 目录;不得直接删除 blob 目录。 - 缩容 registry 后运行官方 `registry:2.8.3` garbage-collect pod。 - finally 阶段删除 GC pod、恢复 registry replicas、等待 rollout、恢复 CronJob suspend 状态。 - 状态查询使用 `gc remote G14 status --job-id `,不使用长 SSH 会话等待。 @@ -93,7 +97,7 @@ G14 当前只有一个本机 k3s cluster;空间归因时不要把 `hwlab-dev` |---|---|---| | Host path | `du -x` 按 `/var/lib/*`、`/root`、`/usr` 等目录统计 | 判断根盘主要压力源 | | k3s namespace/PVC | `kubectl get pv,pvc,pod -A -o json` 结合 local-path PV host path | 把可归属数据映射到 namespace/workload | -| Registry repo/tag | registry v2 repository/tag link 与 blob manifest 解析 | 判断镜像历史 tag 与共享 layer 的贡献 | +| Registry repo/tag/revision | registry v2 repository/tag link、manifest revision 与 blob manifest 解析 | 判断镜像历史 tag、cache revision 与共享 layer 的贡献 | 当前 G14 高水位的长期基线分布如下,后续诊断出现同类量级时优先按同一顺序处理: @@ -140,7 +144,7 @@ bun scripts/cli.ts ssh G14 script -- 'KUBECONFIG=/etc/rancher/k3s/k3s.yaml kubec bun scripts/cli.ts ssh G14 script -- 'find /var/lib/hwlab/registry/docker/registry/v2/repositories -path "*/_manifests/tags/*/current/link" -type f | wc -l' ``` -需要深挖 registry 时,报告字段至少包括 repo、tag count、latest tags、unique blob bytes 和 shared blob bytes。需要深挖 k3s runtime 时,报告字段至少包括 namespace/PVC、PV host path、owner workload、PVC 实占、k3s containerd snapshots/blobs 总量。不要把 `/var/lib/kubelet/pods` 与 `/var/lib/rancher/k3s/storage` 简单相加,因为 kubelet pod 目录可能包含 PVC bind mount 或 runtime 元数据,存在重复计数风险。 +需要深挖 registry 时,报告字段至少包括 repo、tag count、manifest revision count、latest tags、protected digest closure、unique blob bytes 和 shared blob bytes。需要深挖 k3s runtime 时,报告字段至少包括 namespace/PVC、PV host path、owner workload、PVC 实占、k3s containerd snapshots/blobs 总量。不要把 `/var/lib/kubelet/pods` 与 `/var/lib/rancher/k3s/storage` 简单相加,因为 kubelet pod 目录可能包含 PVC 
bind mount 或 runtime 元数据,存在重复计数风险。 ## Validation Checklist diff --git a/scripts/src/gc-remote.ts b/scripts/src/gc-remote.ts index f289ad71..ddfbe292 100644 --- a/scripts/src/gc-remote.ts +++ b/scripts/src/gc-remote.ts @@ -104,11 +104,10 @@ function parseRemoteGcOptions(args: string[]): RemoteGcOptions { options.coreDumpMinAgeHours = parseNonNegativeNumber(arg, args[++index]); } else if (arg === "--registry-keep-per-repo") { const value = parseNonNegativeNumber(arg, args[++index]); - if (!Number.isInteger(value) || value < 10) throw new Error("--registry-keep-per-repo must be an integer >= 10"); + if (!Number.isInteger(value) || value < 1) throw new Error("--registry-keep-per-repo must be an integer >= 1"); options.registryKeepPerRepo = Math.min(value, 50); } else if (arg === "--registry-min-age-hours") { const value = parseNonNegativeNumber(arg, args[++index]); - if (value < 24) throw new Error("--registry-min-age-hours must be >= 24"); options.registryMinAgeHours = value; } else if (arg === "--job-id") { const value = args[++index]; @@ -613,12 +612,131 @@ def registry_tag_rows(): }) return rows +def registry_revision_rows(): + rows = [] + root = REGISTRY_REPOSITORY_ROOT + if not os.path.isdir(root): + return rows + for repo_root, dirs, files in os.walk(root): + if os.path.basename(repo_root) != "sha256": + continue + rel = os.path.relpath(repo_root, root) + suffix = "/_manifests/revisions/sha256" + if not rel.endswith(suffix): + continue + repo = rel[:-len(suffix)] + try: + revisions = os.listdir(repo_root) + except OSError: + continue + for digest_hex in sorted(revisions): + path = os.path.join(repo_root, digest_hex) + link = os.path.join(path, "link") + if not os.path.isfile(link): + continue + try: + with open(link, "r", encoding="utf-8") as handle: + digest = handle.read().strip() + stat = os.stat(link) + except OSError: + continue + rows.append({ + "repo": repo, + "digest": digest, + "mtime": stat.st_mtime, + "mtimeIso": time.strftime("%Y-%m-%dT%H:%M:%SZ", 
time.gmtime(stat.st_mtime)),
+                "path": path,
+            })
+    return rows
+
+def registry_retention_repo(repo):
+    return repo.startswith("hwlab/hwlab-") or repo.startswith("hwlab/cache/hwlab-")
+
+def registry_digest_hex(digest):
+    if not isinstance(digest, str) or not digest.startswith("sha256:"):
+        return None
+    value = digest.split(":", 1)[1]
+    if re.match(r"^[0-9a-f]{64}$", value) is None:
+        return None
+    return value
+
+def registry_blob_data_path(digest):
+    value = registry_digest_hex(digest)
+    if value is None:
+        return None
+    return os.path.join(REGISTRY_ROOT, "docker/registry/v2/blobs/sha256", value[:2], value, "data")
+
+_manifest_cache = {}
+def registry_manifest_json(digest):
+    if digest in _manifest_cache:
+        return _manifest_cache[digest]
+    path = registry_blob_data_path(digest)
+    if path is None or not os.path.isfile(path):
+        _manifest_cache[digest] = None
+        return None
+    try:
+        with open(path, "rb") as handle:
+            data = handle.read(8 * 1024 * 1024)
+        value = json.loads(data.decode("utf-8"))
+    except Exception:
+        value = None
+    _manifest_cache[digest] = value
+    return value
+
+def registry_manifest_refs(digest):
+    manifest = registry_manifest_json(digest)
+    if not isinstance(manifest, dict):
+        return set()
+    refs = set()
+    config = manifest.get("config") or {}
+    config_digest = config.get("digest")
+    if isinstance(config_digest, str) and registry_digest_hex(config_digest) is not None:
+        refs.add(config_digest)
+    for item in manifest.get("layers") or []:
+        item_digest = (item or {}).get("digest")
+        if isinstance(item_digest, str) and registry_digest_hex(item_digest) is not None:
+            refs.add(item_digest)
+    for item in manifest.get("manifests") or []:
+        item_digest = (item or {}).get("digest")
+        if isinstance(item_digest, str) and registry_digest_hex(item_digest) is not None:
+            refs.add(item_digest)
+    return refs
+
+def registry_digest_closure(seed):
+    seen = set()
+    stack = list(seed)
+    while stack:
+        digest = stack.pop()
+        if digest in seen or
registry_digest_hex(digest) is None: + continue + seen.add(digest) + for child in registry_manifest_refs(digest): + if child not in seen: + stack.append(child) + return seen + +def registry_blob_size(digest): + path = registry_blob_data_path(digest) + if path is None or not os.path.isfile(path): + return 0 + try: + return int(os.lstat(path).st_blocks) * 512 + except OSError: + return 0 + +def estimate_registry_reclaim(delete_manifest_digests, kept_manifest_digests): + deleted = registry_digest_closure(delete_manifest_digests) + kept = registry_digest_closure(kept_manifest_digests) + reclaim = deleted - kept + return sum(registry_blob_size(digest) for digest in reclaim) + def plan_registry_retention(): - keep_per_repo = int(OPTIONS.get("registryKeepPerRepo") or 5) - min_age_hours = float(OPTIONS.get("registryMinAgeHours") or 48) + keep_per_repo = int(OPTIONS.get("registryKeepPerRepo") if OPTIONS.get("registryKeepPerRepo") is not None else 5) + min_age_hours = float(OPTIONS.get("registryMinAgeHours") if OPTIONS.get("registryMinAgeHours") is not None else 48) cutoff = time.time() - min_age_hours * 3600 refs, digests, refs_command = workload_image_refs() rows = registry_tag_rows() + revision_rows = registry_revision_rows() by_repo = {} for row in rows: by_repo.setdefault(row["repo"], []).append(row) @@ -666,30 +784,50 @@ def plan_registry_retention(): kept_count += 1 kept_digests.add(row["digest"]) keep_by_repo[row["repo"]] = keep_by_repo.get(row["repo"], 0) + 1 + protected_digests = kept_digests | digests + protected_digests.update(row["digest"] for row in revision_rows if not registry_retention_repo(row["repo"])) + protected_digests = registry_digest_closure(protected_digests) + delete_revision_rows = [] + revision_delete_by_repo = {} + for row in revision_rows: + if not registry_retention_repo(row["repo"]): + continue + if row["digest"] in protected_digests: + continue + delete_revision_rows.append(row) + revision_delete_by_repo[row["repo"]] = 
revision_delete_by_repo.get(row["repo"], 0) + 1 + kept_revision_digests = set(row["digest"] for row in revision_rows if row not in delete_revision_rows) + delete_revision_digests = set(row["digest"] for row in delete_revision_rows) deletable_manifests = {} for row in delete_rows: if row["digest"] in kept_digests: continue deletable_manifests.setdefault(row["repo"], set()).add(row["digest"]) + for row in delete_revision_rows: + deletable_manifests.setdefault(row["repo"], set()).add(row["digest"]) deletable_manifest_count = sum(len(items) for items in deletable_manifests.values()) - estimate = 0 registry_size = du_size(REGISTRY_ROOT, 30) or 0 - if rows: - estimate = int(registry_size * min(0.75, deletable_manifest_count / float(len(rows)) * 0.55)) + estimate = estimate_registry_reclaim(delete_revision_digests, kept_revision_digests) return { "tagRows": rows, + "revisionRows": revision_rows, "deleteRows": delete_rows, + "deleteRevisionRows": delete_revision_rows, "summary": { "totalTags": len(rows), + "totalRevisions": len(revision_rows), "repoCount": len(by_repo), "keepPerRepo": keep_per_repo, "minAgeHours": min_age_hours, "protectedWorkloadRefs": len(refs), "protectedDigestRefs": len(digests), + "protectedDigestClosure": len(protected_digests), "keptTags": kept_count, "deleteTags": len(delete_rows), "deleteManifests": deletable_manifest_count, + "deleteRevisions": len(delete_revision_rows), "deleteByRepo": delete_by_repo, + "revisionDeleteByRepo": revision_delete_by_repo, "keepByRepo": keep_by_repo, "registrySizeBytes": registry_size, "estimatedReclaimBytes": estimate, @@ -779,11 +917,12 @@ def execute_registry_retention(): raise RuntimeError("refusing registry maintenance while hwlab-ci PipelineRun/TaskRun is active") plan = plan_registry_retention() delete_rows = plan.get("deleteRows") or [] + delete_revision_rows = plan.get("deleteRevisionRows") or [] delete_manifests = plan.get("deleteManifestsByRepo") or {} - if not delete_rows: - return {"reclaimedBytes": 0, 
"commandOutput": {"message": "no registry tags matched conservative retention", "registryPlan": plan.get("summary")}} + if not delete_rows and not delete_revision_rows: + return {"reclaimedBytes": 0, "commandOutput": {"message": "no registry tags or revisions matched conservative retention", "registryPlan": plan.get("summary")}} if not delete_manifests: - return {"reclaimedBytes": 0, "commandOutput": {"message": "matched tags are still referenced by retained tags; registry GC would not reclaim blobs", "registryPlan": plan.get("summary")}} + return {"reclaimedBytes": 0, "commandOutput": {"message": "matched manifests are still referenced by retained manifests; registry GC would not reclaim blobs", "registryPlan": plan.get("summary")}} cronjobs = ["hwlab-g14-branch-poller", "hwlab-v02-branch-poller"] original_crons = cronjob_suspend_states(cronjobs) before = du_size(REGISTRY_ROOT, 60) or 0 @@ -835,6 +974,21 @@ def execute_registry_retention(): deleted.append({"repo": row.get("repo"), "tag": row.get("tag"), "digest": row.get("digest")}) steps.append({"step": "delete-tag-directories", "count": len(deleted)}) + deleted_revisions = [] + for row in delete_revision_rows: + path = os.path.abspath(str(row.get("path") or "")) + digest_hex = registry_digest_hex(str(row.get("digest") or "")) + if digest_hex is None: + raise RuntimeError("refusing unexpected registry revision digest: %s" % row.get("digest")) + if not path.startswith(REGISTRY_REPOSITORY_ROOT + "/") or "/_manifests/revisions/sha256/" not in path: + raise RuntimeError("refusing unexpected registry revision path: %s" % path) + if os.path.basename(path) != digest_hex: + raise RuntimeError("refusing registry revision path/digest mismatch: %s" % path) + if os.path.isdir(path) and not os.path.islink(path): + shutil.rmtree(path) + deleted_revisions.append({"repo": row.get("repo"), "digest": row.get("digest")}) + steps.append({"step": "delete-revision-directories", "count": len(deleted_revisions)}) + overrides = { 
"apiVersion": "v1", "spec": { @@ -874,6 +1028,7 @@ def execute_registry_retention(): "commandOutput": { "registryPlan": plan.get("summary"), "deletedTagCount": len(delete_rows), + "deletedRevisionCount": len(delete_revision_rows), "deletedManifestCount": sum(len(items) for items in delete_manifests.values()), "diskBeforeBytes": before, "diskAfterBytes": after, @@ -1199,13 +1354,14 @@ def collect_candidates(observed_at): registry = plan_registry_retention() summary = registry.get("summary") or {} delete_rows = registry.get("deleteRows") or [] + delete_revision_rows = registry.get("deleteRevisionRows") or [] estimate = int(summary.get("estimatedReclaimBytes") or 0) - if delete_rows: + if delete_rows or delete_revision_rows: candidates.append({ "id": "hwlab-registry:retention-gc", "kind": "hwlab-registry-retention-gc", "risk": "medium", - "description": "Conservative HWLAB registry retention: keep current workload refs, recent tags and latest tags per repo, then run official registry garbage-collect", + "description": "Conservative HWLAB registry retention: keep current workload refs, retained tags and protected repos, delete stale manifest revisions, then run official registry garbage-collect", "path": REGISTRY_ROOT, "sizeBytes": int(summary.get("registrySizeBytes") or 0), "estimatedReclaimBytes": estimate, @@ -1216,9 +1372,12 @@ def collect_candidates(observed_at): "minAgeHours": summary.get("minAgeHours"), "deleteTags": len(delete_rows), "deleteManifests": summary.get("deleteManifests"), + "deleteRevisions": summary.get("deleteRevisions"), "deleteByRepo": summary.get("deleteByRepo"), + "revisionDeleteByRepo": summary.get("revisionDeleteByRepo"), "protectedWorkloadRefs": summary.get("protectedWorkloadRefs"), "protectedDigestRefs": summary.get("protectedDigestRefs"), + "protectedDigestClosure": summary.get("protectedDigestClosure"), }, }) elif bool(OPTIONS.get("registryGcOnly")) and int(summary.get("totalTags") or 0) > 0 and int(summary.get("deleteTags") or 0) == 0: 
diff --git a/scripts/src/help.ts b/scripts/src/help.ts index 98bd58f3..6f772119 100644 --- a/scripts/src/help.ts +++ b/scripts/src/help.ts @@ -299,10 +299,10 @@ function gcHelp(): unknown { "--tmp-min-age-hours N": "delete allowlisted /tmp artifacts older than N hours; default 24", "--core-dump-min-age-hours N": "remote only: delete untracked allowlisted core. dumps older than N hours; default 1", "--no-core-dumps": "remote only: do not include scoped core dump cleanup candidates", - "--include-hwlab-registry": "remote G14 only: opt in to conservative HWLAB registry tag retention plus official registry garbage-collect", + "--include-hwlab-registry": "remote G14 only: opt in to conservative HWLAB registry tag and stale manifest-revision retention plus official registry garbage-collect", "--registry-gc-only": "remote G14 only: run official registry garbage-collect without deleting additional tags; intended for interrupted registry retention recovery", - "--registry-keep-per-repo N": "remote registry only: keep at least N newest tags per service repo; default 20, minimum 10", - "--registry-min-age-hours N": "remote registry only: keep all tags newer than N hours; default 48, minimum 24", + "--registry-keep-per-repo N": "remote registry only: keep at least N newest tags per service repo; default 20, minimum 1", + "--registry-min-age-hours N": "remote registry only: keep all tags newer than N hours; default 48, minimum 0", "--job-id ID": "remote status only: inspect a long-running remote gc job", "--limit N": "number of candidates returned and executed by run when --full is not set; default 50", "--result-limit N": "number of per-candidate run results returned when --full is not set; default 50", @@ -311,7 +311,7 @@ function gcHelp(): unknown { "db-trace --before-date YYYY-MM-DD": "plan or delete default trace telemetry event types before the date", "db-trace run --vacuum-full": "rewrite public.oa_events after deletion so df can reclaim disk; requires maintenance 
window", "policy plan|install": "render or install journald caps and a daily file-log plus allowlisted /tmp low-risk gc systemd timer", - "remote plan|run": "run bounded GC through UniDesk SSH passthrough on a provider host; G14 protects HWLAB k3s/runtime/PVC/workspace paths, and HWLAB registry retention is explicit opt-in with workload-ref, recent-tag and per-repo tag protection", + "remote plan|run": "run bounded GC through UniDesk SSH passthrough on a provider host; G14 protects HWLAB k3s/runtime/PVC/workspace paths, and HWLAB registry retention is explicit opt-in with workload-ref, digest-closure, recent-tag and per-repo tag protection", "--no-file-logs|--no-docker-logs|--no-journal|--no-build-cache|--no-tmp|--no-db-summary": "disable one collector", }, reference: "docs/reference/gc.md", diff --git a/scripts/src/ssh.ts b/scripts/src/ssh.ts index 5493bbca..67c978d5 100644 --- a/scripts/src/ssh.ts +++ b/scripts/src/ssh.ts @@ -90,6 +90,7 @@ const defaultSshRuntimeTimeoutMs = 60_000; const maxSshRuntimeTimeoutMs = 60_000; export const sshShellCompatibilityPrelude = 'printf(){ if [ "${1+x}" = x ] && [ "$1" = "-v" ] && [ -n "${BASH_VERSION:-}" ]; then command printf "$@"; return $?; fi; if [ "${1+x}" = x ] && [ "$1" = "--" ]; then shift; fi; command printf -- "$@"; }'; const k3sResourceKindAliases = new Set(["pod", "po", "pods", "deployment", "deploy", "deployments", "statefulset", "sts", "daemonset", "ds", "job", "jobs"]); +const k3sPodRoutePrefixes = ["pod:", "po:", "pods:"]; const legacyK3sOperationRouteSegments = new Set([ "guard", "kubectl", @@ -944,7 +945,8 @@ export function parseSshRoute(target: string): ParsedSshRoute { return hostSshRoute(providerId, target, workspace); } if (plane !== "k3s") throw new Error(`unsupported ssh route plane: ${plane}`); - const [first, second, third, fourth] = rest; + const normalizedRest = normalizeK3sRouteRestSegments(rest); + const [first, second, third, fourth] = normalizedRest; const operationInRoute = [first, second, 
third].map((segment) => segment === undefined ? undefined : routeSegmentHead(segment)).find((segment) => segment !== undefined && legacyK3sOperationRouteSegments.has(segment));
   if (operationInRoute !== undefined) throw new Error(k3sOperationInRouteMessage(target, operationInRoute));
   if (fourth !== undefined) throw new Error("ssh k3s target route supports at most provider:k3s:namespace:resource:container");
@@ -1155,6 +1157,29 @@ function parseK3sRouteTargetSegments(rawResource: string | null, rawContainer: s
   };
 }
 
+function normalizeK3sRouteRestSegments(rest: string[]): string[] {
+  const normalized: string[] = [];
+  for (let index = 0; index < rest.length; index += 1) {
+    const current = rest[index] ?? "";
+    const podPrefix = k3sPodRoutePrefixes.find((prefix) => current === prefix.slice(0, -1) || current.startsWith(prefix));
+    if (podPrefix === undefined) {
+      normalized.push(current);
+      continue;
+    }
+    if (current === podPrefix.slice(0, -1)) {
+      const next = rest[index + 1];
+      if (next === undefined || next.length === 0) throw new Error("ssh k3s pod: route requires a pod name after pod:");
+      normalized.push(`pod/${next}`);
+      index += 1;
+      continue;
+    }
+    const podName = current.slice(podPrefix.length);
+    if (podName.length === 0) throw new Error("ssh k3s pod: route requires a pod name after pod:");
+    normalized.push(`pod/${podName}`);
+  }
+  return normalized;
+}
+
 function splitK3sResourceWorkspace(value: string | null): { resource: string | null; workspace: string | null } {
   if (value === null || value.length === 0) return { resource: null, workspace: null };
   const parts = value.split("/");
@@ -1306,7 +1331,7 @@ function parseK3sRouteArgs(route: ParsedSshRoute, args: string[]): ParsedSshArgs
     return parseK3sControlPlaneOperation(route, args);
   }
   if (route.namespace === null || route.resource === null) {
-    throw new Error("ssh k3s target route requires provider:k3s::");
+    throw new Error("ssh k3s target route requires provider:k3s::");
   }
   return
parseK3sTargetOperation(route, args); } @@ -1762,6 +1787,9 @@ function normalizeK3sResource(value: string): string { function normalizeK3sRouteResource(value: string): string { if (value.startsWith("deploy/")) return `deployment/${value.slice("deploy/".length)}`; if (value.startsWith("po/")) return `pod/${value.slice("po/".length)}`; + if (value.startsWith("pod:")) return `pod/${value.slice("pod:".length)}`; + if (value.startsWith("po:")) return `pod/${value.slice("po:".length)}`; + if (value.startsWith("pods:")) return `pod/${value.slice("pods:".length)}`; if (value.includes("/")) return value; return `deployment/${value}`; } diff --git a/scripts/ssh-argv-guidance-contract-test.ts b/scripts/ssh-argv-guidance-contract-test.ts index a05ebd47..c395de31 100644 --- a/scripts/ssh-argv-guidance-contract-test.ts +++ b/scripts/ssh-argv-guidance-contract-test.ts @@ -996,13 +996,16 @@ export async function runSshArgvGuidanceContract(): Promise { "pod apply-patch must be an operation after the route", ); - const routePodTarget = parseSshInvocation("D601:k3s:hwlab-dev:pod/hwlab-cloud-api-abc:api", ["printenv", "HOSTNAME"]); + const routePodTarget = parseSshInvocation("D601:k3s:hwlab-dev:pod:hwlab-cloud-api-abc:api", ["printenv", "HOSTNAME"]); assertCondition(routePodTarget.parsed.remoteCommand === "'env' 'KUBECONFIG=/etc/rancher/k3s/k3s.yaml' 'kubectl' 'exec' '-n' 'hwlab-dev' 'pod/hwlab-cloud-api-abc' '-c' 'api' '--' 'printenv' 'HOSTNAME'", "pod route with container must preserve explicit pod kind", routePodTarget); - const routePodWorkspace = parseSshInvocation("D601:k3s:hwlab-dev:pod/hwlab-cloud-api-abc/workspace/app:api", ["pwd"]); + const routePodWorkspace = parseSshInvocation("D601:k3s:hwlab-dev:pod:hwlab-cloud-api-abc/workspace/app:api", ["pwd"]); assertCondition(routePodWorkspace.route.resource === "pod/hwlab-cloud-api-abc" && routePodWorkspace.route.container === "api" && routePodWorkspace.route.workspace === "/workspace/app", "pod route must support a workspace suffix 
after the pod id", routePodWorkspace); assertCondition(routePodWorkspace.parsed.remoteCommand === "'env' 'KUBECONFIG=/etc/rancher/k3s/k3s.yaml' 'kubectl' 'exec' '-n' 'hwlab-dev' 'pod/hwlab-cloud-api-abc' '-c' 'api' '--' 'sh' '-c' 'cd \"$1\" || exit; shift; exec \"$@\"' 'unidesk-cwd' '/workspace/app' 'pwd'", "pod workspace route must run commands through a fixed cwd wrapper", routePodWorkspace); + const legacyRoutePodTarget = parseSshInvocation("D601:k3s:hwlab-dev:pod/hwlab-cloud-api-abc:api", ["printenv", "HOSTNAME"]); + assertCondition(legacyRoutePodTarget.parsed.remoteCommand === routePodTarget.parsed.remoteCommand, "legacy pod/name route remains accepted for compatibility", legacyRoutePodTarget); + const routeDeploymentWorkspaceScript = parseSshInvocation("D601:k3s:hwlab-dev:hwlab-cloud-api/app", ["script"]); assertCondition(routeDeploymentWorkspaceScript.route.resource === "hwlab-cloud-api" && routeDeploymentWorkspaceScript.route.workspace === "/app", "deployment shorthand route must support workspace suffix", routeDeploymentWorkspaceScript); assertCondition(routeDeploymentWorkspaceScript.parsed.remoteCommand === "'env' 'KUBECONFIG=/etc/rancher/k3s/k3s.yaml' 'kubectl' 'exec' '-i' '-n' 'hwlab-dev' 'deployment/hwlab-cloud-api' '--' 'sh' '-c' 'cd \"$1\" || exit; shift; exec \"$@\"' 'unidesk-cwd' '/app' 'sh' '-s' '--'", "pod workspace script must set cwd before shell -s consumes stdin", routeDeploymentWorkspaceScript);
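
Reviewer note on the new reclaim estimate in `scripts/src/gc-remote.ts`: the patch replaces the ratio-based guess with a set difference between two digest closures. The sketch below illustrates that idea on a small in-memory graph. It is a simplified stand-in and not the code from the patch: `MANIFEST_REFS`, `BLOB_SIZES`, the short fake digests, and the function names `digest_closure` and `estimate_reclaim` are all hypothetical, and the real helpers read manifests from disk and only accept 64-hex `sha256:` digests.

```python
# Hypothetical in-memory stand-in for the registry blob store.
# Maps a manifest digest to the digests it references (config, layers,
# child manifests). Real digests are "sha256:" plus 64 hex characters.
MANIFEST_REFS = {
    "sha256:app-v1": {"sha256:cfg-v1", "sha256:layer-base", "sha256:layer-v1"},
    "sha256:app-v2": {"sha256:cfg-v2", "sha256:layer-base", "sha256:layer-v2"},
}

# Hypothetical on-disk sizes in bytes for every blob in the graph.
BLOB_SIZES = {
    "sha256:app-v1": 1,
    "sha256:app-v2": 1,
    "sha256:cfg-v1": 2,
    "sha256:cfg-v2": 2,
    "sha256:layer-base": 500,
    "sha256:layer-v1": 100,
    "sha256:layer-v2": 120,
}


def digest_closure(seed, refs):
    """Return every digest reachable from the seed digests, seeds included."""
    seen = set()
    stack = list(seed)
    while stack:
        digest = stack.pop()
        if digest in seen:
            continue
        seen.add(digest)
        # Blobs with no entry in refs (layers, configs) are leaves.
        stack.extend(child for child in refs.get(digest, ()) if child not in seen)
    return seen


def estimate_reclaim(delete_digests, kept_digests, refs, sizes):
    """Sum the sizes of blobs reachable only from the deleted manifests."""
    reclaim = digest_closure(delete_digests, refs) - digest_closure(kept_digests, refs)
    return sum(sizes.get(digest, 0) for digest in reclaim)


estimate = estimate_reclaim({"sha256:app-v1"}, {"sha256:app-v2"}, MANIFEST_REFS, BLOB_SIZES)
print(estimate)
```

In this toy graph the shared `layer-base` blob is reachable from the kept manifest, so it is excluded from the estimate; only the manifest, config, and layer unique to `app-v1` are counted. That is the same reason the docs in this diff warn that deleting tag links alone rarely frees cache layers.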