From 463132622eae8ffd069f02c107ae960d1bfbda95 Mon Sep 17 00:00:00 2001
From: Artificer
Date: Thu, 11 Jun 2026 19:08:19 +0000
Subject: [PATCH] feat: add state artifact gc retention

---
 .agents/skills/unidesk-ops/SKILL.md |   4 +-
 docs/reference/gc.md                |  24 +++-
 scripts/src/gc.ts                   | 212 +++++++++++++++++++++++++++-
 scripts/src/help.ts                 |  11 +-
 4 files changed, 241 insertions(+), 10 deletions(-)

diff --git a/.agents/skills/unidesk-ops/SKILL.md b/.agents/skills/unidesk-ops/SKILL.md
index 6c2b061f..eacf9e38 100644
--- a/.agents/skills/unidesk-ops/SKILL.md
+++ b/.agents/skills/unidesk-ops/SKILL.md
@@ -90,9 +90,9 @@ bun scripts/cli.ts gc plan --target-use-percent 69 \
   --include-vpn-diagnostic-logs
 ```
 
-`--target-use-percent` 按 `df` 显示口径估算 shortfall。工具缓存、`/tmp` 非 allowlist 直接子项、VS Code 历史 server/extension 版本、Baidu staging 旧 PGDATA tarball、VPN 诊断 ring pcap 均默认不启用;必须显式 include 后才进入候选,且执行时仍受路径断言保护。stale `/tmp` 扫描按 `--limit` 有界枚举候选,避免为了估算全量临时目录而长时间无输出。VPN 诊断日志只选择 `/root/vpn-server/logs/hy2-udp-ring-*.pcap` 和 `hy2-monitor-ring-*.pcap` 中超过 `--vpn-diagnostic-log-keep-hours` 的普通文件,执行前检查 active fd;不删除 evidence JSONL。默认 GC 不触碰 PGDATA、Docker volumes/images、Codex sessions/auth state、Baidu staging 根目录或 VPN 日志根目录。
+`--target-use-percent` 按 `df` 显示口径估算 shortfall。工具缓存、`/tmp` 非 allowlist 直接子项、VS Code 历史 server/extension 版本、Baidu staging 旧 PGDATA tarball、UniDesk `.state` 历史诊断/部署产物、VPN 诊断 ring pcap 均默认不启用;必须显式 include 后才进入候选,且执行时仍受路径断言保护。stale `/tmp` 扫描按 `--limit` 有界枚举候选,避免为了估算全量临时目录而长时间无输出。`.state` retention 只通过 `--include-state-artifacts --state-artifact-keep-days N` 选择 `.state/e2e`、`.state/validation`、`.state/jobs`、`.state/codex-queue/output-archive` 下超过保留期的普通文件,以及 `.state/deploy/exports`、`.state/deploy/resolve` 下超过保留期的直接子目录;默认保留期 14 天。VPN 诊断日志只选择 `/root/vpn-server/logs/hy2-udp-ring-*.pcap` 和 `hy2-monitor-ring-*.pcap` 中超过 `--vpn-diagnostic-log-keep-hours` 的普通文件,执行前检查 active fd;不删除 evidence JSONL。默认 GC 不触碰 `.state/recovery`、`.state/codex-queue/codex-home`、`.state/deploy/work`、`.state/baidu-netdisk`、PGDATA、Docker volumes/images、Codex sessions/auth state、active worktree、runtime image/snapshot state、Baidu staging 根目录或 VPN 日志根目录。
 
-`gc policy install` 的每日 timer 会自动执行 24 小时 VPN 诊断 pcap retention,用于限制长期 tcpdump ring 文件增长;手动 `gc plan/run` 仍必须显式 `--include-vpn-diagnostic-logs` 才会列出或删除这些 pcap。
+`gc policy install` 的每日 timer 会自动执行 24 小时 VPN 诊断 pcap retention 和 14 天 UniDesk `.state` artifact retention,用于限制长期诊断/部署产物增长;手动 `gc plan/run` 仍必须显式 `--include-vpn-diagnostic-logs` 或 `--include-state-artifacts` 才会列出或删除这些对象。
 
 ---
 
diff --git a/docs/reference/gc.md b/docs/reference/gc.md
index fbf5617b..fb74295e 100644
--- a/docs/reference/gc.md
+++ b/docs/reference/gc.md
@@ -6,7 +6,7 @@ UniDesk 的磁盘治理入口是 `bun scripts/cli.ts gc ...`。该入口用于
 
 - `gc plan`:只读生成主 server 清理候选、估算收益、风险等级、保护对象和数据库诊断摘要。
 - `gc run --confirm`:只执行当前 plan 可见候选页,默认不执行分页隐藏候选;用 `--limit`、`--result-limit`、`--full|--raw` 控制披露和执行范围。
-- `gc policy plan|install`:渲染或安装低风险长期策略,例如 journald cap、每日 allowlisted 文件/tmp 清理 timer 和 24 小时 VPN 诊断 pcap retention。
+- `gc policy plan|install`:渲染或安装低风险长期策略,例如 journald cap、每日 allowlisted 文件/tmp 清理 timer、24 小时 VPN 诊断 pcap retention 和 14 天 `.state` artifact retention。
 - `gc db-trace plan|run --confirm --before-date YYYY-MM-DD --vacuum-full`:显式 trace 遥测留存入口;涉及数据库重写时按维护窗口处理。
 - `gc remote plan|run --confirm|status --job-id `:通过 UniDesk SSH 透传在 provider host 上执行受控 GC。远端长任务必须使用异步 job 和 `status` 短查询,不应让单次 SSH 等待完整 registry GC 或其他长清理。
 
@@ -16,6 +16,19 @@ UniDesk 的磁盘治理入口是 `bun scripts/cli.ts gc ...`。该入口用于
 
 主 server VPN 诊断日志默认不清理。`/root/vpn-server/logs` 中由长期 `tcpdump -G` 产生的 `hy2-udp-ring-*.pcap` 和 `hy2-monitor-ring-*.pcap` 可通过显式 `--include-vpn-diagnostic-logs` 进入候选,默认只选择超过 `--vpn-diagnostic-log-keep-hours 24` 的普通 pcap 文件。执行前必须重新校验路径、文件名、非 symlink/regular file,并用 active-file 检查确认没有进程仍打开该文件。`hy2-server-evidence.jsonl`、stdout/stderr log、最新 pcap 和整个日志根目录始终作为 protected 输出,不得被这个入口删除或截断。
 
+主 server `.state` 历史诊断和部署产物默认不进入手动 GC。需要清理时必须显式传入 `--include-state-artifacts`,保留期通过 `--state-artifact-keep-days N` 设置,默认 14 天且必须是正整数。该入口只选择以下对象:
+
+| 范围 | 候选 |
+|---|---|
+| `.state/e2e` | 超过保留期的普通文件 |
+| `.state/validation` | 超过保留期的普通文件 |
+| `.state/jobs` | 超过保留期的普通文件 |
+| `.state/codex-queue/output-archive` | 超过保留期的普通文件 |
+| `.state/deploy/exports` | 超过保留期的直接子目录 |
+| `.state/deploy/resolve` | 超过保留期的直接子目录 |
+
+`gc run --confirm --include-state-artifacts` 执行前必须重新校验路径、保留期、对象类型和 symlink 状态。文件候选必须仍是 allowlist 根下的普通文件;deploy 目录候选必须仍是 `.state/deploy/exports` 或 `.state/deploy/resolve` 的直接子目录。该入口不得递归扩大成通用 `.state` 清空器,也不得选择 `.state` 根目录、allowlist 之外的目录、symlink、active worktree、runtime image 或 snapshot 状态。
+
 ## Protected Data
 
 默认 GC 不得删除或 prune 以下对象:
@@ -24,7 +37,12 @@ UniDesk 的磁盘治理入口是 `bun scripts/cli.ts gc ...`。该入口用于
 |---|---|
 | PostgreSQL PGDATA | 数据库权威状态,必须走备份、留存或迁移流程 |
 | Docker image/container/volume | 运行面和发布真相可能依赖旧镜像或 volume |
-| Baidu Netdisk staging/backups | 备份链路状态和可重建缓存边界需单独判定 |
+| `.state/recovery` | 恢复状态和人工回滚线索,不属于 artifact retention |
+| `.state/codex-queue/codex-home` | Codex sessions/auth/profile 状态,不得作为队列输出归档清理 |
+| `.state/deploy/work` | 部署工作目录可能包含 active rollout 上下文 |
+| `.state/baidu-netdisk` | Baidu Netdisk token、任务、备份和 staging 状态需单独判定 |
+| active worktree、runtime image、runtime snapshot state | 当前执行面和运行面 provenance,不通过 `.state` artifact retention 删除 |
+| Codex sessions/auth | `~/.codex/sessions`、`~/.codex/auth.json` 等凭证和会话状态 |
 | VPN diagnostic evidence logs | `/root/vpn-server/logs/hy2-server-evidence.jsonl` 等 active evidence 流用于网络排障,不随 pcap retention 删除 |
 | D601 registry storage | artifact registry retention 需使用专门入口 |
 | `/var/lib/rancher/k3s` 与 `/var/lib/rancher/k3s/storage` | k3s 控制面、containerd 状态和 local-path PVC 数据 |
@@ -34,6 +52,8 @@ UniDesk 的磁盘治理入口是 `bun scripts/cli.ts gc ...`。该入口用于
 
 如果需要触碰上表对象,必须先补高层 UniDesk CLI 子命令、dry-run 计划、保护对象、验证命令和失败分类;不能把原生 `kubectl`、`docker prune`、`crictl rmi` 或手写 registry shell 作为长期流程。
 
+`gc policy install` 的每日 timer 会启用 14 天 `.state` artifact retention,用来限制历史诊断和部署产物长期增长;手动 `gc plan/run` 仍默认不清 `.state`,必须显式 `--include-state-artifacts` 才会列出或执行这些候选。policy timer 仍保护上表对象,并把输出限制在 `.state/gc/last-run.json` 和 `.state/gc/last-run.stderr`。
+
 ## Remote G14 Policy
 
 `gc remote G14 ...` 必须先确认目标是 G14 原生 k3s 节点,且 preflight 中节点名包含 `ubuntu-rog-zephyrus-g14-ga401iv-ga401iv`。G14 默认候选只允许:
diff --git a/scripts/src/gc.ts b/scripts/src/gc.ts
index b5d0f19e..42032961 100644
--- a/scripts/src/gc.ts
+++ b/scripts/src/gc.ts
@@ -19,6 +19,8 @@ type GcItemKind =
   | "vscode-server-delete"
   | "vscode-extension-delete"
   | "baidu-staging-file-delete"
+  | "state-artifact-file-delete"
+  | "state-artifact-dir-delete"
   | "vpn-diagnostic-pcap-delete";
 
 interface GcOptions {
@@ -44,6 +46,8 @@ interface GcOptions {
   vscodeKeepExtensionVersions: number;
   baiduStaging: boolean;
   baiduStagingKeepDays: number;
+  stateArtifacts: boolean;
+  stateArtifactKeepDays: number;
   vpnDiagnosticLogs: boolean;
   vpnDiagnosticLogKeepHours: number;
   dbSummary: boolean;
@@ -183,6 +187,8 @@ const DEFAULT_OPTIONS: GcOptions = {
   vscodeKeepExtensionVersions: 1,
   baiduStaging: false,
   baiduStagingKeepDays: 10,
+  stateArtifacts: false,
+  stateArtifactKeepDays: 14,
   vpnDiagnosticLogs: false,
   vpnDiagnosticLogKeepHours: 24,
   dbSummary: true,
@@ -300,6 +306,23 @@ const TOOL_CACHE_ALLOWLIST = [
 const VSCODE_SERVER_ROOT = "/root/.vscode-server/cli/servers";
 const VSCODE_EXTENSION_ROOT = "/root/.vscode-server/extensions";
 const BAIDU_STAGING_RELATIVE_ROOT = [".state", "baidu-netdisk", "staging"];
+const STATE_ARTIFACT_FILE_ROOTS = [
+  { id: "e2e", relativeRoot: [".state", "e2e"] },
+  { id: "validation", relativeRoot: [".state", "validation"] },
+  { id: "jobs", relativeRoot: [".state", "jobs"] },
+  { id: "codex-queue-output-archive", relativeRoot: [".state", "codex-queue", "output-archive"] },
+] as const;
+const STATE_ARTIFACT_DIR_ROOTS = [
+  { id: "deploy-exports", relativeRoot: [".state", "deploy", "exports"] },
+  { id: "deploy-resolve", relativeRoot: [".state", "deploy", "resolve"] },
+] as const;
+const PROTECTED_STATE_PATHS = [
+  { kind: "state-recovery", relativePath: [".state", "recovery"], reason: ".state/recovery is recovery state and is never selected by state artifact retention." },
+  { kind: "state-codex-home", relativePath: [".state", "codex-queue", "codex-home"], reason: "Codex home contains sessions/auth/runtime profile state and is never selected by state artifact retention." },
+  { kind: "state-deploy-work", relativePath: [".state", "deploy", "work"], reason: "Deploy work state can contain active rollout context and is never selected by state artifact retention." },
+  { kind: "baidu-netdisk-state", relativePath: [".state", "baidu-netdisk"], reason: "Baidu Netdisk state and backups are protected; only the separate Baidu staging tarball allowlist can select old PGDATA tarballs." },
+  { kind: "runtime-snapshot", relativePath: [".state", "snapshots"], reason: "Runtime snapshots are protected unless a dedicated retention policy classifies them." },
+] as const;
 const VPN_DIAGNOSTIC_LOG_ROOT = "/root/vpn-server/logs";
 const VPN_DIAGNOSTIC_RING_PCAP_PATTERN = /^hy2-(?:udp|monitor)-ring-\d{14}\.pcap$/u;
 const DEFAULT_PATH_SIZE_TIMEOUT_MS = 5_000;
@@ -453,9 +476,13 @@ export function gcPlan(config: UniDeskConfig, options: GcOptions = DEFAULT_OPTIO
   if (options.baiduStaging) {
     candidates.push(...collectBaiduStagingCandidates(options, observedAt));
   }
+  if (options.stateArtifacts) {
+    candidates.push(...collectStateArtifactCandidates(options, observedAt));
+  }
   if (options.vpnDiagnosticLogs) {
     candidates.push(...collectVpnDiagnosticPcapCandidates(options, observedAt));
   }
+  protectedItems.push(...collectProtectedStateArtifacts(options));
   protectedItems.push(...collectProtectedVpnDiagnosticLogs(options));
   protectedItems.push(...collectProtectedStorage(config, options));
 
@@ -483,16 +510,22 @@ export function gcPlan(config: UniDeskConfig, options: GcOptions = DEFAULT_OPTIO
     neverTouches: [
       "Docker volumes",
       "PostgreSQL PGDATA",
+      ".state/recovery",
+      ".state/codex-queue/codex-home",
+      ".state/deploy/work",
+      ".state/baidu-netdisk",
       "Baidu Netdisk staging root by default",
       "D601 registry storage",
       "Docker images used by containers",
       "Codex sessions and auth state",
+      "active worktree/runtime image/snapshot state",
     ],
     notes: [
       "gc run only executes listed one-time cleanup actions after --confirm.",
       options.full ? "Full candidate output requested." : `Default output is capped to ${options.limit} candidates; use --full or --limit N for broader disclosure.`,
       "Tool caches, stale /tmp direct children, stale VS Code server versions and stale VS Code extension versions are opt-in and require explicit include flags.",
       "Baidu Netdisk staging cleanup is opt-in and only selects old PGDATA backup tarballs under server-data/unidesk-pg-data.",
+      "State artifact retention is opt-in for manual plan/run; --include-state-artifacts selects only stale files under .state/e2e, .state/validation, .state/jobs and .state/codex-queue/output-archive plus stale direct directories under .state/deploy/exports and .state/deploy/resolve.",
       "VPN diagnostic pcap cleanup is opt-in and only selects stale hy2 ring pcap files; active pcap files and evidence JSONL are protected.",
       "Database event retention is diagnostic-only in this command; cleanups for oa_events require a backup and a separate schema/retention change.",
       "Docker image cleanup stays under server cleanup plan; gc does not run docker system prune or docker image prune.",
@@ -609,6 +642,12 @@ function parseGcOptions(args: string[]): GcOptions {
       options.baiduStaging = false;
     } else if (arg === "--baidu-staging-keep-days") {
       options.baiduStagingKeepDays = parsePositiveIntegerOption(arg, args[++index], 3650);
+    } else if (arg === "--include-state-artifacts") {
+      options.stateArtifacts = true;
+    } else if (arg === "--no-state-artifacts") {
+      options.stateArtifacts = false;
+    } else if (arg === "--state-artifact-keep-days") {
+      options.stateArtifactKeepDays = parsePositiveIntegerOption(arg, args[++index], 3650);
     } else if (arg === "--include-vpn-diagnostic-logs") {
       options.vpnDiagnosticLogs = true;
     } else if (arg === "--no-vpn-diagnostic-logs") {
@@ -757,6 +796,8 @@ function publicOptions(options: GcOptions): Record {
     vscodeKeepExtensionVersions: options.vscodeKeepExtensionVersions,
     baiduStaging: options.baiduStaging,
     baiduStagingKeepDays: options.baiduStagingKeepDays,
+    stateArtifacts: options.stateArtifacts,
+    stateArtifactKeepDays: options.stateArtifactKeepDays,
     vpnDiagnosticLogs: options.vpnDiagnosticLogs,
     vpnDiagnosticLogKeepHours: options.vpnDiagnosticLogKeepHours,
     dbSummary: options.dbSummary,
@@ -1101,6 +1142,70 @@ function collectBaiduStagingCandidates(options: GcOptions, observedAt: string):
   return result.sort((left, right) => right.estimatedReclaimBytes - left.estimatedReclaimBytes);
 }
 
+function collectStateArtifactCandidates(options: GcOptions, observedAt: string): GcCandidate[] {
+  return [
+    ...collectStateArtifactFileCandidates(options, observedAt),
+    ...collectStateArtifactDirCandidates(options, observedAt),
+  ].sort((left, right) => right.estimatedReclaimBytes - left.estimatedReclaimBytes);
+}
+
+function collectStateArtifactFileCandidates(options: GcOptions, observedAt: string): GcCandidate[] {
+  const cutoffMs = new Date(observedAt).getTime() - options.stateArtifactKeepDays * 24 * 60 * 60 * 1000;
+  const result: GcCandidate[] = [];
+  for (const rootInfo of STATE_ARTIFACT_FILE_ROOTS) {
+    const root = rootPath(...rootInfo.relativeRoot);
+    if (!isPlainDirectory(root)) continue;
+    for (const file of collectFiles(root)) {
+      if (file.mtimeMs >= cutoffMs || file.sizeBytes <= 0) continue;
+      const relativePath = file.path.slice(resolve(root).length + 1);
+      result.push({
+        id: `state-artifact-file:${rootInfo.id}:${relativePath}`,
+        kind: "state-artifact-file-delete",
+        risk: "medium",
+        description: `Delete stale UniDesk .state artifact file older than ${options.stateArtifactKeepDays} days`,
+        path: file.path,
+        sizeBytes: file.sizeBytes,
+        estimatedReclaimBytes: file.sizeBytes,
+        action: { op: "unlink", allowlist: "state-artifact-file", root: rootInfo.relativeRoot.join("/"), keepDays: options.stateArtifactKeepDays },
+      });
+    }
+  }
+  return result;
+}
+
+function collectStateArtifactDirCandidates(options: GcOptions, observedAt: string): GcCandidate[] {
+  const cutoffMs = new Date(observedAt).getTime() - options.stateArtifactKeepDays * 24 * 60 * 60 * 1000;
+  const result: GcCandidate[] = [];
+  for (const rootInfo of STATE_ARTIFACT_DIR_ROOTS) {
+    const root = rootPath(...rootInfo.relativeRoot);
+    if (!isPlainDirectory(root)) continue;
+    for (const entry of readdirSync(root, { withFileTypes: true })) {
+      if (!entry.isDirectory()) continue;
+      const path = join(root, entry.name);
+      let stat;
+      try {
+        stat = lstatSync(path);
+      } catch {
+        continue;
+      }
+      if (!stat.isDirectory() || stat.isSymbolicLink() || stat.mtimeMs >= cutoffMs) continue;
+      const sizeBytes = safePathSize(path);
+      if (sizeBytes <= 0) continue;
+      result.push({
+        id: `state-artifact-dir:${rootInfo.id}:${entry.name}`,
+        kind: "state-artifact-dir-delete",
+        risk: "medium",
+        description: `Delete stale UniDesk .state deploy artifact directory older than ${options.stateArtifactKeepDays} days`,
+        path,
+        sizeBytes,
+        estimatedReclaimBytes: sizeBytes,
+        action: { op: "rm-recursive", allowlist: "state-artifact-direct-dir", root: rootInfo.relativeRoot.join("/"), keepDays: options.stateArtifactKeepDays },
+      });
+    }
+  }
+  return result;
+}
+
 function collectVpnDiagnosticPcapCandidates(options: GcOptions, observedAt: string): GcCandidate[] {
   if (!existsSync(VPN_DIAGNOSTIC_LOG_ROOT)) return [];
   const cutoffMs = new Date(observedAt).getTime() - options.vpnDiagnosticLogKeepHours * 60 * 60 * 1000;
@@ -1129,6 +1234,39 @@ function collectVpnDiagnosticPcapCandidates(options: GcOptions, observedAt: stri
   return result.sort((left, right) => right.estimatedReclaimBytes - left.estimatedReclaimBytes);
 }
 
+function collectProtectedStateArtifacts(options: GcOptions): ProtectedGcItem[] {
+  const result: ProtectedGcItem[] = [];
+  for (const rootInfo of [...STATE_ARTIFACT_FILE_ROOTS, ...STATE_ARTIFACT_DIR_ROOTS]) {
+    const ref = rootPath(...rootInfo.relativeRoot);
+    result.push(protectedPathItem(
+      options.stateArtifacts ? "state-artifact-root" : "state-artifact-retention-disabled",
+      ref,
+      options.stateArtifacts
+        ? "State artifact root is protected as a root; only stale allowlisted files or direct deploy artifact directories are candidates."
+        : "State artifact retention is disabled for manual gc by default; rerun with --include-state-artifacts to apply the bounded allowlist.",
+    ));
+  }
+  for (const item of PROTECTED_STATE_PATHS) {
+    result.push(protectedPathItem(item.kind, rootPath(...item.relativePath), item.reason));
+  }
+  result.push(
+    protectedPathItem("codex-sessions", "/root/.codex/sessions", "Codex sessions are protected and are not selected by UniDesk gc."),
+    protectedPathItem("codex-auth", "/root/.codex/auth.json", "Codex auth state is protected and is not selected by UniDesk gc."),
+    protectedPathItem("active-worktree", repoRoot, "The active UniDesk worktree is protected; gc never deletes source worktrees as state artifacts."),
+    protectedPathItem("runtime-image", "docker-images-used-by-containers", "Runtime Docker images are protected; image cleanup stays under server cleanup plan and container image guards."),
+  );
+  return result;
+}
+
+function protectedPathItem(kind: string, ref: string, reason: string): ProtectedGcItem {
+  return {
+    kind,
+    risk: "blocked",
+    ref,
+    reason,
+  };
+}
+
 function collectProtectedVpnDiagnosticLogs(options: GcOptions): ProtectedGcItem[] {
   const result: ProtectedGcItem[] = [];
   if (!existsSync(VPN_DIAGNOSTIC_LOG_ROOT)) return result;
@@ -1348,7 +1486,7 @@ function gcPolicyPlan(options: GcPolicyOptions): unknown {
     policy: {
       safeScope: [
         "systemd journal is capped at 512MiB",
-        "daily timer runs file-log, Docker json logs, 24h BuildKit cache, allowlisted /tmp gc and 24h VPN diagnostic pcap retention",
+        "daily timer runs file-log, Docker json logs, 24h BuildKit cache, allowlisted /tmp gc, 24h VPN diagnostic pcap retention and 14-day UniDesk .state artifact retention",
         "timer does not touch PostgreSQL PGDATA, Docker images, Docker volumes, tool caches, VS Code servers/extensions or Baidu Netdisk staging",
         "timer output is redirected under .state/gc and capped by gc --result-limit",
       ],
@@ -1400,7 +1538,7 @@ function gcPolicyInstall(options: GcPolicyOptions): unknown {
 function gcPolicyFiles(): Record {
   const gcStateDir = rootPath(".state", "gc");
   const bunPath = bunExecutablePath();
-  const gcScript = `cd ${shellQuote(repoRoot)} && mkdir -p ${shellQuote(gcStateDir)} && ${shellQuote(bunPath)} scripts/cli.ts gc run --confirm --no-db-summary --no-journal --build-cache-until 24h --include-vpn-diagnostic-logs --vpn-diagnostic-log-keep-hours 24 --limit 5000 --result-limit 25 > ${shellQuote(join(gcStateDir, "last-run.json"))} 2> ${shellQuote(join(gcStateDir, "last-run.stderr"))}`;
+  const gcScript = `cd ${shellQuote(repoRoot)} && mkdir -p ${shellQuote(gcStateDir)} && ${shellQuote(bunPath)} scripts/cli.ts gc run --confirm --no-db-summary --no-journal --build-cache-until 24h --include-vpn-diagnostic-logs --vpn-diagnostic-log-keep-hours 24 --include-state-artifacts --state-artifact-keep-days 14 --limit 5000 --result-limit 25 > ${shellQuote(join(gcStateDir, "last-run.json"))} 2> ${shellQuote(join(gcStateDir, "last-run.stderr"))}`;
   return {
     journald: {
       path: "/etc/systemd/journald.conf.d/unidesk-gc.conf",
@@ -1546,6 +1684,18 @@ function executeCandidate(candidate: GcCandidate, options: GcOptions): { reclaim
     unlinkSync(candidate.path);
     return { reclaimedBytes: before };
   }
+  if (candidate.kind === "state-artifact-file-delete" && candidate.path !== undefined) {
+    assertStateArtifactFileCandidatePath(candidate.path, options);
+    const before = safeFileSize(candidate.path);
+    unlinkSync(candidate.path);
+    return { reclaimedBytes: before };
+  }
+  if (candidate.kind === "state-artifact-dir-delete" && candidate.path !== undefined) {
+    assertStateArtifactDirCandidatePath(candidate.path, options);
+    const before = safePathSize(candidate.path);
+    rmSync(candidate.path, { recursive: true, force: true });
+    return { reclaimedBytes: before };
+  }
   if (candidate.kind === "vpn-diagnostic-pcap-delete" && candidate.path !== undefined) {
     assertVpnDiagnosticPcapCandidatePath(candidate.path);
     assertPathNotOpen(candidate.path);
@@ -1651,6 +1801,55 @@ function assertBaiduStagingCandidatePath(path: string): void {
   }
 }
 
+function assertStateArtifactFileCandidatePath(path: string, options: GcOptions): void {
+  if (!options.stateArtifacts) throw new Error("refusing to remove state artifact without --include-state-artifacts");
+  const resolved = resolve(path);
+  const root = matchingStateArtifactFileRoot(resolved);
+  if (root === null) throw new Error(`refusing to remove state artifact file outside allowlist: ${path}`);
+  const relativePath = resolved.slice(root.length + 1);
+  if (relativePath.length === 0) throw new Error(`refusing to remove state artifact root as file: ${path}`);
+  const stat = lstatSync(resolved);
+  if (!stat.isFile() || stat.isSymbolicLink()) throw new Error(`refusing to remove non-regular state artifact file: ${path}`);
+  assertStateArtifactAge(stat.mtimeMs, options.stateArtifactKeepDays, path);
+}
+
+function assertStateArtifactDirCandidatePath(path: string, options: GcOptions): void {
+  if (!options.stateArtifacts) throw new Error("refusing to remove state artifact directory without --include-state-artifacts");
+  const resolved = resolve(path);
+  const root = matchingStateArtifactDirRoot(resolved);
+  if (root === null) throw new Error(`refusing to remove state artifact directory outside allowlist: ${path}`);
+  const relativePath = resolved.slice(root.length + 1);
+  if (relativePath.length === 0 || relativePath.includes("/")) {
+    throw new Error(`refusing to remove nested or root state artifact directory: ${path}`);
+  }
+  const stat = lstatSync(resolved);
+  if (!stat.isDirectory() || stat.isSymbolicLink()) throw new Error(`refusing to remove non-directory state artifact path: ${path}`);
+  assertStateArtifactAge(stat.mtimeMs, options.stateArtifactKeepDays, path);
+}
+
+function matchingStateArtifactFileRoot(resolved: string): string | null {
+  for (const rootInfo of STATE_ARTIFACT_FILE_ROOTS) {
+    const root = resolve(rootPath(...rootInfo.relativeRoot));
+    if (!isPlainDirectory(root)) continue;
+    if (resolved.startsWith(`${root}/`)) return root;
+  }
+  return null;
+}
+
+function matchingStateArtifactDirRoot(resolved: string): string | null {
+  for (const rootInfo of STATE_ARTIFACT_DIR_ROOTS) {
+    const root = resolve(rootPath(...rootInfo.relativeRoot));
+    if (!isPlainDirectory(root)) continue;
+    if (resolved.startsWith(`${root}/`)) return root;
+  }
+  return null;
+}
+
+function assertStateArtifactAge(mtimeMs: number, keepDays: number, path: string): void {
+  const cutoffMs = Date.now() - keepDays * 24 * 60 * 60 * 1000;
+  if (mtimeMs >= cutoffMs) throw new Error(`refusing to remove state artifact newer than ${keepDays} days: ${path}`);
+}
+
 function assertVpnDiagnosticPcapCandidatePath(path: string): void {
   const resolved = resolve(path);
   const root = resolve(VPN_DIAGNOSTIC_LOG_ROOT);
@@ -1752,6 +1951,15 @@ function collectFiles(root: string): Array<{ path: string; sizeBytes: number; mt
   return result;
 }
 
+function isPlainDirectory(path: string): boolean {
+  try {
+    const stat = lstatSync(path);
+    return stat.isDirectory() && !stat.isSymbolicLink();
+  } catch {
+    return false;
+  }
+}
+
 function safePathSize(path: string, timeoutMs = DEFAULT_PATH_SIZE_TIMEOUT_MS): number {
   return pathSizeFromDu(path, timeoutMs) ?? 0;
 }
diff --git a/scripts/src/help.ts b/scripts/src/help.ts
index 1cbbc0e8..ce81e6f3 100644
--- a/scripts/src/help.ts
+++ b/scripts/src/help.ts
@@ -280,6 +280,7 @@ function gcHelp(): unknown {
       "bun scripts/cli.ts gc plan --logs-keep-days 7 --docker-log-max-bytes 50M --journal-target-size 512M",
       "bun scripts/cli.ts gc run --confirm --build-cache-all --include-browser-cache",
       "bun scripts/cli.ts gc run --confirm --include-browser-cache",
+      "bun scripts/cli.ts gc plan --target-use-percent 59 --include-state-artifacts --state-artifact-keep-days 14 --full",
       "bun scripts/cli.ts gc db-trace plan --before-date 2026-05-25",
       "bun scripts/cli.ts gc db-trace run --confirm --before-date 2026-05-25 --vacuum-full",
       "bun scripts/cli.ts gc policy plan",
@@ -290,11 +291,11 @@ function gcHelp(): unknown {
       "bun scripts/cli.ts gc remote G14 status --job-id ",
       "bun scripts/cli.ts gc plan --full",
     ],
-    description: "Plan or execute bounded one-time disk relief for file logs, Docker json logs, systemd journal, Docker BuildKit cache, allowlisted /tmp artifacts, scoped remote core dumps and explicitly scoped database trace telemetry retention.",
+    description: "Plan or execute bounded one-time disk relief for file logs, Docker json logs, systemd journal, Docker BuildKit cache, allowlisted /tmp artifacts, opt-in UniDesk .state artifact retention, scoped remote core dumps and explicitly scoped database trace telemetry retention.",
     safety: {
       default: "plan is read-only and mutation=false",
       runGuard: "run requires --confirm",
-      protected: ["PostgreSQL PGDATA", "Docker volumes", "Docker images", "Baidu Netdisk staging/backups", "D601 registry storage"],
+      protected: ["PostgreSQL PGDATA", "Docker volumes", "Docker images", ".state/recovery", ".state/codex-queue/codex-home", ".state/deploy/work", ".state/baidu-netdisk", "Codex sessions/auth", "active worktree/runtime image/snapshot", "D601 registry storage"],
       database: "default gc run is database diagnostic-only; gc db-trace is the explicit trace telemetry retention path and requires --confirm plus --vacuum-full",
     },
     options: {
@@ -312,15 +313,17 @@ function gcHelp(): unknown {
       "--registry-gc-only": "remote G14 only: run official registry garbage-collect without deleting additional tags; intended for interrupted registry retention recovery",
       "--registry-keep-per-repo N": "remote registry only: keep at least N newest tags per service repo; default 20, minimum 1",
       "--registry-min-age-hours N": "remote registry only: keep all tags newer than N hours; default 48, minimum 0",
-      "--target-use-percent N": "remote only: evaluate whether planned candidates can reduce root filesystem use to N%; reports required reclaim, projected use, shortfall and safe-stop decision",
+      "--target-use-percent N": "evaluate whether planned candidates can reduce root filesystem use to N%; reports required reclaim, projected use, shortfall and safe-stop decision",
       "--job-id ID": "remote status only: inspect a long-running remote gc job",
       "--limit N": "number of candidates returned and executed by run when --full is not set; default 50",
       "--result-limit N": "number of per-candidate run results returned when --full is not set; default 50",
       "--full|--raw": "return and run against all candidates rather than the default bounded page",
       "--include-browser-cache": "also remove repo-local .state/playwright-browsers cache",
+      "--include-state-artifacts": "manual local gc only: opt in to stale UniDesk .state artifact retention for allowlisted diagnostic files and deploy artifact direct directories",
+      "--state-artifact-keep-days N": "keep recent UniDesk .state artifacts for N days; default 14; must be a positive integer",
       "db-trace --before-date YYYY-MM-DD": "plan or delete default trace telemetry event types before the date",
       "db-trace run --vacuum-full": "rewrite public.oa_events after deletion so df can reclaim disk; requires maintenance window",
-      "policy plan|install": "render or install journald caps and a daily file-log plus allowlisted /tmp low-risk gc systemd timer",
+      "policy plan|install": "render or install journald caps and a daily file-log, allowlisted /tmp, VPN pcap and 14-day UniDesk .state artifact low-risk gc systemd timer",
       "remote plan|run": "run bounded GC through UniDesk SSH passthrough on a provider host; G14 protects HWLAB k3s/runtime/PVC/workspace paths, and HWLAB registry retention is explicit opt-in with workload-ref, digest-closure, recent-tag and per-repo tag protection",
       "--no-file-logs|--no-docker-logs|--no-journal|--no-build-cache|--no-tmp|--no-db-summary": "disable one collector",
     },