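diff --git a/docs/reference/cli.md b/docs/reference/cli.md
index 0059690a..a77034db 100644
--- a/docs/reference/cli.md
+++ b/docs/reference/cli.md
@@ -8,7 +8,7 @@ The CLI can evolve quickly from `master`, but must stay compatible with the `deploy.json`-pinned CI

 - `help` prints the command index and is suitable as an interactive entry point.
 - Every CLI namespace must support `help`, `--help`, or `-h` and return JSON; printing help must not access runtime services, start an interactive session, or run long-running tasks.
-- `--main-server-ip ` by default calls the main server's same-origin API proxy through the public frontend login session and does not require the compute node to hold the main server SSH key; the legacy SSH transport is used only when `--main-server-key` or `--main-server-transport ssh` is given explicitly.
+- `--main-server-ip ` by default calls the main server's same-origin API proxy through the public frontend login session and does not require the compute node to hold the main server SSH key; the legacy SSH transport is used only when `--main-server-key` or `--main-server-transport ssh` is given explicitly. Under the remote frontend transport, `ssh ...` must reuse the same structured route parser and support locator paths such as `D601`, `G14`, host workspace, `D601:k3s`, and `D601:k3s::`; it does not hand a provider token to the calling container, nor does it require the calling container to resolve backend-core internal DNS.
 - `config show` reads and validates the root `config.json`; it does not silently fill in configuration from environment variables, defaults, or hidden files.
 - `check` by default runs only lightweight config validation, a Bun version check, and Bun Transpiler syntax parsing (covering the CLI entry, the main `scripts/` modules, and core component entries, without type inference); key-file existence, `scripts/` TypeScript type checking, `src/components/` TypeScript type checking, Docker Compose config, log rotation policy scanning, and D601 recovery guardrails are disabled by default and are enabled individually via `--files`, `--scripts-typecheck`, `--components`, `--compose`, `--logs`, and `--recovery-guardrails`, or all at once via `--full`. `check recovery-guardrails` is a low-noise direct entry to the same diagnostics and reports malformed `/proc/mounts`, kubelet validation risk, stale CRI sandbox count, Code Queue worktree/symlink, Code Queue/MDTODO hostPath, and `ContainerCreating` classification; it must not restart k3s, delete CRI sandboxes, modify hostPath, deploy/rollout, or prune/reset. `--rust` is allowed only in D601 CI/dev execution together with `UNIDESK_D601_RUST_CHECK=1`; for the long-term rules see `docs/reference/dev-environment.md` and `docs/reference/devops-hygiene.md`.
 - `server start` creates an asynchronous job that runs the Docker build and startup in the background; the command itself only returns the job id, log path, and start command.
@@ -19,11 +19,12 @@ The CLI can evolve quickly from `master`, but must stay compatible with the `deploy.json`-pinned CI
 - `server cleanup plan [--min-age-hours N] [--limit N]` only generates a dry-run plan for cleaning up main server Docker images and performs no deletion; the default is `--min-age-hours 24`, so that freshly published or freshly verified images are not listed as stale. The output must include `dryRun=true`, `mutation=false`, `policy.deletionExecuted=false`, active containers/images, protected images, candidate stale images, estimated reclaimable space, risk level, `commandsToReview`, and a manual approval checklist. The plan must use a conservative allowlist: keep image IDs used by running containers, keep image IDs referenced by stopped containers until a human has reviewed those containers, and keep the current commit-pinned artifacts in `deploy.json`/`CI.json`, Compose stable images, upstream digest pins, and the provider-gateway runner image; `protectedStorage` must explicitly list the PostgreSQL named volume, Baidu Netdisk `.state`, D601 registry storage, and the Docker volumes/host data policy. This entry must not generate or execute `docker system prune`, `docker image prune`, `docker builder prune`, `docker volume rm`, `docker compose down -v`, database cleanup, or host data `rm` commands; if real deletion is added in the future, it must use a separate explicit approval parameter and the dry-run output must be reviewed first.
 - `server rebuild ` creates an asynchronous job that first builds the target service image, then, under the serialization lock `.state/locks/server-compose.lock`, replaces the target service with `--no-deps --force-recreate` and waits for the container to become `healthy/running`; this command replaces the fallback procedure of deleting containers by hand. `dev-frontend-proxy` only updates the thin proxy for the main server dev entry, and `todo-note`, `code-queue-mgr`, `project-manager`, `baidu-netdisk`, and `oa-event-flow` only rebuild the corresponding backend hosted on the main server; the database named volume is never rebuilt or deleted. The D601 Code Queue execution plane is not managed by `server rebuild`, and Rust backend-core iteration must not use `server rebuild backend-core` to compile on the master server; see `docs/reference/dev-environment.md` for the rules.
 - `provider attach [--master-server URL] [--up] [--force]` generates a provider-gateway attach bundle with two configuration items on a new compute node: `.state/provider-.env` contains only `UNIDESK_MASTER_SERVER` and `PROVIDER_ID` by default, and `provider-.yml` pins the Docker socket, `pid: "host"`, `restart: always`, a read-only `/workspace`, and the SSH maintenance private key mount; `--up` immediately runs the generated `docker compose up -d --build`. `provider triage [--observed-error text] [--observed-scope scope] [--microservice id ...] [--full|--raw]` is a read-only multi-signal health adjudication entry that classifies a single-path `provider is not online`, SSH timeouts, registry failures, and service proxy failures as `runner-local-observation-gap`, `service-degraded`, `provider-degraded`, or `global-blocker`. The default output returns only the verdict, scope, failed/degraded/unknown signals, and a bounded evidence summary; full evidence requires an explicit `--full` or `--raw`. The recommended cross-check commands still include `debug health`, `debug dispatch host.ssh --wait-ms 15000`, `ssh argv true`, `artifact-registry health --provider-id `, `microservice health k3sctl-adapter`, `microservice health code-queue`, and `codex tasks --view supervisor --limit 20`.
-- `ssh [operation args...]` connects to the target node through the backend-core internal WebSocket broker and the provider-gateway Host SSH / WSL SSH maintenance bridge; the basic form of `route` is a provider id such as `D601` or `G14`, and it can be extended to a pure locator path `provider:plane[:namespace:resource[:container]]`, for example `G14:k3s`, `D601:k3s`, or `G14:k3s::`. For non-interactive remote commands, prefer `ssh argv ...`; when a shell script, pipes, variables, or loops are needed, prefer a single-step quoted heredoc transfer, for example `bun scripts/cli.ts ssh G14 script <<'SCRIPT'`, `bun scripts/cli.ts ssh G14:k3s script <<'SCRIPT'`, or `bun scripts/cli.ts ssh G14:k3s:: script <<'SCRIPT'`, sending the script over stdin instead of squeezing it into a string with nested quoting. To modify files inside a pod, prefer `:k3s:: apply-patch`; the CLI temporarily injects an `apply_patch` helper into the pod and hands the patch stdin to it. When an ssh-like command fails with a timeout/kex/255 class error, the CLI appends one line of `UNIDESK_SSH_HINT` JSON to stderr, suggesting a stdin script/argv retry and a provider triage cross-check.
+- `ssh [operation args...]` / `tran [operation args...]` connects to the target node through the backend-core internal WebSocket broker and the provider-gateway Host SSH / WSL SSH maintenance bridge; the basic form of `route` is a provider id such as `D601` or `G14`, and it can be extended to a pure locator path `provider:plane[:namespace:resource[:container]]`, for example `G14:k3s`, `D601:k3s`, or `G14:k3s::`. For non-interactive remote commands, prefer `ssh argv ...`; when a shell script, pipes, variables, or loops are needed, prefer a single-step quoted heredoc transfer, for example `tran G14 script <<'SCRIPT'`, `tran G14:k3s script <<'SCRIPT'`, or `tran G14:k3s:: script <<'SCRIPT'`, sending the script over stdin instead of squeezing it into a string with nested quoting. To modify files inside a pod, prefer `:k3s:: apply-patch`; the CLI temporarily injects an `apply_patch` helper into the pod and hands the patch stdin to it. When an ssh-like command fails with a timeout/kex/255 class error, the CLI appends one line of `UNIDESK_SSH_HINT` JSON to stderr, suggesting a stdin script/argv retry and a provider triage cross-check.
 - `ssh apply-patch [tool args...] < patch.diff` directly invokes the remotely injected `apply_patch` tool and passes the standard `*** Begin Patch` / `*** End Patch` patch stream from local stdin through to the target node.
 - `ssh py [script-args...] < script.py` writes local stdin to a temporary remote `.py` file, runs it with `python3 -u`, and cleans up automatically, so there is no need to hand-write `'python3 -'`, heredocs, or nested quoting; `script-args` are passed safely as argv to the remote script.
 - `ssh skills [--scope all|wsl|windows] [--limit N]` discovers the WSL/Linux skill root directories on the target node; when the provider is WSL, the same call also scans `.agents/skills` and `.codex/skills` under the Windows user directory.
 - `ssh :k3s[:namespace:workload[:container]] ...` is the native k3s structured route entry. The route only locates the control plane or a workload; `kubectl`, `logs`, `exec`, `script`, `apply-patch`, and ordinary container commands are placed after the route as the operation. The CLI always injects `KUBECONFIG=/etc/rancher/k3s/k3s.yaml` and assembles the kubectl, workload exec, and logs arguments into argv, which avoids repeatedly hand-writing nested quoting across Host SSH, bash, kubectl exec, and the container shell. Both D601 and G14 have provider-specific guards that verify the `d601` and the G14 k3s node identity respectively.
+- The Code Queue runner image must provide `/usr/local/bin/tran` on PATH. When `tran` inside a runner detects `CODE_QUEUE_*` or `KUBERNETES_SERVICE_HOST`, it runs `bun /root/unidesk/scripts/cli.ts --main-server-ip ssh ...` by default, where `` is taken, in order of preference, from `UNIDESK_MAIN_SERVER_IP` / `UNIDESK_MAIN_SERVER_HOST` / `CODE_QUEUE_DEV_CONTAINER_MASTER_HOST`. The runner remote frontend HTTP client uses the `curl` backend by default, which reduces the risk of Bun triggering a native crash when reading HTTP response bodies in some runners; an explicit `UNIDESK_REMOTE_HTTP_CLIENT=fetch` can be used for diagnostics. For distributed access across D601/G14 from inside a runner, prefer the stdin-free forms `tran D601 argv ...`, `tran G14 argv ...`, `tran D601:k3s kubectl ...`, and `tran D601:k3s:: argv ...`; stdin helpers such as `script`, `apply-patch`, and `py` must be run with `tran` on the main server/host side or with an explicit `--main-server-transport ssh`, until frontend dispatch supports stdin streaming.
 - `microservice list/status/health/diagnostics/tunnel-self-test/proxy` manages, through the backend-core internal API, the user services mounted in compute node Docker or in the k3s control plane (the underlying command name is still microservice). `health`, `status`, and `diagnostics` by default return a compact summary, the body byte count, and the `--full|--raw` expansion command, and include a bounded body preview only when the body is small or no summary can be extracted, which keeps Code Queue/k3s diagnostics from flooding the output in one go; `tunnel-self-test` and `proxy` go through the real backend-core -> provider-gateway or k3sctl-adapter -> node service path. `microservice health code-queue` uses a dedicated commander-safe summary that must preserve ok/status, service id, running count, queue count, heartbeat freshness/risk, the split-brain/live/degraded explanation, and the raw drill-down command; when the full health JSON is needed, add `--raw` or `--full` explicitly, and the equivalent drill-down path is `microservice proxy code-queue /health --raw --full`. `proxy` supports a controlled JSON request body and by default prints a bounded preview for oversized response bodies; see `docs/reference/microservices.md` for the rules.
 - `decision upload/list/show/health` accesses the D601 k3s Decision Center through the backend-core user service proxy and is used to upload meeting minutes/resolution Markdown, list authoritative records, view details, and run health checks; `decision list` by default returns only summaries and omits the full Markdown body, so add `--include-body` explicitly when investigating large bodies. Formal document fields are returned and queried as first-class fields of the records model: `--doc-no DC-...`, `--doc-type DCSN|GOAL|PLAN|RPRT|ACTN|ISSU|RETR|RQST|RESP|MINS`, `--doc-priority P0|P1|P2|P3`, `--year YYYY`, `--signer`, `--issued-at`, `--effective-scope`, `--supersedes`, `--superseded-by`; `show` and `requirement update` accept either `id` or `docNo`. `decision requirement list/create/upsert/update/show` manages `goal|decision|blocker|debt|experiment` requirement records on the same records model; `docNo` is unique, and when `--doc-no` is not passed but `--doc-type/--doc-priority/--year` are provided, the service assigns the next sequence number. These commands must not connect directly to the D601 Service, NodePort, or provider-gateway business HTTP.
 - `decision diary import ` splits a work log with `# YYYY年M月D日`, `# YYYY-MM-DD`, or `# YYYY/M/D` headings into one Markdown diary entry per day and writes them into Decision Center PostgreSQL under the virtual path `YYYY-MM/YYYY-MM-DD.md`; `decision diary list/history` by default returns only summaries, so add `--include-body` explicitly when the full Markdown is needed; `decision diary show [--source-file path]` displays the body for a single day, where `--source-file` selects precisely when the same day has multiple import sources; `decision diary edit|upsert --body-file [--title text] [--source-file path] [--tag tag]` uses `PUT /api/diary/entries/:idOrDate` to create entries for today or past dates and to edit existing entries.
diff --git a/docs/reference/code-queue-supervision.md b/docs/reference/code-queue-supervision.md
index a44314de..5f31063c 100644
--- a/docs/reference/code-queue-supervision.md
+++ b/docs/reference/code-queue-supervision.md
@@ -407,7 +407,7 @@ A Code Queue task is not complete merely because code was pushed.
 - the proxy paths of the WebUI and the CLI are inconsistent;
 - the deploy job reports failure but the service API is actually healthy;
 - a burst of submits from the commander side saturates the Code Queue manager or a low-memory host, so the queue is overwhelmed before it has acknowledged the tasks;
-- the Code Queue container lacks basic tools or credential paths needed for supervision, such as `gh`, `hub`, or the GitHub token injection path.
+- the Code Queue container lacks basic tools or credential paths needed for supervision, such as `gh`, `hub`, `tran`, or the GitHub token injection path.
 - a single-path transient failure of D601, provider-gateway, registry, k3sctl-adapter, or service proxy is amplified by a worker into a global blocker, for lack of multi-signal health adjudication and retryable error classification.
 - the runner, provider, or target runtime lacks Secret/env injection, DNS, egress, registry auth, GitHub auth, or controlled rollout permission, so business tasks cannot be completed through the regular path.

@@ -415,6 +415,8 @@ A Code Queue task is not complete merely because code was pushed.

 If a defect exists only in the Code Queue execution environment, and the service can be safely hot-fixed in dev without touching prod, apply a minimal temporary live remedy first. Then persist the fix into the relevant Dockerfile, container image, or credential propagation path, and close the issue only after the persisted fix has been verified in dev.

+The distributed access capability of the Code Queue runner must be baked in through `/usr/local/bin/tran` inside the image, rather than relying on ad hoc copied scripts or memorized command prefixes. `tran` inside a runner goes through the public frontend control plane and `/api/dispatch`; it does not require the runner pod to hold a provider token, nor to resolve backend-core cluster-internal DNS. Therefore, when runner-local DNS/Secret is missing while frontend dispatch is still available, the situation should be judged a recoverable runner-local observation gap. Remote frontend HTTP reads and writes in the runner environment use the `curl` backend by default, to reduce the risk of native crashes in Bun HTTP body reads on some runners; this does not bypass the control plane, since the route parser and dispatch payload are still generated by the UniDesk CLI. The currently required acceptance target is the real Code Queue pod on D601: run at least `tran D601 argv ...`, `tran G14 argv ...`, and one read-only `tran :k3s ...` command, to prove that the D601 runner can pass through across provider host and k3s routes. The G14 runner can serve as a later compatibility observation and is not currently a blocking condition of this contract.
+
 If a business task finds that tools, Secret/env, DNS, egress, or credential paths are missing, the commander should split this out into a separate infra task and mark it `runnerDisposition=infra-blocked` or an equivalent infrastructure blocker, rather than burying it in the business task prompt. A business runner should not probe live Secrets on its own, print env/tokens, copy credential commands, widen network egress, or guess at the problem through repeated rollouts; it may only submit redacted evidence, describe the missing capability, and wait for the commander or the infra lane to handle it. Business tasks should continue to advance when a bridge exists.

 Artifact publish preflight also falls within the read-only classification of infrastructure problems: when `artifact-registry status|health` and `ci publish-user-service --dry-run` return `runnerDisposition=infra-blocked`, this usually indicates that the backend-core/database/provider channel is missing, not a business error in the user service itself. In that case, restore the control channel first and then decide whether to retry; do not treat a bare `No such container` as a business failure that can be regressed directly.
diff --git a/docs/reference/microservices.md b/docs/reference/microservices.md
index 15079531..921904a3 100644
--- a/docs/reference/microservices.md
+++ b/docs/reference/microservices.md
@@ -42,6 +42,8 @@ UniDesk user services are user-facing services mounted onto the UniDesk core services

 Business repositories are maintained by the business systems themselves, including source code, Dockerfile, docker-compose, configuration templates, and business tests. UniDesk only references the business repository URL, commit id, Dockerfile/docker-compose path, and running container name; the full business code must not be copied into `src/components/microservices/`, which would create dual maintenance. `src/components/microservices/` may hold only generic examples or UniDesk-owned examples and does not serve as a mirror of business repositories.

+The Code Queue runner is also a distributed development execution plane. The runner image must ship `tran` built in, so that while executing tasks the runner can reach D601, G14, host workspaces, the k3s control plane, and target pods through the public frontend control plane. Inside a runner, prefer stdin-free structured commands such as `tran argv ...`, `tran :k3s kubectl ...`, and `tran :k3s:: argv ...`; the `script`, `apply-patch`, and `py` operations, which need stdin, are still run by `tran` on the main server/host side or by an explicit SSH transport. This boundary avoids making provider tokens, backend-core internal DNS, or nested quoting of long commands a precondition for runner availability.
+
 ## Main Server User Services

 The main server hosts only user services that are clearly necessary for the unified entry, state migration, or control-plane automation. Such services still follow the rules of exposing no public ports and presenting the frontend uniformly as React components; persistent business state must be written to the main PostgreSQL; `.state/` may store only log archives, caches, or rebuildable artifacts, and must not be the authoritative source of state such as tasks, queues, unread counts, or notification outboxes.
diff --git a/scripts/src/remote.ts b/scripts/src/remote.ts
index 1580c7ae..6e5f66a9 100644
--- a/scripts/src/remote.ts
+++ b/scripts/src/remote.ts
@@ -1,9 +1,12 @@
 import { spawn } from "node:child_process";
+import { mkdtemp, readFile, rm, writeFile } from "node:fs/promises";
+import { tmpdir } from "node:os";
+import path from "node:path";
 import { type UniDeskConfig } from "./config";
 import { type DebugDispatchCommand, isDebugDispatchCommand } from "./debug";
 import { summarizeMicroserviceHealthResponse, summarizeMicroserviceObservation, summarizeMicroserviceProxyResponse } from "./microservices";
 import { parseNetworkPerfOptions, runNetworkPerf } from "./network-perf";
-import { formatSshFailureHint, isSshSkillDiscoveryArgs, parseSshArgs, sshFailureHint } from "./ssh";
+import { formatSshFailureHint, isSshSkillDiscoveryArgs, parseSshInvocation, sshFailureHint } from "./ssh";
 import { codexJudgeQueryAsync, codexOutputQueryAsync, codexPrPreflightQueryAsync, codexQueuesQueryAsync, codexTaskQueryAsync, codexTasksQueryAsync, codexUnreadTriageAsync } from "./code-queue";
 import { runDecisionCenterCommandAsync } from "./decision-center";
 import {
@@ -44,6 +47,7 @@ interface FetchJsonResult {
   status?: number;
   body?: unknown;
   error?: string;
+  responseHeaders?: Record;
   responseTruncated?: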
boolean;
   responseBytesRead?: number;
   responseContentLength?: string | null;
@@ -99,6 +103,13 @@ function normalizeRemoteHostHint(raw: string | undefined): string | null {
   return value.replace(/\/+$/u, "");
 }

+function remoteHttpClientMode(env: NodeJS.ProcessEnv = process.env): "curl" | "fetch" {
+  const explicit = env.UNIDESK_REMOTE_HTTP_CLIENT?.trim().toLowerCase();
+  if (explicit === "fetch") return "fetch";
+  if (explicit === "curl") return "curl";
+  return isCodeQueueRunnerEnv(env) ? "curl" : "fetch";
+}
+
 function isCodeQueueRunnerEnv(env: NodeJS.ProcessEnv): boolean {
   return Boolean(env.CODE_QUEUE_SERVICE_ROLE || env.CODE_QUEUE_INSTANCE_ID || env.CODE_QUEUE_DEV_CONTAINER_MASTER_HOST || env.KUBERNETES_SERVICE_HOST);
 }
@@ -293,6 +304,7 @@ function frontendBaseUrl(host: string, config: UniDeskConfig): string {
 }

 async function readJson(url: string, init?: RequestInit, timeoutMs = 8000, maxResponseBytes = 5_000_000): Promise {
+  if (remoteHttpClientMode() === "curl") return readJsonWithCurl(url, init, timeoutMs, maxResponseBytes);
   const controller = new AbortController();
   const timer = setTimeout(() => controller.abort(), timeoutMs);
   try {
@@ -339,7 +351,7 @@ async function readJson(url: string, init?: RequestInit, timeoutMs = 8000, maxRe
     if (responseTruncated) {
       body = { _unideskResponseTruncated: true, maxResponseBytes, bytesRead: bytes, contentLength: res.headers.get("content-length"), textPreview: text };
     }
-    return { ok: res.ok, status: res.status, body, responseTruncated, responseBytesRead: bytes, responseContentLength: res.headers.get("content-length") };
+    return { ok: res.ok, status: res.status, body, responseHeaders: responseHeadersRecord(res.headers), responseTruncated, responseBytesRead: bytes, responseContentLength: res.headers.get("content-length") };
   } catch (error) {
     return { ok: false, error: error instanceof Error ? error.message : String(error) };
   } finally {
@@ -347,30 +359,145 @@ async function readJson(url: string, init?: RequestInit, timeoutMs = 8000, maxRe
   }
 }

+function responseHeadersRecord(headers: Headers): Record {
+  const record: Record = {};
+  headers.forEach((value, key) => {
+    record[key.toLowerCase()] = value;
+  });
+  return record;
+}
+
+function requestHeaders(init?: RequestInit): Array<[string, string]> {
+  const headers = new Headers(init?.headers);
+  const output: Array<[string, string]> = [];
+  headers.forEach((value, key) => output.push([key, value]));
+  return output;
+}
+
+async function runCurl(args: string[], timeoutMs: number): Promise<{ status: number | null; stdout: string; stderr: string; timedOut: boolean; error?: string }> {
+  const child = spawn("curl", args, { stdio: ["ignore", "pipe", "pipe"] });
+  const stdoutChunks: Buffer[] = [];
+  const stderrChunks: Buffer[] = [];
+  let timedOut = false;
+  const timer = setTimeout(() => {
+    timedOut = true;
+    child.kill("SIGTERM");
+  }, timeoutMs + 1000);
+  child.stdout?.on("data", (chunk) => stdoutChunks.push(Buffer.from(chunk)));
+  child.stderr?.on("data", (chunk) => stderrChunks.push(Buffer.from(chunk)));
+  return await new Promise((resolve) => {
+    child.on("error", (error) => {
+      clearTimeout(timer);
+      resolve({ status: null, stdout: "", stderr: "", timedOut, error: error.message });
+    });
+    child.on("close", (status) => {
+      clearTimeout(timer);
+      resolve({
+        status,
+        stdout: Buffer.concat(stdoutChunks).toString("utf8"),
+        stderr: Buffer.concat(stderrChunks).toString("utf8"),
+        timedOut,
+      });
+    });
+  });
+}
+
+function parseCurlResponseHeaders(raw: string): Record {
+  const blocks = raw.split(/\r?\n\r?\n/u).map((block) => block.trim()).filter(Boolean);
+  const latest = blocks.at(-1) ?? "";
+  const headers: Record = {};
+  for (const line of latest.split(/\r?\n/u).slice(1)) {
+    const splitAt = line.indexOf(":");
+    if (splitAt <= 0) continue;
+    headers[line.slice(0, splitAt).trim().toLowerCase()] = line.slice(splitAt + 1).trim();
+  }
+  return headers;
+}
+
+function decodeBoundedBody(buffer: Buffer, maxResponseBytes: number): { text: string; truncated: boolean; bytesRead: number } {
+  if (buffer.byteLength <= maxResponseBytes) return { text: buffer.toString("utf8"), truncated: false, bytesRead: buffer.byteLength };
+  return { text: buffer.subarray(0, maxResponseBytes).toString("utf8"), truncated: true, bytesRead: maxResponseBytes };
+}
+
+async function readJsonWithCurl(url: string, init?: RequestInit, timeoutMs = 8000, maxResponseBytes = 5_000_000): Promise {
+  const dir = await mkdtemp(path.join(tmpdir(), "unidesk-remote-http-"));
+  const headersFile = path.join(dir, "headers.txt");
+  const bodyFile = path.join(dir, "body.bin");
+  const requestBodyFile = path.join(dir, "request-body.bin");
+  try {
+    const method = init?.method ?? (init?.body === undefined ? "GET" : "POST");
+    const args = [
+      "-sS",
+      "--max-time", String(Math.max(1, Math.ceil(timeoutMs / 1000))),
+      "-D", headersFile,
+      "-o", bodyFile,
+      "-w", "%{http_code}",
+      "-X", method,
+    ];
+    for (const [key, value] of requestHeaders(init)) args.push("-H", `${key}: ${value}`);
+    if (init?.body !== undefined) {
+      await writeFile(requestBodyFile, typeof init.body === "string" ? init.body : String(init.body));
+      args.push("--data-binary", `@${requestBodyFile}`);
+    }
+    args.push(url);
+    const curl = await runCurl(args, timeoutMs);
+    if (curl.status !== 0) {
+      const curlError = curl.error ?? curl.stderr.trim();
+      return {
+        ok: false,
+        status: curl.stdout.trim().match(/^\d{3}$/u) ? Number(curl.stdout.trim()) : undefined,
+        error: curl.timedOut ? `curl timed out after ${timeoutMs}ms` : (curlError.length > 0 ? curlError : `curl exited with ${curl.status}`),
+      };
+    }
+    const status = Number(curl.stdout.trim());
+    const headersText = await readFile(headersFile, "utf8").catch(() => "");
+    const responseHeaders = parseCurlResponseHeaders(headersText);
+    const bodyBuffer = await readFile(bodyFile).catch(() => Buffer.alloc(0));
+    const decoded = decodeBoundedBody(bodyBuffer, maxResponseBytes);
+    let body: unknown = null;
+    try {
+      body = decoded.text.length > 0 && !decoded.truncated ? JSON.parse(decoded.text) : null;
+    } catch {
+      body = { text: decoded.text };
+    }
+    if (decoded.truncated) {
+      body = {
+        _unideskResponseTruncated: true,
+        maxResponseBytes,
+        bytesRead: decoded.bytesRead,
+        contentLength: responseHeaders["content-length"] ?? null,
+        textPreview: decoded.text,
+      };
+    }
+    return {
+      ok: status >= 200 && status < 300,
+      status,
+      body,
+      responseHeaders,
+      responseTruncated: decoded.truncated,
+      responseBytesRead: decoded.bytesRead,
+      responseContentLength: responseHeaders["content-length"] ?? null,
+    };
+  } catch (error) {
+    return { ok: false, error: error instanceof Error ? error.message : String(error) };
+  } finally {
+    await rm(dir, { recursive: true, force: true }).catch(() => undefined);
+  }
+}
+
 async function loginFrontend(host: string, config: UniDeskConfig): Promise {
   const baseUrl = frontendBaseUrl(host, config);
-  const controller = new AbortController();
-  const timer = setTimeout(() => controller.abort(), 8_000);
-  let res: Response;
-  try {
-    res = await fetch(`${baseUrl}/login`, {
-      method: "POST",
-      headers: { "content-type": "application/json" },
-      body: JSON.stringify({ username: config.auth.username, password: config.auth.password }),
-      signal: controller.signal,
-    });
-  } catch (error) {
-    throw new RemoteCliFailure("remote-proxy-missing", `frontend login request failed via ${baseUrl}: ${error instanceof Error ? error.message : String(error)}`, { baseUrl });
-  } finally {
-    clearTimeout(timer);
-  }
-  const body = await res.text();
+  const res = await readJson(`${baseUrl}/login`, {
+    method: "POST",
+    headers: { "content-type": "application/json" },
+    body: JSON.stringify({ username: config.auth.username, password: config.auth.password }),
+  }, 8_000, 120_000);
   if (!res.ok) {
     const failureClassification: RemoteFailureClassification = res.status === 401 || res.status === 403 ? "auth-missing" : "remote-proxy-missing";
-    throw new RemoteCliFailure(failureClassification, `frontend login failed via ${baseUrl}: status=${res.status} body=${body.slice(0, 300)}`, { baseUrl, status: res.status });
+    throw new RemoteCliFailure(failureClassification, `frontend login failed via ${baseUrl}: status=${res.status ?? "unknown"} body=${JSON.stringify(res.body ?? res.error).slice(0, 300)}`, { baseUrl, status: res.status ?? null });
   }
-  const cookie = res.headers.get("set-cookie")?.split(";")[0] ?? "";
-  if (cookie.length === 0) throw new RemoteCliFailure("auth-missing", `frontend login via ${baseUrl} did not return a session cookie`, { baseUrl, status: res.status });
+  const cookie = res.responseHeaders?.["set-cookie"]?.split(";")[0] ?? "";
+  if (cookie.length === 0) throw new RemoteCliFailure("auth-missing", `frontend login via ${baseUrl} did not return a session cookie`, { baseUrl, status: res.status ?? null });
   return { baseUrl, cookie };
 }

@@ -824,11 +951,24 @@ async function remoteNetworkPerf(options: RemoteCliOptions, config: UniDeskConfi
   };
 }

-async function runRemoteSshOverFrontend(session: FrontendSession, providerId: string | undefined, args: string[]): Promise {
-  if (!providerId) throw new Error("remote ssh requires provider id, for example: bun scripts/cli.ts --main-server-ip 74.48.78.17 ssh D601 hostname");
-  const parsed = parseSshArgs(args);
+export function remoteSshFrontendPlanForTest(target: string, args: string[]): Record {
+  const invocation = parseSshInvocation(target, args);
+  return {
+    providerId: invocation.providerId,
+    route: invocation.route,
+    remoteCommand: invocation.parsed.remoteCommand,
+    requiresStdin: invocation.parsed.requiresStdin,
+    invocationKind: invocation.parsed.invocationKind,
+    payloadCwd: invocation.route.plane === "host" ? invocation.route.workspace : null,
+  };
+}
+
+async function runRemoteSshOverFrontend(session: FrontendSession, target: string | undefined, args: string[]): Promise {
+  if (!target) throw new Error("remote ssh requires a route, for example: bun scripts/cli.ts --main-server-ip 74.48.78.17 ssh D601 hostname");
+  const invocation = parseSshInvocation(target, args);
+  const parsed = invocation.parsed;
   if (parsed.requiresStdin) {
-    process.stderr.write("remote frontend transport does not stream stdin for ssh helper subcommands such as apply-patch or py; run the command on the main server or use --main-server-transport ssh\n");
+    process.stderr.write("remote frontend transport does not stream stdin for ssh helper subcommands such as script, apply-patch or py; run the command on the main server, use --main-server-transport ssh, or use an argv/pod-route operation that does not need stdin\n");
     return 255;
   }
   if (parsed.remoteCommand === null) {
@@ -843,9 +983,15 @@ async function runRemoteSshOverFrontend(session: FrontendSession, providerId: st
   const dispatch = await frontendJson(session, "/api/dispatch", {
     method: "POST",
     body: JSON.stringify({
-      providerId,
+      providerId: invocation.providerId,
       command: "host.ssh",
-      payload: { source: "cli-remote-ssh", mode: "exec", command: remoteCommand, timeoutMs: isSshSkillDiscoveryArgs(args) ? 30000 : 15000 },
+      payload: {
+        source: "cli-remote-ssh",
+        mode: "exec",
+        command: remoteCommand,
+        ...(invocation.route.plane === "host" && invocation.route.workspace !== null ? { cwd: invocation.route.workspace } : {}),
+        timeoutMs: isSshSkillDiscoveryArgs(args) ? 30000 : 15000,
+      },
     }),
   });
   const taskId = (dispatch as { body?: { taskId?: string } }).body?.taskId ?? "";
@@ -863,7 +1009,7 @@ async function runRemoteSshOverFrontend(session: FrontendSession, providerId: st
   if (task?.status !== "succeeded") {
     if (stdout.length === 0 && stderr.length === 0) process.stderr.write(`${JSON.stringify({ taskId, task }, null, 2)}\n`);
     const exitCode = typeof result.exitCode === "number" ? result.exitCode : 255;
-    const hint = sshFailureHint(providerId, parsed, exitCode, stderr.length > 0 ? stderr : String(task?.message ?? ""));
+    const hint = sshFailureHint(invocation.providerId, parsed, exitCode, stderr.length > 0 ? stderr : String(task?.message ?? ""));
     if (hint !== null) process.stderr.write(formatSshFailureHint(hint));
     return exitCode;
   }
diff --git a/scripts/src/ssh.ts b/scripts/src/ssh.ts
index db51bd11..bd6ea847 100644
--- a/scripts/src/ssh.ts
+++ b/scripts/src/ssh.ts
@@ -1012,6 +1012,7 @@ function parseK3sTargetOperation(route: ParsedSshRoute, args: string[]): ParsedS
   if (operation === "apply-patch" || operation === "patch") return buildK3sApplyPatchCommand([...targetArgs, ...operationArgs]);
   if (operation === "script") return { remoteCommand: buildK3sScriptCommand([...targetArgs, ...operationArgs]), requiresStdin: true, invocationKind: "helper" };
   if (operation === "logs") return { remoteCommand: buildK3sLogsCommand([...targetArgs, ...operationArgs]), requiresStdin: false, invocationKind: "helper" };
+  if (operation === "argv") return { remoteCommand: buildK3sExecCommand([...targetArgs, ...k3sRouteCommandArgs(operationArgs)]), requiresStdin: false, invocationKind: "argv" };
   if (operation === "get" || operation === "describe") {
     return { remoteCommand: buildK3sTargetObjectCommand(operation, route, operationArgs), requiresStdin: false, invocationKind: "helper" };
   }
diff --git a/scripts/ssh-argv-guidance-contract-test.ts b/scripts/ssh-argv-guidance-contract-test.ts
index 9d0b1449..17c15e78 100644
--- a/scripts/ssh-argv-guidance-contract-test.ts
+++ b/scripts/ssh-argv-guidance-contract-test.ts
@@ -4,6 +4,7 @@ import os from "node:os";
 import path from "node:path";
 import { sshHelp } from "./src/help";
 import { providerTriageRecommendedCrossChecks } from "./src/provider-triage";
+import { remoteSshFrontendPlanForTest } from "./src/remote";
 import { formatSshFailureHint, parseSshArgs, parseSshInvocation, remoteApplyPatchSource, sshFailureHint } from "./src/ssh";

 type JsonRecord = Record;
@@ -103,6 +104,10 @@ export function runSshArgvGuidanceContract(): JsonRecord {
   assertCondition(routeTarget.route.namespace === "hwlab-dev" && routeTarget.route.resource === "hwlab-cloud-api", "route target must parse namespace and workload", routeTarget);
   assertCondition(routeTarget.parsed.remoteCommand === "'env' 'KUBECONFIG=/etc/rancher/k3s/k3s.yaml' 'kubectl' 'exec' '-n' 'hwlab-dev' 'deployment/hwlab-cloud-api' '--' 'node' '-e' 'console.log(process.version)'", "D601:k3s:: must default to deployment exec", routeTarget);

+  const routeTargetArgv = parseSshInvocation("D601:k3s:hwlab-dev:hwlab-cloud-api", ["argv", "sh", "-c", "printf ok"]);
+  assertCondition(routeTargetArgv.parsed.invocationKind === "argv", "k3s target argv operation must stay explicit argv", routeTargetArgv);
+  assertCondition(routeTargetArgv.parsed.remoteCommand === "'env' 'KUBECONFIG=/etc/rancher/k3s/k3s.yaml' 'kubectl' 'exec' '-n' 'hwlab-dev' 'deployment/hwlab-cloud-api' '--' 'sh' '-c' 'printf ok'", "D601:k3s:: argv must exec the argv payload instead of treating argv as a pod command", routeTargetArgv);
+
   const routeScript = parseSshInvocation("D601:k3s:hwlab-dev:hwlab-cloud-api", ["script", "--shell", "bash", "--", "arg"]);
   assertCondition(routeScript.parsed.requiresStdin === true, "k3s script operation must stream local stdin", routeScript);
   assertCondition(routeScript.parsed.remoteCommand === "'env' 'KUBECONFIG=/etc/rancher/k3s/k3s.yaml' 'kubectl' 'exec' '-i' '-n' 'hwlab-dev' 'deployment/hwlab-cloud-api' '--' 'bash' '-s' '--' 'arg'", "D601:k3s:: script must map stdin to shell -s", routeScript);
@@ -273,6 +278,28 @@ export function runSshArgvGuidanceContract(): JsonRecord {
   const crossChecks = providerTriageRecommendedCrossChecks("D601");
   assertCondition(crossChecks.includes("bun scripts/cli.ts ssh D601 argv true"), "provider triage cross-checks must keep argv true", crossChecks);

+  const frontendRemoteK3sPlan = remoteSshFrontendPlanForTest("D601:k3s", ["kubectl", "get", "nodes", "-o", "name"]);
+  assertCondition(frontendRemoteK3sPlan.providerId === "D601", "remote frontend ssh must dispatch route target to the provider id", frontendRemoteK3sPlan);
+  assertCondition(frontendRemoteK3sPlan.remoteCommand === "'env' 'KUBECONFIG=/etc/rancher/k3s/k3s.yaml' 'kubectl' 'get' 'nodes' '-o' 'name'", "remote frontend ssh must preserve k3s route command construction", frontendRemoteK3sPlan);
+
+  const frontendRemotePodArgvPlan = remoteSshFrontendPlanForTest("G14:k3s:unidesk:code-queue", ["argv", "sh", "-c", "command -v tran"]);
+  assertCondition(frontendRemotePodArgvPlan.providerId === "G14", "remote frontend pod route must dispatch through G14 provider", frontendRemotePodArgvPlan);
+  assertCondition(frontendRemotePodArgvPlan.remoteCommand === "'env' 'KUBECONFIG=/etc/rancher/k3s/k3s.yaml' 'kubectl' 'exec' '-n' 'unidesk' 'deployment/code-queue' '--' 'sh' '-c' 'command -v tran'", "remote frontend pod argv route must be fully assembled before dispatch", frontendRemotePodArgvPlan);
+
+  const frontendRemoteWorkspacePlan = remoteSshFrontendPlanForTest("D601:/home/ubuntu/workspace/hwlab-dev", ["git", "status", "--short"]);
+  assertCondition(frontendRemoteWorkspacePlan.payloadCwd === "/home/ubuntu/workspace/hwlab-dev", "remote frontend host workspace route must pass cwd to host.ssh payload", frontendRemoteWorkspacePlan);
+  assertCondition(frontendRemoteWorkspacePlan.remoteCommand === "'git' 'status' '--short'", "remote frontend host workspace route must keep command argv-quoted", frontendRemoteWorkspacePlan);
+
+  const tranScript = readFileSync(new URL("./tran", import.meta.url), "utf8");
+  assertCondition(tranScript.includes("CODE_QUEUE_DEV_CONTAINER_MASTER_HOST") && tranScript.includes("--main-server-ip"), "tran wrapper must auto-select frontend transport inside Code Queue runner pods", tranScript);
+  assertCondition(tranScript.includes("UNIDESK_TRAN_LOCAL"), "tran wrapper must keep an explicit local override for diagnostics", tranScript);
+
+  const remoteSource = readFileSync(new URL("./src/remote.ts", import.meta.url), "utf8");
+  assertCondition(remoteSource.includes("UNIDESK_REMOTE_HTTP_CLIENT") && remoteSource.includes("isCodeQueueRunnerEnv(env) ? \"curl\" : \"fetch\""), "remote frontend transport must default to curl HTTP in Code Queue runner environments", remoteSource);
+
+  const codeQueueDockerfile = readFileSync(new URL("../src/components/microservices/code-queue/Dockerfile", import.meta.url), "utf8");
+  assertCondition(codeQueueDockerfile.includes("COPY scripts/tran /usr/local/bin/tran") && codeQueueDockerfile.includes("chmod 755 /usr/local/bin/tran"), "Code Queue runner image must install tran on PATH", codeQueueDockerfile);
+
   return {
     ok: true,
     checks: [
@@ -286,6 +313,9 @@ export function runSshArgvGuidanceContract(): JsonRecord {
       "ssh-like timeout/kex failures emit one structured argv retry hint",
       "help text documents stdin script passthrough and UNIDESK_SSH_HINT",
       "provider triage recommendedCrossChecks keeps ssh D601 argv true",
+      "remote frontend ssh uses the same structured route parser for host, k3s and pod argv routes",
+      "Code Queue runner image installs the tran wrapper and runner tran auto-selects remote frontend transport",
+      "Code Queue runner remote frontend HTTP uses curl by default to avoid Bun response-body native crashes",
     ],
   };
 }
diff --git a/scripts/tran b/scripts/tran
new file mode 100755
index 00000000..223227f4
--- /dev/null
+++ b/scripts/tran
@@ -0,0 +1,20 @@
+#!/bin/sh
+set -eu
+
+repo=${UNIDESK_TRAN_REPO_ROOT:-/root/unidesk}
+if [ ! -f "$repo/scripts/cli.ts" ]; then
+  self_dir=$(CDPATH= cd -- "$(dirname -- "$0")" && pwd)
+  repo=$(CDPATH= cd -- "$self_dir/.." && pwd)
+fi
+
+host=${UNIDESK_MAIN_SERVER_IP:-${UNIDESK_MAIN_SERVER_HOST:-${CODE_QUEUE_DEV_CONTAINER_MASTER_HOST:-}}}
+runner_env=0
+if [ -n "${CODE_QUEUE_SERVICE_ROLE:-}" ] || [ -n "${CODE_QUEUE_INSTANCE_ID:-}" ] || [ -n "${KUBERNETES_SERVICE_HOST:-}" ]; then
+  runner_env=1
+fi
+
+if [ "$runner_env" = 1 ] && [ -n "$host" ] && [ "${UNIDESK_TRAN_LOCAL:-}" != "1" ]; then
+  exec bun "$repo/scripts/cli.ts" --main-server-ip "$host" ssh "$@"
+fi
+
+exec bun "$repo/scripts/cli.ts" ssh "$@"
diff --git a/src/components/microservices/code-queue/Dockerfile b/src/components/microservices/code-queue/Dockerfile
index da687475..9465a531 100644
--- a/src/components/microservices/code-queue/Dockerfile
+++ b/src/components/microservices/code-queue/Dockerfile
@@ -7,6 +7,9 @@ ENV RUSTUP_HOME=/usr/local/rustup
 ENV CARGO_HOME=/usr/local/cargo
 ENV PATH=/usr/local/cargo/bin:${PATH}

+COPY scripts/tran /usr/local/bin/tran
+RUN chmod 755 /usr/local/bin/tran
+
 RUN (command -v docker >/dev/null 2>&1 && docker buildx version >/dev/null 2>&1 && command -v gh >/dev/null 2>&1 && command -v rg >/dev/null 2>&1 && command -v cargo >/dev/null 2>&1 && command -v rustc >/dev/null 2>&1 && command -v rustfmt >/dev/null 2>&1 && command -v xvfb-run >/dev/null 2>&1) \
     || (apt-get update \
     && apt-get install -y --no-install-recommends \