fix(cli): guide ssh argv usage

Codex
2026-05-23 12:32:43 +00:00
parent ca1e2544f0
commit 15d074424c
8 changed files with 171 additions and 23 deletions
+6 -4
@@ -19,7 +19,7 @@ CLI 可以从 `master` 快速演进,但必须兼容 `deploy.json` 固定的 CI
- `server cleanup plan [--min-age-hours N] [--limit N]` 只生成主 server Docker 镜像清理 dry-run 计划,不执行删除;默认 `--min-age-hours 24`,避免把刚发布或刚验证的镜像列为 stale。输出必须包含 `dryRun=true``mutation=false``policy.deletionExecuted=false`、active containers/images、受保护镜像、candidate stale images、估算释放空间、风险等级、`commandsToReview` 和人工审批清单。计划必须保守白名单:保留 running containers 使用的 image ID,保留 stopped containers 引用的 image ID 直到人工先复核容器,保留 `deploy.json`/`CI.json` 当前 commit-pinned artifact、Compose stable image、上游 digest pin 和 provider-gateway runner image`protectedStorage` 必须显式列出 PostgreSQL named volume、Baidu Netdisk `.state`、D601 registry storage 和 Docker volumes/host data policy。该入口禁止生成或执行 `docker system prune``docker image prune``docker builder prune``docker volume rm``docker compose down -v`、数据库清理或 host data `rm` 命令;未来若增加真实删除,必须另设显式审批参数并先复核 dry-run 输出。
- `server rebuild <backend-core|frontend|dev-frontend-proxy|provider-gateway|todo-note|code-queue-mgr|project-manager|baidu-netdisk|oa-event-flow>` 创建异步 job,先构建目标服务镜像,随后在 `.state/locks/server-compose.lock` 串行保护下用 `--no-deps --force-recreate` 替换目标 service 并等待容器 `healthy/running`;该命令用于替代手工删除容器的兜底流程,其中 `dev-frontend-proxy` 只更新主 server dev 入口薄代理,`todo-note``code-queue-mgr``project-manager``baidu-netdisk``oa-event-flow` 只重建主 server 承载的对应后端,不会重建或删除 database 命名卷。D601 Code Queue 执行面不由 `server rebuild` 管理,Rust backend-core 迭代不得用 `server rebuild backend-core` 在 master server 编译,规则见 `docs/reference/dev-environment.md`
- `provider attach <providerId> [--master-server URL] [--up] [--force]` 在新计算节点生成两项配置的 provider-gateway 挂载包:`.state/provider-<ID>.env` 默认只包含 `UNIDESK_MASTER_SERVER``PROVIDER_ID``provider-<ID>.yml` 固定 Docker socket、`pid: "host"``restart: always`、只读 `/workspace` 和 SSH 维护私钥挂载;`--up` 会立即执行生成的 `docker compose up -d --build``provider triage <providerId> [--observed-error text] [--observed-scope scope] [--microservice id ...] [--full|--raw]` 是只读多信号健康裁决入口,会把单路径 `provider is not online`、SSH 超时、registry 失败和 service proxy 失败归类成 `runner-local-observation-gap``service-degraded``provider-degraded``global-blocker`。默认输出只返回裁决、scope、失败/降级/未知信号和有界 evidence 摘要,完整 evidence 必须显式加 `--full``--raw`;推荐交叉验证命令仍包含 `debug health``debug dispatch <providerId> host.ssh --wait-ms 15000``ssh <providerId> argv true``artifact-registry health --provider-id <providerId>``microservice health k3sctl-adapter``microservice health code-queue``codex tasks --view supervisor --limit 20`
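- `ssh <providerId> [ssh-like args...]` connects to the target node through the backend-core intranet WebSocket broker and the provider-gateway Host SSH / WSL SSH maintenance bridge. With no further arguments it opens a remote login shell; with further arguments it runs them like an ssh remote command and returns the remote exit code.
- `ssh <providerId> [ssh-like args...]` connects to the target node through the backend-core intranet WebSocket broker and the provider-gateway Host SSH / WSL SSH maintenance bridge. With no further arguments it opens a remote login shell; with further arguments it runs them like an ssh remote command and returns the remote exit code. For non-interactive remote commands, prefer `ssh <providerId> argv ...`; when shell features are needed, use `bun scripts/cli.ts ssh D601 argv bash -lc '<command>'`. When an ssh-like command fails with a timeout, kex, or exit-255 class of failure, the CLI appends one `UNIDESK_SSH_HINT` JSON line to stderr that suggests the argv retry and a provider triage cross-check.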
- `ssh <providerId> apply-patch [tool args...] < patch.diff` calls the injected remote `apply_patch` tool directly and passes the standard `*** Begin Patch` / `*** End Patch` patch stream from local stdin through to the target node.
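- `ssh <providerId> py [script-args...] < script.py` writes local stdin to a temporary `.py` file on the remote side, runs it with `python3 -u`, and cleans it up automatically, so there is no need to hand-write `'python3 -'`, heredocs, or nested quoting; `script-args` are passed through to the remote script as safely quoted argv.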
- `ssh <providerId> skills [--scope all|wsl|windows] [--limit N]` discovers the WSL/Linux skill root directories on the target node; when the provider is WSL, the same call also scans `.agents/skills` and `.codex/skills` under the Windows user directory.
@@ -110,12 +110,14 @@ GitHub issue/PR 写操作必须优先使用 `bun scripts/cli.ts gh issue|pr ...
`bun scripts/cli.ts ssh --help` and `bun scripts/cli.ts ssh <providerId> --help` are local JSON help commands and must return quickly: they must not parse `--help` as a Provider ID, open an interactive shell, or wait for a provider session.
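`bun scripts/cli.ts ssh D518` should behave as a login shell on D518's WSL; `bun scripts/cli.ts ssh D518 hostname` should, like `ssh D518 hostname`, print only the remote command's output and return the remote exit code. Target selection ahead of the Provider ID is determined by the UniDesk node inventory. Traditional ssh transport options such as `-p`, `-i`, `-l`, and `-o` are managed centrally by the provider-gateway deployment configuration; the CLI consumes them for compatibility but does not override the node-side maintenance bridge configuration.
`bun scripts/cli.ts ssh D518` should behave as a login shell on D518's WSL; `bun scripts/cli.ts ssh D518 hostname` should, like `ssh D518 hostname`, print only the remote command's output and return the remote exit code. Target selection ahead of the Provider ID is determined by the UniDesk node inventory. Traditional ssh transport options such as `-p`, `-i`, `-l`, and `-o` are managed centrally by the provider-gateway deployment configuration; the CLI consumes them for compatibility but does not override the node-side maintenance bridge configuration. Commanders, CI preflight checks, and other non-interactive flows should not rely on free-form ssh-like command concatenation; the standard form is `bun scripts/cli.ts ssh D601 argv true`, or `bun scripts/cli.ts ssh D601 argv bash -lc '<command>'` when pipes, variable expansion, or multiple commands are needed.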
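core only allows providers that declare the `host.ssh` capability to use `ssh` passthrough or `host.ssh` dispatch; an older provider that lacks the capability must fail fast and print an error, and an unknown command must not be misreported as a successful `echo`.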
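The local broker waits 60000 ms by default for the provider SSH session to open, so that a maintenance session can still be established while the target node is handling many microservice.http tasks. To diagnose slow connections, raise the wait temporarily with `UNIDESK_SSH_OPEN_TIMEOUT_MS=<ms>`; the minimum effective value is fixed at 15000 ms, so that a genuinely offline node is not mistaken for a long block.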
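The timeout rule above can be sketched as a small resolver. This is only an illustration under stated assumptions: `resolveSshOpenTimeoutMs` and both constant names are hypothetical, and the fallback for non-numeric input is a guess rather than documented CLI behavior.

```typescript
// Hypothetical helper; names are illustrative and not taken from the CLI source.
const DEFAULT_SSH_OPEN_TIMEOUT_MS = 60_000; // documented default wait
const MIN_SSH_OPEN_TIMEOUT_MS = 15_000; // documented minimum effective value

function resolveSshOpenTimeoutMs(raw: string | undefined): number {
  // An unset or blank variable keeps the default.
  if (raw === undefined || raw.trim() === "") return DEFAULT_SSH_OPEN_TIMEOUT_MS;
  const parsed = Number(raw);
  // Assumption: non-numeric or non-positive input falls back to the default.
  if (!Number.isFinite(parsed) || parsed <= 0) return DEFAULT_SSH_OPEN_TIMEOUT_MS;
  // Values below the floor are raised to the minimum effective value.
  return Math.max(MIN_SSH_OPEN_TIMEOUT_MS, Math.floor(parsed));
}
```

In this sketch a call such as `resolveSshOpenTimeoutMs(process.env.UNIDESK_SSH_OPEN_TIMEOUT_MS)` would yield the effective wait; the clamp keeps an overly small override from shortening the wait below the floor.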
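If an ssh-like remote command hits `kex_exchange_identification`, `Connection closed by remote host`, a provider session timeout, or exit code 255, the CLI appends one `UNIDESK_SSH_HINT { ... }` line after the original stderr. The JSON does not echo the original remote command and contains only `code=ssh-like-command-friction`, `trigger`, `try`, and `triage`; `try` always points to the `bun scripts/cli.ts ssh D601 argv bash -lc '<command>'` form, so that one instance of ssh-like parsing or handshake friction is not misread as D601 SSH being entirely unavailable.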
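A wrapper script that captures stderr could pick the hint line out as follows. This is a minimal sketch: `extractSshHint` and the `SshHintLine` shape are hypothetical names, and only the `UNIDESK_SSH_HINT ` prefix and the `code`, `trigger`, `try`, and `triage` fields come from the description above.

```typescript
// Hypothetical consumer-side shape; only the fields documented above are read.
interface SshHintLine {
  code: "ssh-like-command-friction";
  trigger: string;
  try: string;
  triage: string;
}

function extractSshHint(stderrText: string): SshHintLine | null {
  const prefix = "UNIDESK_SSH_HINT ";
  // Scan from the end, because the hint is appended after the original stderr.
  const lines = stderrText.split("\n").reverse();
  for (const rawLine of lines) {
    const line = rawLine.trimEnd();
    if (!line.startsWith(prefix)) continue;
    try {
      const parsed: unknown = JSON.parse(line.slice(prefix.length));
      if (typeof parsed !== "object" || parsed === null) return null;
      const record = parsed as Record<string, unknown>;
      if (record.code !== "ssh-like-command-friction") return null;
      if (typeof record.trigger !== "string") return null;
      if (typeof record.try !== "string") return null;
      if (typeof record.triage !== "string") return null;
      return {
        code: "ssh-like-command-friction",
        trigger: record.trigger,
        try: record.try,
        triage: record.triage,
      };
    } catch {
      // A malformed hint line is treated as "no hint".
      return null;
    }
  }
  return null;
}
```

A caller would typically log `hint.try` as the suggested retry and run `hint.triage` before concluding that the provider itself is down.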
`ssh <providerId>` 会在远端会话启动时注入 `/tmp/unidesk-ssh-tools/apply_patch``/tmp/unidesk-ssh-tools/glob``/tmp/unidesk-ssh-tools/skill-discover`,并把该目录加入远端 `PATH``apply_patch` 接受标准 `*** Begin Patch` / `*** End Patch` patch 格式,便于通过 SSH 透传编辑远端仓库文件;`glob` 在远端用 Python 执行路径匹配,避免依赖 shell glob 展开;`skill-discover` 用于列出远端 Linux/WSL 与 Windows skill。目标节点需要具备 `python3``base64`。注入工具只写 `/tmp/unidesk-ssh-tools`,不修改目标仓库,交互式 shell 和远端命令都可以直接调用这些工具。
如果只是远端打小补丁,不需要再手写 `ssh D601 'apply_patch' < patch.diff` 这种命令拼接;正式入口是 `bun scripts/cli.ts ssh D601 apply-patch < patch.diff``apply-patch``patch` 等价,附加参数会原样透传给远端 `apply_patch`,例如 `bun scripts/cli.ts ssh D601 apply-patch --help`。标准单命令用法如下,不需要先创建本地 patch 临时文件:
@@ -160,7 +162,7 @@ bun scripts/cli.ts ssh D601 find /home/ubuntu --max-depth 4 --type d --icontains
bun scripts/cli.ts ssh D601 glob --root /home/ubuntu/pikapython --pattern '**/*-test.cpp' --limit 20 --sort
```
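`ssh <providerId> argv <command> [args...]` is the general argv-safe entry point; `exec` is a synonym. It suits common commands that need no shell pipeline. `find`, `glob`, and `apply-patch` have dedicated entry points; `rg`, `grep`, `sed`, `nl`, `stat`, `du`, `ls`, `cat`, `head`, `tail`, `wc`, and `pwd` can be used directly as `ssh` subcommands, and the CLI shell-quotes every argv token. When pipes, redirection, variable expansion, or multiple commands are needed, keep using the old free-form remote command entry and pass the whole remote shell script as a single local argument.
`ssh <providerId> argv <command> [args...]` is the general argv-safe entry point; `exec` is a synonym. It is the default success path for non-interactive remote commands: when no shell pipeline is needed, pass the command and its arguments directly, for example `bun scripts/cli.ts ssh D601 argv true`; when pipes, redirection, variable expansion, or multiple commands are needed, use `bun scripts/cli.ts ssh D601 argv bash -lc '<command>'` so that the shell script travels as a single argv token of `bash -lc`. `find`, `glob`, and `apply-patch` have dedicated entry points; `rg`, `grep`, `sed`, `nl`, `stat`, `du`, `ls`, `cat`, `head`, `tail`, `wc`, and `pwd` can be used directly as `ssh` subcommands, and the CLI shell-quotes every argv token. The old free-form ssh-like remote command entry remains only as a manual compatibility path that approximates native ssh.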
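To illustrate why per-token quoting keeps a `bash -lc` script intact, here is a minimal sketch of POSIX single-quote quoting. It is not the CLI's own `shellArgv`, whose body is not shown in this change; `quoteToken` and `quoteArgv` are hypothetical names, and the safe-character set is an assumption.

```typescript
// Hypothetical stand-in for the CLI's quoting step, for illustration only.
function quoteToken(token: string): string {
  if (token === "") return "''";
  // Assumption: tokens made only of these characters need no quoting.
  if (/^[A-Za-z0-9_\/.:=@%+,-]+$/.test(token)) return token;
  // Close the quote, emit an escaped single quote, then reopen the quote.
  return `'${token.replace(/'/g, `'\\''`)}'`;
}

function quoteArgv(argv: string[]): string {
  // Each token is quoted independently, then joined into one remote command line.
  return argv.map(quoteToken).join(" ");
}

// The whole script stays one token, so the remote shell passes it to bash unchanged.
console.log(quoteArgv(["bash", "-lc", "echo $HOME | wc -c"]));
```

Because the script is a single quoted token, `$HOME` and the pipe are interpreted by the remote `bash -lc` rather than by any intermediate shell layer.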
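When running multi-line scripts through `ssh <providerId>`, prefer a structured helper such as `bun scripts/cli.ts ssh D601 py < script.py`, or a single-layer stdin transfer such as `printf ... | (bun scripts/cli.ts ssh D601 'bash -s')`. Do not nest heredocs, complex quoting, or the `ssh 'python3 - <<EOF ...'` form inside the remote command string; multiple layers of shell parsing easily bind stdin to the wrong process, which opens a remote interactive interpreter and leaves a dangling broker/SSH session. When a long script needs to be reused, prefer writing it over stdin to a temporary script on the target node, then run it explicitly and clean it up within the same remote command.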
@@ -168,7 +170,7 @@ bun scripts/cli.ts ssh D601 glob --root /home/ubuntu/pikapython --pattern '**/*-
`--main-server-ip` 是一个全局前缀,必须放在需要透传的命令同一次调用中,例如 `bun scripts/cli.ts --main-server-ip 74.48.78.17 debug health`。默认传输是公网 frontend:本地 CLI 读取本仓库 `config.json` 中的 frontend 登录账号密码,登录 `http://<ip>:<frontendPort>/` 获取 HttpOnly session cookie,然后通过 frontend 的 `/api/*` 同源代理访问 backend-core 内网 API;因此计算节点只需要能访问公网 frontend,不需要主 server SSH key,也不需要打开 backend-core REST API 或 PostgreSQL 端口。
默认 frontend 传输支持 `debug health``debug dispatch``debug task``artifact-registry status|health``ci publish-user-service --dry-run``microservice list/status/health/diagnostics/tunnel-self-test/proxy``decision upload/list/show/health``decision requirement list/upsert``decision diary import/list/history/months/show/edit/upsert``codex task <taskId>``codex tasks``codex unread``codex queues``codex output <taskId>``codex judge <taskId> --attempt N``ssh <PROVIDER_ID> <remote-command>``microservice status/health/diagnostics` 经 frontend 远程传输时也复用本地 CLI 的默认 compact summary`microservice health code-queue` 只有显式 `--raw``--full` 才返回完整健康 body。运行中纠偏 `codex steer` 属于 active run write control,应在主 server 本机 CLI 或显式 SSH 传输上执行,避免公网 frontend 透传限制 stdin/body 审计语义。其中 `ssh` 的 remote frontend 传输使用 `host.ssh` dispatch 执行有界远端命令,适合 `ssh D601 hostname` `ssh D601 skills` 这类自测;交互式登录 shell 仍应在主 server 本机 CLI 使用,或显式切换到旧 SSH 传输后在主 server 上执行。frontend 远程透传不会流式转发本地 stdin,因此 `ssh py < script.py``ssh apply-patch < patch.diff` 这类 stdin-backed helper 必须在主 server 本机运行,或显式切换到 `--main-server-transport ssh`。当 backend-core、database、provider-dispatch 或 provider-host-ssh 缺失时,这些 read-only 预检必须返回结构化 `runnerDisposition=infra-blocked` 和缺失通道列表,而不是裸 `No such container`。若确实需要旧行为,可使用 `--main-server-key <key>``--main-server-transport ssh`,这时 CLI 会通过 SSH 登录主 server 的 `--main-server-root` 目录执行同一个 `bun scripts/cli.ts <command>`
默认 frontend 传输支持 `debug health``debug dispatch``debug task``artifact-registry status|health``ci publish-user-service --dry-run``microservice list/status/health/diagnostics/tunnel-self-test/proxy``decision upload/list/show/health``decision requirement list/upsert``decision diary import/list/history/months/show/edit/upsert``codex task <taskId>``codex tasks``codex unread``codex queues``codex output <taskId>``codex judge <taskId> --attempt N``ssh <PROVIDER_ID> <remote-command>``microservice status/health/diagnostics` 经 frontend 远程传输时也复用本地 CLI 的默认 compact summary`microservice health code-queue` 只有显式 `--raw``--full` 才返回完整健康 body。运行中纠偏 `codex steer` 属于 active run write control,应在主 server 本机 CLI 或显式 SSH 传输上执行,避免公网 frontend 透传限制 stdin/body 审计语义。其中 `ssh` 的 remote frontend 传输使用 `host.ssh` dispatch 执行有界远端命令,非交互命令同样优先 `ssh D601 argv true` `ssh D601 argv bash -lc '<command>'`;交互式登录 shell 仍应在主 server 本机 CLI 使用,或显式切换到旧 SSH 传输后在主 server 上执行。frontend 远程透传不会流式转发本地 stdin,因此 `ssh py < script.py``ssh apply-patch < patch.diff` 这类 stdin-backed helper 必须在主 server 本机运行,或显式切换到 `--main-server-transport ssh`。当 backend-core、database、provider-dispatch 或 provider-host-ssh 缺失时,这些 read-only 预检必须返回结构化 `runnerDisposition=infra-blocked` 和缺失通道列表,而不是裸 `No such container`。若确实需要旧行为,可使用 `--main-server-key <key>``--main-server-transport ssh`,这时 CLI 会通过 SSH 登录主 server 的 `--main-server-root` 目录执行同一个 `bun scripts/cli.ts <command>`
计算节点可以用该入口测试自身的远程升级闭环,而不需要在计算节点公开 core REST API 或 database。标准顺序是:先运行 `bun scripts/cli.ts --main-server-ip 74.48.78.17 debug health` 确认主 server 看到当前 Provider 在线,且该 Provider labels 中 `unideskCapabilities` 包含 `host.ssh``hostSshConfigured=true``hostSshKeyPresent=true`;再运行 `bun scripts/cli.ts --main-server-ip 74.48.78.17 debug dispatch <PROVIDER_ID> provider.upgrade --mode schedule --wait-ms 15000` 触发真实 `provider.upgrade`;随后再次运行 `debug health` 确认节点重新上线;最后运行 `bun scripts/cli.ts --main-server-ip 74.48.78.17 debug dispatch <PROVIDER_ID> host.ssh --wait-ms 15000``bun scripts/cli.ts --main-server-ip 74.48.78.17 ssh <PROVIDER_ID> hostname` 验证 SSH 透传能力。provider-gateway 新部署或升级后没有完成这组 remote CLI 自测,不能视为交付完成。
+4 -2
@@ -142,6 +142,8 @@ backend-core 可以通过真实 WebSocket 调度向在线 provider 下发 `provi
`bun scripts/cli.ts provider triage <PROVIDER_ID>` 是 provider 运行状态的只读多信号裁决入口。输出必须包含 `decision``retryable``healthyScopes``failedScopes``degradedScopes``blockingDisposition``rationale``signals``recommendedCrossChecks``decision` 的长期语义是:`global-offline` 表示 provider heartbeat、Host SSH、k3s 或 scheduler 等多个独立关键面同时失败且没有健康交叉证据;`service-degraded` 表示 registry、service proxy 或单个用户服务局部退化但仍存在 provider 级健康信号;`retryable-transient` 表示单次 runner-local、SSH、proxy 或 API timeout 证据不足,应重试或补交叉验证;`healthy` 表示未观察到失败或退化信号。
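`recommendedCrossChecks` must keep the argv-form Host SSH self-check: `bun scripts/cli.ts ssh <PROVIDER_ID> argv true`. This command proves that the non-interactive maintenance bridge still works. If the free-form ssh-like form hits a timeout, `kex_exchange_identification`, or `Connection closed by remote host`, first retest with `bun scripts/cli.ts ssh D601 argv bash -lc '<command>'` as suggested by the CLI's `UNIDESK_SSH_HINT` output, then use `provider triage` to judge whether it really is a provider-level fault.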
D601 这类长期 WSL provider 不得因为单一路径失败被直接写成全局离线。典型局部退化包括 artifact registry 的 `unidesk-artifact-registry.service` inactive,但 registry container 仍 running、listener 仍绑定 loopback、`http://127.0.0.1:5000/v2/` 返回 200;这种状态应在 registry scope 内显示 degraded,并在 provider triage 中落到 `decision=service-degraded`,只提示修复 systemd drift,不阻断所有 D601 上的 Code Queue、k3sctl-adapter 或业务 API 判断。
## Manual Upgrade Maintenance
@@ -162,6 +164,6 @@ WSL provider 需要调用 Windows-only 工具链时,优先在 WSL 用户的 `~
维护桥通过真实 WebSocket dispatch 暴露为 `host.ssh` 命令。默认 payload 使用 `mode: "probe"`,远端只执行一个短命令并返回 `UNIDESK_SSH_TEST user=... host=... bridge=host.ssh cwd=...`;需要人工诊断时可以显式使用 `mode: "exec"``command` 字段执行有界命令。所有 `host.ssh` 执行都必须有超时,stdout/stderr 在 task result 中截断展示;自动升级和普通任务仍必须使用 Docker socket 与 `provider.upgrade`,不得把 WSL SSH 维护桥当成调度通道。
面向人的终端入口是 `bun scripts/cli.ts ssh <PROVIDER_ID> [ssh-like args...]`。无后续参数时打开远端登录 shell,有后续参数时执行远端命令并返回远端 exit code;该入口走 backend-core 内网 `/ws/ssh` broker 和 provider 既有 WebSocket,不新增公网 core 端口。传统 ssh 传输参数由 provider-gateway 环境变量统一控制,CLI 只负责把 Provider ID 后的远端命令和终端 stdin/stdout/stderr 透传过去。WSL 节点需要同时看清 Linux/WSL 与 Windows 两套 skill 时,使用 `bun scripts/cli.ts ssh <PROVIDER_ID> skills`,该命令只通过已建立的维护桥读取 `SKILL.md` 元数据,不要求 provider-gateway 新增业务 API。
面向人的终端入口是 `bun scripts/cli.ts ssh <PROVIDER_ID> [ssh-like args...]`。无后续参数时打开远端登录 shell,有后续参数时执行远端命令并返回远端 exit code;该入口走 backend-core 内网 `/ws/ssh` broker 和 provider 既有 WebSocket,不新增公网 core 端口。传统 ssh 传输参数由 provider-gateway 环境变量统一控制,CLI 只负责把 Provider ID 后的远端命令和终端 stdin/stdout/stderr 透传过去。非交互远端命令优先使用 argv 入口:`bun scripts/cli.ts ssh D601 argv true`,或需要 shell 特性时使用 `bun scripts/cli.ts ssh D601 argv bash -lc '<command>'`WSL 节点需要同时看清 Linux/WSL 与 Windows 两套 skill 时,使用 `bun scripts/cli.ts ssh <PROVIDER_ID> skills`,该命令只通过已建立的维护桥读取 `SKILL.md` 元数据,不要求 provider-gateway 新增业务 API。
验证 WSL SSH 桥时,先在目标 WSL 中启动 sshd 并确保维护公钥写入目标用户的 `authorized_keys`,再确认目标 provider 注册 labels 中 `unideskCapabilities` 包含 `host.ssh`。运行 `bun scripts/cli.ts debug dispatch <PROVIDER_ID> host.ssh --wait-ms 15000` 后,结果应在 `debug task latest` 或前端任务历史中显示 `status: succeeded``probeLine``UNIDESK_SSH_TEST``exitCode: 0`,并且目标节点 labels 中 `hostSshKeyPresent` 为 true;随后运行 `bun scripts/cli.ts ssh <PROVIDER_ID> hostname` 验证近似原生 ssh 的远端命令体验。在计算节点本机自测时,使用 remote CLI 透传同一组命令:`bun scripts/cli.ts --main-server-ip 74.48.78.17 debug health``bun scripts/cli.ts --main-server-ip 74.48.78.17 debug dispatch <PROVIDER_ID> host.ssh --wait-ms 15000``bun scripts/cli.ts --main-server-ip 74.48.78.17 ssh <PROVIDER_ID> hostname`;默认 remote CLI 走公网 frontend 登录态,不需要主 server SSH key。健康检查必须能看到该 Provider 在线、`hostSshConfigured=true``hostSshKeyPresent=true``hostSshTarget` 正确、`unideskCapabilities` 包含 `host.ssh`probe 必须返回 `UNIDESK_SSH_TEST``ssh <PROVIDER_ID> hostname` 必须输出目标 WSL/宿主 hostname exit code 为 0。如果 D518 这类 WSL 节点没有公网 SSH 入口,也必须通过这个 provider-gateway 自连维护桥完成验证,而不是要求主 server 直接连节点公网 22 端口;旧版 provider 未声明 `host.ssh` 时必须先升级 provider-gateway,否则 core 会拒绝 SSH 透传。
验证 WSL SSH 桥时,先在目标 WSL 中启动 sshd 并确保维护公钥写入目标用户的 `authorized_keys`,再确认目标 provider 注册 labels 中 `unideskCapabilities` 包含 `host.ssh`。运行 `bun scripts/cli.ts debug dispatch <PROVIDER_ID> host.ssh --wait-ms 15000` 后,结果应在 `debug task latest` 或前端任务历史中显示 `status: succeeded``probeLine``UNIDESK_SSH_TEST``exitCode: 0`,并且目标节点 labels 中 `hostSshKeyPresent` 为 true;随后运行 `bun scripts/cli.ts ssh <PROVIDER_ID> argv true` 验证非交互 argv 维护命令,再运行 `bun scripts/cli.ts ssh <PROVIDER_ID> hostname` 验证近似原生 ssh 的远端命令体验。在计算节点本机自测时,使用 remote CLI 透传同一组命令:`bun scripts/cli.ts --main-server-ip 74.48.78.17 debug health``bun scripts/cli.ts --main-server-ip 74.48.78.17 debug dispatch <PROVIDER_ID> host.ssh --wait-ms 15000``bun scripts/cli.ts --main-server-ip 74.48.78.17 ssh <PROVIDER_ID> argv true``bun scripts/cli.ts --main-server-ip 74.48.78.17 ssh <PROVIDER_ID> hostname`;默认 remote CLI 走公网 frontend 登录态,不需要主 server SSH key。健康检查必须能看到该 Provider 在线、`hostSshConfigured=true``hostSshKeyPresent=true``hostSshTarget` 正确、`unideskCapabilities` 包含 `host.ssh`probe 必须返回 `UNIDESK_SSH_TEST``ssh <PROVIDER_ID> argv true``ssh <PROVIDER_ID> hostname` 必须 exit code 为 0。如果 D518 这类 WSL 节点没有公网 SSH 入口,也必须通过这个 provider-gateway 自连维护桥完成验证,而不是要求主 server 直接连节点公网 22 端口;旧版 provider 未声明 `host.ssh` 时必须先升级 provider-gateway,否则 core 会拒绝 SSH 透传。
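The remote self-test sequence listed above can be written down as data so a script can run the steps in order and stop at the first non-zero exit code. This is a sketch only: `remoteSelfTestCommands` is a hypothetical helper, and it merely assembles the four commands named in the text without executing them.

```typescript
// Hypothetical helper that assembles the documented self-test commands as argv arrays.
function remoteSelfTestCommands(providerId: string, mainServerIp: string): string[][] {
  const base = ["bun", "scripts/cli.ts", "--main-server-ip", mainServerIp];
  return [
    // 1. The provider must be online with host.ssh configured.
    [...base, "debug", "health"],
    // 2. The bounded probe must return UNIDESK_SSH_TEST.
    [...base, "debug", "dispatch", providerId, "host.ssh", "--wait-ms", "15000"],
    // 3. The non-interactive argv maintenance command.
    [...base, "ssh", providerId, "argv", "true"],
    // 4. The near-native ssh remote command experience.
    [...base, "ssh", providerId, "hostname"],
  ];
}

for (const argv of remoteSelfTestCommands("D601", "74.48.78.17")) {
  console.log(argv.join(" "));
}
```

Keeping the steps as argv arrays rather than one shell string follows the same reasoning as the argv entry point: no token is re-parsed by a local shell.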
@@ -1,4 +1,4 @@
import { buildProviderTriageResult, type ProviderTriageSignal } from "./src/provider-triage";
import { buildProviderTriageResult, providerTriageRecommendedCrossChecks, type ProviderTriageSignal } from "./src/provider-triage";
import { codexTaskQuery } from "./src/code-queue";
import { classifyRunnerError } from "../src/components/microservices/code-queue/src/runner-error-classifier";
@@ -58,6 +58,11 @@ export function runProviderRunnerTriageContract(): JsonRecord {
assertCondition(result.decision === "retryable-transient", "single path provider offline should be retryable transient", result);
assertCondition(result.retryable === true, "single path provider offline should be retryable", result);
assertCondition(result.contract.singlePathProviderOfflineIsGlobalBlocker === false, "triage contract should reject single-path global blocker", result);
assertCondition(result.recommendedCrossChecks.includes("bun scripts/cli.ts ssh D601 argv true"), "provider triage result must recommend argv Host SSH cross-check", result.recommendedCrossChecks);
const crossChecks = providerTriageRecommendedCrossChecks("D601");
assertCondition(crossChecks.includes("bun scripts/cli.ts ssh D601 argv true"), "provider triage recommendedCrossChecks must keep ssh argv true", crossChecks);
assertCondition(crossChecks.includes("bun scripts/cli.ts debug dispatch D601 host.ssh --wait-ms 15000"), "provider triage recommendedCrossChecks must keep host.ssh dispatch probe", crossChecks);
const rateLimitTriage = buildProviderTriageResult("D601", [
signal("observed-runner-429", "external-provider", "failed", false),
@@ -102,6 +107,7 @@ export function runProviderRunnerTriageContract(): JsonRecord {
"runner error classifier separates runner-local/provider-gateway/registry/k3s/scheduler/unknown",
"each single runner error classification has globalBlocker=false",
"provider triage keeps single provider is not online as retryable-transient, not global-blocker",
"provider triage recommendedCrossChecks keeps host.ssh dispatch and ssh argv true probes",
"external OpenAI/model provider 429 is explicit retryable backoff evidence, not Code Queue infra outage",
"codex task --detail preserves runnerErrorClassification in compact attempt output",
],
+4
@@ -45,6 +45,7 @@ const syntaxFiles = [
"scripts/microservice-health-output-contract-test.ts",
"scripts/code-queue-supervisor-disclosure-contract-test.ts",
"scripts/code-queue-commander-view-contract-test.ts",
"scripts/ssh-argv-guidance-contract-test.ts",
"src/components/frontend/src/index.ts",
"src/components/frontend/src/app.tsx",
"src/components/frontend/src/decision-center.tsx",
@@ -334,6 +335,7 @@ export function runChecks(config: UniDeskConfig, options: CheckOptions = default
fileItem("scripts/host-codex-commander-skeleton-contract-test.ts"),
fileItem("scripts/host-codex-commander-no-daemon-smoke-contract-test.ts"),
fileItem("scripts/provider-runner-triage-contract-test.ts"),
fileItem("scripts/ssh-argv-guidance-contract-test.ts"),
fileItem("scripts/src/provider-triage.ts"),
fileItem("src/components/microservices/code-queue/src/runner-error-classifier.ts"),
fileItem("scripts/src/ci.ts"),
@@ -384,6 +386,7 @@ export function runChecks(config: UniDeskConfig, options: CheckOptions = default
items.push(commandItem("host-codex-commander:skeleton-contract", ["bun", "scripts/host-codex-commander-skeleton-contract-test.ts"], 30_000));
items.push(commandItem("host-codex-commander:no-daemon-smoke-contract", ["bun", "scripts/host-codex-commander-no-daemon-smoke-contract-test.ts"], 30_000));
items.push(commandItem("provider:runner-triage-contract", ["bun", "scripts/provider-runner-triage-contract-test.ts"], 30_000));
items.push(commandItem("ssh:argv-guidance-contract", ["bun", "scripts/ssh-argv-guidance-contract-test.ts"], 30_000));
items.push(commandItem("deploy:artifact-matrix-contract", ["bun", "scripts/deploy-artifact-matrix-contract-test.ts"], 30_000));
items.push(commandItem("decision-center:desired-state-contract", ["bun", "scripts/decision-center-desired-state-contract-test.ts"], 30_000));
items.push(commandItem("code-queue:active-run-heartbeat-visible", ["bun", "scripts/code-queue-liveness-diagnostics-test.ts", "--only", "code-queue:active-run-heartbeat-visible"], 30_000));
@@ -423,6 +426,7 @@ export function runChecks(config: UniDeskConfig, options: CheckOptions = default
items.push(skippedItem("host-codex-commander:skeleton-contract", "host Codex commander skeleton contract is opt-in with script checks", "--scripts-typecheck or --full"));
items.push(skippedItem("host-codex-commander:no-daemon-smoke-contract", "host Codex commander no-daemon smoke contract is opt-in with script checks", "--scripts-typecheck or --full"));
items.push(skippedItem("provider:runner-triage-contract", "Provider runner triage contract is opt-in with script checks", "--scripts-typecheck or --full"));
items.push(skippedItem("ssh:argv-guidance-contract", "SSH argv guidance and failure hint contract is opt-in with script checks", "--scripts-typecheck or --full"));
items.push(skippedItem("deploy:artifact-matrix-contract", "deploy artifact matrix contract is opt-in with script checks", "--scripts-typecheck or --full"));
items.push(skippedItem("decision-center:desired-state-contract", "Decision Center desired-state drift contract is opt-in with script checks", "--scripts-typecheck or --full"));
items.push(skippedItem("code-queue:liveness-diagnostics-fixtures", "Code Queue liveness diagnostics fixtures are opt-in with script checks", "--scripts-typecheck or --full"));
+5 -3
@@ -18,13 +18,13 @@ export function rootHelp(): unknown {
{ command: "server cleanup plan [--min-age-hours N] [--limit N]", description: "Dry-run Docker image cleanup plan only: list active/protected images, stale candidates older than the default 24h threshold, risk, estimated reclaim, and manual review commands without deleting anything." },
{ command: "server rebuild <backend-core|frontend|dev-frontend-proxy|provider-gateway|todo-note|code-queue-mgr|project-manager|baidu-netdisk|oa-event-flow>", description: "Maintenance-only local Compose rebuild for reviewed main-server services; frontend standard release must use CI artifact plus deploy apply dev/prod artifact consumers." },
{ command: "provider attach <providerId> [--master-server URL] [--up] [--force] | provider triage <providerId> [--observed-error text] [--observed-scope scope] [--microservice id ...] [--full|--raw]", description: "Generate the minimal external provider-gateway env/compose bundle or run the low-noise read-only provider health triage contract." },
{ command: "ssh <providerId> [ssh-like args...]", description: "Open a Host SSH / WSL SSH maintenance session through the provider-gateway bridge with built-in remote helper tools in PATH." },
{ command: "ssh <providerId> [ssh-like args...]", description: "Open a Host SSH / WSL SSH maintenance session through the provider-gateway bridge; prefer `ssh <providerId> argv ...` for non-interactive remote commands." },
{ command: "ssh <providerId> apply-patch [tool args...] < patch.diff", description: "Invoke the injected remote apply_patch helper directly over SSH passthrough and stream the patch from local stdin." },
{ command: "ssh <providerId> py [script-args...] < script.py", description: "Run remote Python from local stdin through SSH passthrough without nested shell quoting; extra args become script argv." },
{ command: "ssh <providerId> skills [--scope all|wsl|windows] [--limit N]", description: "Discover WSL/Linux and, for WSL providers, Windows skill directories in one SSH passthrough call." },
{ command: "ssh <providerId> find <path...> [--max-depth N] [--type d|f|l] [--contains TEXT] [--iname PATTERN] [--limit N] [--sort]", description: "Run a structured remote find command without nested shell quoting or parentheses." },
{ command: "ssh <providerId> glob [--root DIR] [--pattern PATTERN] [--contains TEXT] [--type any|f|d] [--limit N] [--sort]", description: "Run remote glob matching through the injected helper without shell glob expansion." },
{ command: "ssh <providerId> argv <command> [args...]", description: "Run a remote command with each argv token shell-quoted by UniDesk before SSH passthrough." },
{ command: "ssh <providerId> argv <command> [args...]", description: "Run a non-interactive remote command with each argv token shell-quoted by UniDesk before SSH passthrough; use `argv bash -lc '<command>'` when shell features are required." },
{ command: "microservice list", description: "List UniDesk-managed user services and their provider/runtime mapping." },
{ command: "microservice status <id>", description: "Show one user service config, repository reference, backend mapping, and runtime status." },
{ command: "microservice health <id> [--compact|--raw|--full]", description: "Probe one user service through backend-core -> provider-gateway HTTP proxy; default output is compact, and code-queue uses a commander-safe liveness summary unless raw/full is requested." },
@@ -142,6 +142,7 @@ export function sshHelp(): unknown {
usage: [
"bun scripts/cli.ts ssh <providerId>",
"bun scripts/cli.ts ssh <providerId> argv <command> [args...]",
"bun scripts/cli.ts ssh D601 argv bash -lc '<command>'",
"bun scripts/cli.ts ssh <providerId> apply-patch < patch.diff",
"bun scripts/cli.ts ssh <providerId> py [script-args...] < script.py",
"bun scripts/cli.ts ssh <providerId> skills [--scope all|wsl|windows] [--limit N]",
@@ -150,7 +151,8 @@ export function sshHelp(): unknown {
],
notes: [
"ssh --help and ssh <providerId> --help print this JSON help and never open an interactive session.",
"Use argv when nested shell quoting would be fragile.",
"For non-interactive remote commands, prefer argv: bun scripts/cli.ts ssh D601 argv bash -lc '<command>'.",
"If an ssh-like remote command fails with timeout/kex/exit-255 friction, stderr includes one low-noise UNIDESK_SSH_HINT JSON line with the argv retry command.",
"Use -- before a remote command that intentionally starts with a dash.",
],
};
+5 -2
@@ -3,7 +3,7 @@ import { type UniDeskConfig } from "./config";
import { type DebugDispatchCommand, isDebugDispatchCommand } from "./debug";
import { summarizeMicroserviceHealthResponse, summarizeMicroserviceObservation, summarizeMicroserviceProxyResponse } from "./microservices";
import { parseNetworkPerfOptions, runNetworkPerf } from "./network-perf";
import { isSshSkillDiscoveryArgs, parseSshArgs } from "./ssh";
import { formatSshFailureHint, isSshSkillDiscoveryArgs, parseSshArgs, sshFailureHint } from "./ssh";
import { codexJudgeQueryAsync, codexOutputQueryAsync, codexPrPreflightQueryAsync, codexQueuesQueryAsync, codexTaskQueryAsync, codexTasksQueryAsync, codexUnreadTriageAsync } from "./code-queue";
import { runDecisionCenterCommandAsync } from "./decision-center";
import {
@@ -862,7 +862,10 @@ async function runRemoteSshOverFrontend(session: FrontendSession, providerId: st
if (stderr.length > 0) process.stderr.write(stderr);
if (task?.status !== "succeeded") {
if (stdout.length === 0 && stderr.length === 0) process.stderr.write(`${JSON.stringify({ taskId, task }, null, 2)}\n`);
return typeof result.exitCode === "number" ? result.exitCode : 255;
const exitCode = typeof result.exitCode === "number" ? result.exitCode : 255;
const hint = sshFailureHint(providerId, parsed, exitCode, stderr.length > 0 ? stderr : String(task?.message ?? ""));
if (hint !== null) process.stderr.write(formatSshFailureHint(hint));
return exitCode;
}
return typeof result.exitCode === "number" ? result.exitCode : 0;
}
+86 -11
@@ -4,6 +4,20 @@ import { type UniDeskConfig, repoRoot } from "./config";
export interface ParsedSshArgs {
remoteCommand: string | null;
requiresStdin: boolean;
invocationKind: SshInvocationKind;
}
export type SshInvocationKind = "interactive" | "argv" | "helper" | "ssh-like";
export interface SshFailureHint {
code: "ssh-like-command-friction";
providerId: string;
trigger: "timeout-or-kex" | "exit-255";
exitCode: number;
message: string;
try: string;
triage: string;
note: string;
}
const argvQuotedSshSubcommands = new Set(["rg", "grep", "sed", "nl", "stat", "du", "ls", "cat", "head", "tail", "wc", "pwd"]);
@@ -479,28 +493,28 @@ export function parseSshArgs(args: string[]): ParsedSshArgs {
const subcommand = args[0] ?? "";
if (isSshSkillDiscoveryArgs(args)) {
const toolArgs = subcommand === "skill" ? ["skill-discover", ...args.slice(2)] : ["skill-discover", ...args.slice(1)];
return { remoteCommand: shellArgv(toolArgs), requiresStdin: false };
return { remoteCommand: shellArgv(toolArgs), requiresStdin: false, invocationKind: "helper" };
}
if (subcommand === "apply-patch" || subcommand === "patch") {
const toolArgs = ["apply_patch", ...args.slice(1)];
return { remoteCommand: shellArgv(toolArgs), requiresStdin: true };
return { remoteCommand: shellArgv(toolArgs), requiresStdin: true, invocationKind: "helper" };
}
if (subcommand === "py") {
return { remoteCommand: buildPythonStdinCommand(args.slice(1)), requiresStdin: true };
return { remoteCommand: buildPythonStdinCommand(args.slice(1)), requiresStdin: true, invocationKind: "helper" };
}
if (subcommand === "argv" || subcommand === "exec") {
const toolArgs = args.slice(1);
if (toolArgs.length === 0) throw new Error(`ssh ${subcommand} requires a command`);
return { remoteCommand: shellArgv(toolArgs), requiresStdin: false };
return { remoteCommand: shellArgv(toolArgs), requiresStdin: false, invocationKind: "argv" };
}
if (subcommand === "find") {
return { remoteCommand: buildFindCommand(args.slice(1)), requiresStdin: false };
return { remoteCommand: buildFindCommand(args.slice(1)), requiresStdin: false, invocationKind: "helper" };
}
if (subcommand === "glob") {
return { remoteCommand: shellArgv(["glob", ...args.slice(1)]), requiresStdin: false };
return { remoteCommand: shellArgv(["glob", ...args.slice(1)]), requiresStdin: false, invocationKind: "helper" };
}
if (argvQuotedSshSubcommands.has(subcommand)) {
return { remoteCommand: shellArgv(args), requiresStdin: false };
return { remoteCommand: shellArgv(args), requiresStdin: false, invocationKind: "argv" };
}
const remote: string[] = [];
let remoteStarted = false;
@@ -521,7 +535,11 @@ export function parseSshArgs(args: string[]): ParsedSshArgs {
remoteStarted = true;
remote.push(arg);
}
return { remoteCommand: remote.length === 0 ? null : remote.join(" "), requiresStdin: false };
return {
remoteCommand: remote.length === 0 ? null : remote.join(" "),
requiresStdin: false,
invocationKind: remote.length === 0 ? "interactive" : "ssh-like",
};
}
function shellArgv(args: string[]): string {
@@ -658,6 +676,48 @@ export function wrapSshRemoteCommand(command: string | null): string {
return `${bootstrap}; stty -echo 2>/dev/null || true; ${command}`;
}
function safeProviderId(providerId: string): string {
return /^[A-Za-z0-9_.-]{1,64}$/u.test(providerId) ? providerId : "<provider>";
}
function classifySshLikeFailure(exitCode: number, stderrText: string): SshFailureHint["trigger"] | null {
const normalized = stderrText.toLowerCase();
if (
normalized.includes("kex_exchange_identification")
|| normalized.includes("ssh_exchange_identification")
|| normalized.includes("connection closed by remote host")
|| normalized.includes("connection reset by peer")
|| normalized.includes("connection timed out")
|| normalized.includes("operation timed out")
|| normalized.includes("timed out waiting for provider session")
|| normalized.includes("the operation was aborted")
) {
return "timeout-or-kex";
}
return exitCode === 255 ? "exit-255" : null;
}
export function sshFailureHint(providerId: string, parsed: ParsedSshArgs, exitCode: number, stderrText: string): SshFailureHint | null {
if (parsed.invocationKind !== "ssh-like") return null;
const trigger = classifySshLikeFailure(exitCode, stderrText);
if (trigger === null) return null;
const shownProviderId = safeProviderId(providerId);
return {
code: "ssh-like-command-friction",
providerId: shownProviderId,
trigger,
exitCode,
message: "ssh-like remote command failed before proving Host SSH is globally unavailable; prefer argv form for non-interactive commands.",
try: `bun scripts/cli.ts ssh ${shownProviderId} argv bash -lc '<command>'`,
triage: `bun scripts/cli.ts provider triage ${shownProviderId} --observed-scope ssh --observed-error '<ssh-like timeout or kex failure>'`,
note: "This hint intentionally does not echo the original remote command.",
};
}
export function formatSshFailureHint(hint: SshFailureHint): string {
return `UNIDESK_SSH_HINT ${JSON.stringify(hint)}\n`;
}
function brokerSource(): string {
return String.raw`
const open = JSON.parse(process.argv[2] || process.argv[1] || "{}");
@@ -831,8 +891,16 @@ export async function runSsh(config: UniDeskConfig, providerId: string, args: st
if (rawMode) process.stdin.setRawMode(true);
process.stdin.resume();
process.stdin.pipe(child.stdin);
+let stderrTail = "";
+const appendStderrTail = (chunk: Buffer | string): void => {
+const text = Buffer.isBuffer(chunk) ? chunk.toString("utf8") : chunk;
+stderrTail = (stderrTail + text).slice(-16_384);
+};
child.stdout.pipe(process.stdout);
-child.stderr.pipe(process.stderr);
+child.stderr.on("data", (chunk: Buffer) => {
+appendStderrTail(chunk);
+process.stderr.write(chunk);
+});
return await new Promise<number>((resolve) => {
const restore = (): void => {
@@ -841,12 +909,19 @@ export async function runSsh(config: UniDeskConfig, providerId: string, args: st
};
child.on("error", (error) => {
restore();
-process.stderr.write(`unidesk ssh failed to start broker: ${error.message}\n`);
+const message = `unidesk ssh failed to start broker: ${error.message}\n`;
+appendStderrTail(message);
+process.stderr.write(message);
+const hint = sshFailureHint(providerId, parsed, 255, stderrTail);
+if (hint !== null) process.stderr.write(formatSshFailureHint(hint));
resolve(255);
});
child.on("close", (code) => {
restore();
-resolve(code ?? 255);
+const exitCode = code ?? 255;
+const hint = sshFailureHint(providerId, parsed, exitCode, stderrTail);
+if (hint !== null) process.stderr.write(formatSshFailureHint(hint));
+resolve(exitCode);
});
});
}
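The `appendStderrTail` closure in the hunk above keeps only the last 16,384 characters of broker stderr, so classification never has to hold the full stream in memory. A minimal standalone sketch of that bounded-tail idea follows. The `createTailBuffer` name and its object shape are hypothetical and not part of this commit; only the 16,384 cap and the append-then-slice step come from the diff.

```typescript
// Hypothetical helper illustrating the bounded stderr tail; not the commit's code.
// Only the 16_384 cap and the "append, then slice from the end" step mirror the diff.
const TAIL_LIMIT = 16_384;

interface TailBuffer {
  append(chunk: Uint8Array | string): void;
  read(): string;
}

function createTailBuffer(limit: number = TAIL_LIMIT): TailBuffer {
  let tail = "";
  return {
    append(chunk: Uint8Array | string): void {
      // Decode binary chunks as UTF-8; strings pass through unchanged.
      const text = typeof chunk === "string" ? chunk : new TextDecoder().decode(chunk);
      // Keep only the newest `limit` characters so memory stays bounded.
      tail = (tail + text).slice(-limit);
    },
    read(): string {
      return tail;
    },
  };
}

// Usage: markers that arrive split across chunks are still found in the tail.
const demoTail = createTailBuffer();
demoTail.append("kex_exchange_");
demoTail.append("identification: Connection closed by remote host\n");
const sawKexMarker = demoTail.read().toLowerCase().includes("kex_exchange_identification");
console.log(sawKexMarker);
```

Because each chunk is decoded on its own, a multi-byte character split across two chunks may decode imperfectly. That should not affect the ASCII substring checks this tail is used for.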
@@ -0,0 +1,54 @@
import { sshHelp } from "./src/help";
import { providerTriageRecommendedCrossChecks } from "./src/provider-triage";
import { formatSshFailureHint, parseSshArgs, sshFailureHint } from "./src/ssh";
type JsonRecord = Record<string, unknown>;
function assertCondition(condition: unknown, message: string, detail: unknown = {}): void {
if (!condition) throw new Error(`${message}: ${JSON.stringify(detail)}`);
}
export function runSshArgvGuidanceContract(): JsonRecord {
const argv = parseSshArgs(["argv", "bash", "-lc", "echo ok"]);
assertCondition(argv.invocationKind === "argv", "argv subcommand must be classified as argv", argv);
assertCondition(argv.remoteCommand === "'bash' '-lc' 'echo ok'", "argv command must shell-quote each token", argv);
assertCondition(argv.requiresStdin === false, "argv command must not require stdin", argv);
assertCondition(sshFailureHint("D601", argv, 255, "kex_exchange_identification: Connection closed by remote host") === null, "argv failures must not produce ssh-like friction hint", argv);
const shortcut = parseSshArgs(["pwd"]);
assertCondition(shortcut.invocationKind === "argv", "safe command shortcuts must use argv quoting", shortcut);
assertCondition(shortcut.remoteCommand === "'pwd'", "safe command shortcut should be shell-quoted", shortcut);
const sshLike = parseSshArgs(["echo hello"]);
const hint = sshFailureHint("D601", sshLike, 255, "kex_exchange_identification: Connection closed by remote host");
assertCondition(hint !== null, "ssh-like kex failure must produce a hint", sshLike);
assertCondition(hint?.try === "bun scripts/cli.ts ssh D601 argv bash -lc '<command>'", "hint must provide canonical argv retry", hint);
assertCondition(hint?.triage.includes("provider triage D601"), "hint must provide provider triage command", hint);
const formatted = formatSshFailureHint(hint!);
assertCondition(formatted.startsWith("UNIDESK_SSH_HINT "), "formatted hint must have structured prefix", formatted);
assertCondition(!formatted.includes("echo hello"), "formatted hint must not echo the original remote command", formatted);
const timeoutHint = sshFailureHint("D601", sshLike, 255, "unidesk ssh bridge timed out waiting for provider session");
assertCondition(timeoutHint?.trigger === "timeout-or-kex", "provider session timeout must map to timeout-or-kex", timeoutHint);
const helpText = JSON.stringify(sshHelp());
assertCondition(helpText.includes("ssh D601 argv bash -lc '<command>'"), "ssh help must recommend argv bash -lc for non-interactive commands", helpText);
assertCondition(helpText.includes("UNIDESK_SSH_HINT"), "ssh help must document structured failure hint", helpText);
const crossChecks = providerTriageRecommendedCrossChecks("D601");
assertCondition(crossChecks.includes("bun scripts/cli.ts ssh D601 argv true"), "provider triage cross-checks must keep argv true", crossChecks);
return {
ok: true,
checks: [
"argv form is classified and quoted as the success path for non-interactive commands",
"ssh-like timeout/kex failures emit one structured argv retry hint",
"help text documents argv bash -lc and UNIDESK_SSH_HINT",
"provider triage recommendedCrossChecks keeps ssh D601 argv true",
],
};
}
if (import.meta.main) {
process.stdout.write(`${JSON.stringify(runSshArgvGuidanceContract(), null, 2)}\n`);
}
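The contract above checks that a formatted hint starts with `UNIDESK_SSH_HINT ` followed by a JSON object. A wrapper script that captures the CLI's stderr could recover those hints roughly as sketched below. The `extractSshHints` function and the `SshHintSketch` interface are hypothetical names used only for illustration. The prefix, the `ssh-like-command-friction` code, and the field names are taken from the diff, and the sample values are made up.

```typescript
// Hypothetical consumer of the structured hint line; not part of this commit.
// Field names follow the object returned by sshFailureHint in the diff.
interface SshHintSketch {
  code: string;
  providerId: string;
  trigger: string;
  exitCode: number;
  message?: string;
  try: string;
  triage: string;
  note?: string;
}

const HINT_PREFIX = "UNIDESK_SSH_HINT ";

function extractSshHints(stderrText: string): SshHintSketch[] {
  const hints: SshHintSketch[] = [];
  for (const line of stderrText.split("\n")) {
    // The hint may follow stderr output that did not end in a newline,
    // so search for the prefix anywhere in the line.
    const index = line.indexOf(HINT_PREFIX);
    if (index === -1) continue;
    try {
      const parsed: unknown = JSON.parse(line.slice(index + HINT_PREFIX.length));
      if (
        typeof parsed === "object"
        && parsed !== null
        && (parsed as { code?: unknown }).code === "ssh-like-command-friction"
      ) {
        hints.push(parsed as SshHintSketch);
      }
    } catch {
      // Malformed payloads are skipped rather than treated as fatal.
    }
  }
  return hints;
}

// Usage with a made-up stderr capture.
const sampleStderr = [
  "kex_exchange_identification: Connection closed by remote host",
  `${HINT_PREFIX}${JSON.stringify({
    code: "ssh-like-command-friction",
    providerId: "D601",
    trigger: "timeout-or-kex",
    exitCode: 255,
    try: "bun scripts/cli.ts ssh D601 argv bash -lc '<command>'",
    triage: "bun scripts/cli.ts provider triage D601 --observed-scope ssh",
  })}`,
  "",
].join("\n");
const sampleHints = extractSshHints(sampleStderr);
console.log(sampleHints.map((hint) => hint.try));
```

Since `JSON.stringify` escapes embedded newlines, each hint occupies a single line, which is why a line-by-line scan should be sufficient here.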