fix(cli): guide ssh argv usage

2026-05-23 12:32:43 +00:00
parent ca1e2544f0
commit 15d074424c
8 changed files with 171 additions and 23 deletions
@@ -19,7 +19,7 @@ CLI 可以从 `master` 快速演进，但必须兼容 `deploy.json` 固定的 CI
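 - `server cleanup plan [--min-age-hours N] [--limit N]` only generates a dry-run cleanup plan for Docker images on the main server and deletes nothing; the default `--min-age-hours 24` keeps freshly published or freshly verified images from being listed as stale. The output must include `dryRun=true`, `mutation=false`, `policy.deletionExecuted=false`, active containers/images, protected images, candidate stale images, estimated reclaimable space, a risk level, `commandsToReview`, and a manual approval checklist. The plan must use a conservative allowlist: keep image IDs used by running containers, keep image IDs referenced by stopped containers until a human has reviewed those containers, and keep the current commit-pinned artifacts from `deploy.json`/`CI.json`, the Compose stable image, upstream digest pins, and the provider-gateway runner image; `protectedStorage` must explicitly list the PostgreSQL named volume, Baidu Netdisk `.state`, D601 registry storage, and the Docker volumes/host data policy. This entry point must never generate or run `docker system prune`, `docker image prune`, `docker builder prune`, `docker volume rm`, `docker compose down -v`, database cleanup, or host data `rm` commands; if real deletion is added later, it must sit behind a separate explicit approval flag and follow a review of the dry-run output.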
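 - `server rebuild <backend-core|frontend|dev-frontend-proxy|provider-gateway|todo-note|code-queue-mgr|project-manager|baidu-netdisk|oa-event-flow>` creates an asynchronous job that first builds the target service image, then replaces the target service with `--no-deps --force-recreate` under the serializing `.state/locks/server-compose.lock` and waits for the container to become `healthy/running`. The command replaces the fallback procedure of deleting containers by hand: `dev-frontend-proxy` only updates the thin proxy for the main server dev entry point, and `todo-note`, `code-queue-mgr`, `project-manager`, `baidu-netdisk`, and `oa-event-flow` only rebuild the corresponding backends hosted on the main server, without rebuilding or deleting the database named volume. The D601 Code Queue execution plane is not managed by `server rebuild`, and Rust backend-core iteration must not use `server rebuild backend-core` to compile on the master server; see `docs/reference/dev-environment.md` for the rules.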
 - `provider attach <providerId> [--master-server URL] [--up] [--force]` generates a two-setting provider-gateway attach bundle on a new compute node: `.state/provider-<ID>.env` contains only `UNIDESK_MASTER_SERVER` and `PROVIDER_ID` by default, and `provider-<ID>.yml` pins the Docker socket, `pid: "host"`, `restart: always`, a read-only `/workspace`, and the SSH maintenance private-key mount; `--up` immediately runs the generated `docker compose up -d --build`. `provider triage <providerId> [--observed-error text] [--observed-scope scope] [--microservice id ...] [--full|--raw]` is the read-only multi-signal health triage entry point; it classifies a single-path `provider is not online`, SSH timeouts, registry failures, and service proxy failures as `runner-local-observation-gap`, `service-degraded`, `provider-degraded`, or `global-blocker`. The default output returns only the decision, scope, failed/degraded/unknown signals, and a bounded evidence summary; full evidence requires an explicit `--full` or `--raw`. The recommended cross-check commands still include `debug health`, `debug dispatch <providerId> host.ssh --wait-ms 15000`, `ssh <providerId> argv true`, `artifact-registry health --provider-id <providerId>`, `microservice health k3sctl-adapter`, `microservice health code-queue`, and `codex tasks --view supervisor --limit 20`.
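-- `ssh <providerId> [ssh-like args...]` connects to the target node through the backend-core intranet WebSocket broker and the provider-gateway Host SSH / WSL SSH maintenance bridge; with no further arguments it opens a remote login shell, and with further arguments it runs them as an ssh-style remote command and returns the remote exit code.
+- `ssh <providerId> [ssh-like args...]` connects to the target node through the backend-core intranet WebSocket broker and the provider-gateway Host SSH / WSL SSH maintenance bridge; with no further arguments it opens a remote login shell, and with further arguments it runs them as an ssh-style remote command and returns the remote exit code. For non-interactive remote commands, prefer `ssh <providerId> argv ...`; when shell features are needed, use `bun scripts/cli.ts ssh D601 argv bash -lc '<command>'`. When an ssh-like command fails with a timeout, kex, or exit-255 class error, the CLI appends one `UNIDESK_SSH_HINT` JSON line to stderr that suggests an argv retry and a provider triage cross-check.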
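 - `ssh <providerId> apply-patch [tool args...] < patch.diff` invokes the injected remote `apply_patch` tool directly and streams the standard `*** Begin Patch` / `*** End Patch` patch from local stdin to the target node.
 - `ssh <providerId> py [script-args...] < script.py` writes local stdin to a temporary `.py` file on the remote node, runs it with `python3 -u`, and cleans it up automatically, so there is no need to hand-write `'python3 -'`, heredocs, or nested quoting; `script-args` are passed through to the remote script as safely quoted argv.
 - `ssh <providerId> skills [--scope all|wsl|windows] [--limit N]` discovers the WSL/Linux skill root directories on the target node; when the provider is WSL, the same call also scans `.agents/skills` and `.codex/skills` under the Windows user directory.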
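An automated caller that wraps the CLI may want to pick up the `UNIDESK_SSH_HINT` line from captured stderr. The following is a minimal sketch, assuming the line has the `UNIDESK_SSH_HINT <json>` shape produced by `formatSshFailureHint`; `extractSshHint` and `ParsedSshHint` are hypothetical names for illustration and are not part of the UniDesk CLI.

```typescript
// Hypothetical consumer-side helper; not part of the UniDesk CLI.
interface ParsedSshHint {
  code: string;
  trigger: string;
  try: string;
  triage: string;
}

const HINT_PREFIX = "UNIDESK_SSH_HINT ";

function extractSshHint(stderrText: string): ParsedSshHint | null {
  // The hint is appended after the original stderr, so scan from the end.
  const lines = stderrText.split("\n");
  for (let i = lines.length - 1; i >= 0; i -= 1) {
    const line = (lines[i] ?? "").trim();
    if (!line.startsWith(HINT_PREFIX)) continue;
    try {
      const value: unknown = JSON.parse(line.slice(HINT_PREFIX.length));
      if (typeof value !== "object" || value === null) return null;
      const record = value as Record<string, unknown>;
      const { code, trigger, try: retry, triage } = record;
      if (
        typeof code === "string"
        && typeof trigger === "string"
        && typeof retry === "string"
        && typeof triage === "string"
      ) {
        return { code, trigger, try: retry, triage };
      }
      return null;
    } catch {
      // Malformed JSON after the prefix: treat it as no usable hint.
      return null;
    }
  }
  return null;
}
```

A caller could log `hint.try` and `hint.triage` as next steps instead of treating the first ssh-like failure as proof that the provider is down.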
@@ -110,12 +110,14 @@ GitHub issue/PR 写操作必须优先使用 `bun scripts/cli.ts gh issue|pr ...

 `bun scripts/cli.ts ssh --help` and `bun scripts/cli.ts ssh <providerId> --help` are local JSON help commands and must return quickly; they must not parse `--help` as a Provider ID, open an interactive shell, or wait for a provider session.

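-`bun scripts/cli.ts ssh D518` should behave as a login shell on D518 WSL; `bun scripts/cli.ts ssh D518 hostname` should, like `ssh D518 hostname`, print only the remote command output and return the remote exit code. Target selection ahead of the Provider ID is determined by the UniDesk node inventory; traditional ssh transport options such as `-p`, `-i`, `-l`, and `-o` are managed centrally by the provider-gateway deployment configuration, so the CLI accepts and consumes them for compatibility but does not override the node-side maintenance bridge configuration.
+`bun scripts/cli.ts ssh D518` should behave as a login shell on D518 WSL; `bun scripts/cli.ts ssh D518 hostname` should, like `ssh D518 hostname`, print only the remote command output and return the remote exit code. Target selection ahead of the Provider ID is determined by the UniDesk node inventory; traditional ssh transport options such as `-p`, `-i`, `-l`, and `-o` are managed centrally by the provider-gateway deployment configuration, so the CLI accepts and consumes them for compatibility but does not override the node-side maintenance bridge configuration. Commanders, CI preflight checks, and other non-interactive flows should not rely on free-form ssh-like concatenation; the standard form is `bun scripts/cli.ts ssh D601 argv true`, or `bun scripts/cli.ts ssh D601 argv bash -lc '<command>'` when pipes, variable expansion, or multiple commands are needed.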

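 core only lets providers that declare the `host.ssh` capability use `ssh` passthrough or `host.ssh` dispatch; an older provider without that capability must fail fast with an error, and an unknown command must not be misreported as a successful `echo`.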

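 The local broker waits up to 60000ms by default for the provider SSH session to open, so that a maintenance session can still be established while the target node is handling many microservice.http tasks; to diagnose slow connections, raise the limit temporarily with `UNIDESK_SSH_OPEN_TIMEOUT_MS=<ms>`, but the minimum effective value is fixed at 15000ms, so that a genuinely offline node is not mistaken for a long-running block.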

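+If an ssh-like remote command hits `kex_exchange_identification`, `Connection closed by remote host`, a provider session timeout, or exit code 255, the CLI appends one `UNIDESK_SSH_HINT { ... }` line after the original stderr. The JSON does not echo the original remote command; it carries `code=ssh-like-command-friction`, `providerId`, `trigger`, `exitCode`, `message`, `try`, `triage`, and `note`. `try` always takes the `bun scripts/cli.ts ssh D601 argv bash -lc '<command>'` form, with the sanitized provider ID substituted, so that a single instance of ssh-like parsing or handshake friction is not misread as Host SSH on D601 being entirely unavailable.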
+
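 `ssh <providerId>` injects `/tmp/unidesk-ssh-tools/apply_patch`, `/tmp/unidesk-ssh-tools/glob`, and `/tmp/unidesk-ssh-tools/skill-discover` when the remote session starts and adds that directory to the remote `PATH`. `apply_patch` accepts the standard `*** Begin Patch` / `*** End Patch` patch format, which makes it easy to edit files in a remote repository over SSH passthrough; `glob` performs path matching remotely in Python so it does not depend on shell glob expansion; `skill-discover` lists remote Linux/WSL and Windows skills. The target node needs `python3` and `base64`. The injected tools write only to `/tmp/unidesk-ssh-tools` and do not modify the target repository; both interactive shells and remote commands can call them directly.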

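 For a small remote patch there is no need to hand-assemble a command like `ssh D601 'apply_patch' < patch.diff`; the official entry point is `bun scripts/cli.ts ssh D601 apply-patch < patch.diff`. `apply-patch` is equivalent to `patch`, and extra arguments are passed verbatim to the remote `apply_patch`, for example `bun scripts/cli.ts ssh D601 apply-patch --help`. The standard single-command usage is shown below and does not require creating a local temporary patch file first: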
@@ -160,7 +162,7 @@ bun scripts/cli.ts ssh D601 find /home/ubuntu --max-depth 4 --type d --icontains
 bun scripts/cli.ts ssh D601 glob --root /home/ubuntu/pikapython --pattern '**/*-test.cpp' --limit 20 --sort
 ```

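-`ssh <providerId> argv <command> [args...]` is the general argv-safe entry point; `exec` is a synonym. It suits common commands that need no shell pipeline. `find`, `glob`, and `apply-patch` have dedicated entry points; `rg`, `grep`, `sed`, `nl`, `stat`, `du`, `ls`, `cat`, `head`, `tail`, `wc`, and `pwd` can be used directly as `ssh` subcommands, and the CLI shell-quotes every argv token. When pipes, redirection, variable expansion, or multiple commands are needed, still use the old free-form remote command entry point and pass the whole remote shell script as a single local argument.
+`ssh <providerId> argv <command> [args...]` is the general argv-safe entry point; `exec` is a synonym. It is the default success path for non-interactive remote commands: when no shell pipeline is needed, pass the command and arguments directly, for example `bun scripts/cli.ts ssh D601 argv true`; when pipes, redirection, variable expansion, or multiple commands are needed, use `bun scripts/cli.ts ssh D601 argv bash -lc '<command>'` so that the shell script travels as a single argv token to `bash -lc`. `find`, `glob`, and `apply-patch` have dedicated entry points; `rg`, `grep`, `sed`, `nl`, `stat`, `du`, `ls`, `cat`, `head`, `tail`, `wc`, and `pwd` can be used directly as `ssh` subcommands, and the CLI shell-quotes every argv token. The old free-form ssh-like remote command entry point is kept only as a manual compatibility path that approximates native ssh.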

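 When running a multi-line script through `ssh <providerId>`, prefer a structured helper such as `bun scripts/cli.ts ssh D601 py < script.py`, or a single-layer stdin transfer such as `printf ... | (bun scripts/cli.ts ssh D601 'bash -s')`. Do not nest heredocs, complex quoting, or forms like `ssh 'python3 - <<EOF ...'` inside the remote command string; multiple layers of shell parsing easily bind stdin to the wrong process, which opens a remote interactive interpreter and leaves a dangling broker/SSH session. When a long script needs to be reused, prefer writing it over stdin to a temporary script on the target node, then explicitly running and cleaning it up within the same remote command.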
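
To see why `argv bash -lc '<command>'` keeps a whole shell script as one token, consider how per-token shell quoting works. The sketch below is a hypothetical re-implementation for illustration only: the body of the real `shellArgv` in `scripts/src/ssh.ts` is not shown in this diff, and `quoteToken` and `quoteArgv` are assumed names. It uses standard POSIX single-quote escaping.

```typescript
// Hypothetical illustration of per-token POSIX shell quoting.
// The actual shellArgv implementation in scripts/src/ssh.ts may differ.
function quoteToken(token: string): string {
  // Tokens made only of obviously safe characters pass through unquoted.
  if (/^[A-Za-z0-9_\/.:=@%+,-]+$/.test(token)) return token;
  // Otherwise wrap in single quotes and escape embedded single quotes.
  return "'" + token.replace(/'/g, "'\\''") + "'";
}

function quoteArgv(argv: string[]): string {
  return argv.map(quoteToken).join(" ");
}

// The shell script stays a single argument to `bash -lc`, so the pipe
// is interpreted by the remote bash rather than by an intermediate parser.
const remoteCommand = quoteArgv(["bash", "-lc", "echo $HOME | wc -c"]);
console.log(remoteCommand);
```

By contrast, the free-form ssh-like path in this diff joins the remaining arguments with plain spaces, so token boundaries are not preserved on the remote side.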

@@ -168,7 +170,7 @@ bun scripts/cli.ts ssh D601 glob --root /home/ubuntu/pikapython --pattern '**/*-

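 `--main-server-ip` is a global prefix and must appear in the same invocation as the command to be passed through, for example `bun scripts/cli.ts --main-server-ip 74.48.78.17 debug health`. The default transport is the public frontend: the local CLI reads the frontend login username and password from this repository's `config.json`, logs in to `http://<ip>:<frontendPort>/` to obtain an HttpOnly session cookie, and then reaches the backend-core intranet API through the frontend's same-origin `/api/*` proxy; compute nodes therefore only need access to the public frontend, and need neither the main server SSH key nor an open backend-core REST API or PostgreSQL port.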

-The default frontend transport supports `debug health`, `debug dispatch`, `debug task`, `artifact-registry status|health`, `ci publish-user-service --dry-run`, `microservice list/status/health/diagnostics/tunnel-self-test/proxy`, `decision upload/list/show/health`, `decision requirement list/upsert`, `decision diary import/list/history/months/show/edit/upsert`, `codex task <taskId>`, `codex tasks`, `codex unread`, `codex queues`, `codex output <taskId>`, `codex judge <taskId> --attempt N`, and `ssh <PROVIDER_ID> <remote-command>`. When `microservice status/health/diagnostics` goes through the frontend remote transport it also reuses the local CLI's default compact summary, and `microservice health code-queue` returns the full health body only with an explicit `--raw` or `--full`. The in-flight correction command `codex steer` is an active run write control and should be run from the main server local CLI or over an explicit SSH transport, so that public frontend passthrough does not constrain stdin/body audit semantics. For `ssh`, the remote frontend transport uses `host.ssh` dispatch to run bounded remote commands, which suits self-tests such as `ssh D601 hostname` and `ssh D601 skills`; an interactive login shell should still be used from the main server local CLI, or run on the main server after explicitly switching to the old SSH transport. Frontend remote passthrough does not stream local stdin, so stdin-backed helpers such as `ssh py < script.py` and `ssh apply-patch < patch.diff` must run locally on the main server or explicitly switch to `--main-server-transport ssh`. When backend-core, database, provider-dispatch, or provider-host-ssh is missing, these read-only preflight checks must return a structured `runnerDisposition=infra-blocked` with the list of missing channels rather than a bare `No such container`. If the old behavior is really needed, use `--main-server-key <key>` or `--main-server-transport ssh`; the CLI then logs in to the main server over SSH and runs the same `bun scripts/cli.ts <command>` in the `--main-server-root` directory.
+The default frontend transport supports `debug health`, `debug dispatch`, `debug task`, `artifact-registry status|health`, `ci publish-user-service --dry-run`, `microservice list/status/health/diagnostics/tunnel-self-test/proxy`, `decision upload/list/show/health`, `decision requirement list/upsert`, `decision diary import/list/history/months/show/edit/upsert`, `codex task <taskId>`, `codex tasks`, `codex unread`, `codex queues`, `codex output <taskId>`, `codex judge <taskId> --attempt N`, and `ssh <PROVIDER_ID> <remote-command>`. When `microservice status/health/diagnostics` goes through the frontend remote transport it also reuses the local CLI's default compact summary, and `microservice health code-queue` returns the full health body only with an explicit `--raw` or `--full`. The in-flight correction command `codex steer` is an active run write control and should be run from the main server local CLI or over an explicit SSH transport, so that public frontend passthrough does not constrain stdin/body audit semantics. For `ssh`, the remote frontend transport uses `host.ssh` dispatch to run bounded remote commands, and non-interactive commands should likewise prefer `ssh D601 argv true` or `ssh D601 argv bash -lc '<command>'`; an interactive login shell should still be used from the main server local CLI, or run on the main server after explicitly switching to the old SSH transport. Frontend remote passthrough does not stream local stdin, so stdin-backed helpers such as `ssh py < script.py` and `ssh apply-patch < patch.diff` must run locally on the main server or explicitly switch to `--main-server-transport ssh`. When backend-core, database, provider-dispatch, or provider-host-ssh is missing, these read-only preflight checks must return a structured `runnerDisposition=infra-blocked` with the list of missing channels rather than a bare `No such container`. If the old behavior is really needed, use `--main-server-key <key>` or `--main-server-transport ssh`; the CLI then logs in to the main server over SSH and runs the same `bun scripts/cli.ts <command>` in the `--main-server-root` directory.

 A compute node can use this entry point to test its own remote upgrade loop without exposing the core REST API or the database on the compute node. The standard sequence is: first run `bun scripts/cli.ts --main-server-ip 74.48.78.17 debug health` to confirm that the main server sees the current Provider online and that the Provider labels show `unideskCapabilities` containing `host.ssh`, `hostSshConfigured=true`, and `hostSshKeyPresent=true`; then run `bun scripts/cli.ts --main-server-ip 74.48.78.17 debug dispatch <PROVIDER_ID> provider.upgrade --mode schedule --wait-ms 15000` to trigger a real `provider.upgrade`; then run `debug health` again to confirm the node has come back online; finally run `bun scripts/cli.ts --main-server-ip 74.48.78.17 debug dispatch <PROVIDER_ID> host.ssh --wait-ms 15000` and `bun scripts/cli.ts --main-server-ip 74.48.78.17 ssh <PROVIDER_ID> hostname` to verify SSH passthrough. A new provider-gateway deployment or upgrade that has not completed this remote CLI self-test cannot be considered delivered.

@@ -142,6 +142,8 @@ backend-core 可以通过真实 WebSocket 调度向在线 provider 下发 `provi

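 `bun scripts/cli.ts provider triage <PROVIDER_ID>` is the read-only multi-signal triage entry point for provider runtime state. The output must include `decision`, `retryable`, `healthyScopes`, `failedScopes`, `degradedScopes`, `blockingDisposition`, `rationale`, `signals`, and `recommendedCrossChecks`. The long-term semantics of `decision` are: `global-offline` means several independent critical surfaces such as provider heartbeat, Host SSH, k3s, or the scheduler fail at the same time with no healthy cross-evidence; `service-degraded` means the registry, the service proxy, or a single user service is locally degraded while provider-level health signals still exist; `retryable-transient` means a single runner-local, SSH, proxy, or API timeout is insufficient evidence and calls for a retry or additional cross-checks; `healthy` means no failed or degraded signal was observed.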

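+`recommendedCrossChecks` must keep the argv-form Host SSH self-check: `bun scripts/cli.ts ssh <PROVIDER_ID> argv true`. This command proves that the non-interactive maintenance bridge still works; if the free-form ssh-like form hits a timeout, `kex_exchange_identification`, or `Connection closed by remote host`, first retest with `bun scripts/cli.ts ssh D601 argv bash -lc '<command>'` as suggested by the CLI's `UNIDESK_SSH_HINT` output, then use `provider triage` to decide whether it is really a provider-level failure.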
+
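 A long-lived WSL provider such as D601 must not be written off as globally offline because a single path failed. A typical local degradation is the artifact registry's `unidesk-artifact-registry.service` being inactive while the registry container is still running, the listener is still bound to loopback, and `http://127.0.0.1:5000/v2/` returns 200; this state should show as degraded within the registry scope and land on `decision=service-degraded` in provider triage, prompting only a fix for the systemd drift without blocking judgments about Code Queue, k3sctl-adapter, or business APIs on D601.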

 ## Manual Upgrade Maintenance
@@ -162,6 +164,6 @@ WSL provider 需要调用 Windows-only 工具链时，优先在 WSL 用户的 `~

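 The maintenance bridge is exposed as the `host.ssh` command through real WebSocket dispatch. The default payload uses `mode: "probe"`, where the remote side runs one short command and returns `UNIDESK_SSH_TEST user=... host=... bridge=host.ssh cwd=...`; for manual diagnosis, `mode: "exec"` with a `command` field can be used explicitly to run a bounded command. Every `host.ssh` execution must have a timeout, and stdout/stderr are shown truncated in the task result; automatic upgrades and ordinary tasks must still use the Docker socket and `provider.upgrade` and must not treat the WSL SSH maintenance bridge as a scheduling channel.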

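-The human-facing terminal entry point is `bun scripts/cli.ts ssh <PROVIDER_ID> [ssh-like args...]`. With no further arguments it opens a remote login shell; with further arguments it runs the remote command and returns the remote exit code. The entry point goes through the backend-core intranet `/ws/ssh` broker and the provider's existing WebSocket and adds no public core port. Traditional ssh transport options are controlled centrally by provider-gateway environment variables; the CLI only passes through the remote command after the Provider ID and the terminal stdin/stdout/stderr. When a WSL node needs a clear view of both the Linux/WSL and Windows skill sets, use `bun scripts/cli.ts ssh <PROVIDER_ID> skills`, which only reads `SKILL.md` metadata through the established maintenance bridge and does not require provider-gateway to add a business API.
+The human-facing terminal entry point is `bun scripts/cli.ts ssh <PROVIDER_ID> [ssh-like args...]`. With no further arguments it opens a remote login shell; with further arguments it runs the remote command and returns the remote exit code. The entry point goes through the backend-core intranet `/ws/ssh` broker and the provider's existing WebSocket and adds no public core port. Traditional ssh transport options are controlled centrally by provider-gateway environment variables; the CLI only passes through the remote command after the Provider ID and the terminal stdin/stdout/stderr. Non-interactive remote commands should prefer the argv entry point: `bun scripts/cli.ts ssh D601 argv true`, or `bun scripts/cli.ts ssh D601 argv bash -lc '<command>'` when shell features are needed. When a WSL node needs a clear view of both the Linux/WSL and Windows skill sets, use `bun scripts/cli.ts ssh <PROVIDER_ID> skills`, which only reads `SKILL.md` metadata through the established maintenance bridge and does not require provider-gateway to add a business API.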

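-To verify the WSL SSH bridge, first start sshd in the target WSL and make sure the maintenance public key is written to the target user's `authorized_keys`, then confirm that `unideskCapabilities` in the target provider's registration labels contains `host.ssh`. After running `bun scripts/cli.ts debug dispatch <PROVIDER_ID> host.ssh --wait-ms 15000`, the result in `debug task latest` or the frontend task history should show `status: succeeded`, a `probeLine` containing `UNIDESK_SSH_TEST`, and `exitCode: 0`, and `hostSshKeyPresent` in the target node labels should be true; then run `bun scripts/cli.ts ssh <PROVIDER_ID> hostname` to verify the near-native ssh remote command experience. When self-testing on the compute node itself, pass the same set of commands through the remote CLI: `bun scripts/cli.ts --main-server-ip 74.48.78.17 debug health`, `bun scripts/cli.ts --main-server-ip 74.48.78.17 debug dispatch <PROVIDER_ID> host.ssh --wait-ms 15000`, and `bun scripts/cli.ts --main-server-ip 74.48.78.17 ssh <PROVIDER_ID> hostname`; the default remote CLI uses the public frontend login session and does not need the main server SSH key. The health check must show the Provider online, `hostSshConfigured=true`, `hostSshKeyPresent=true`, a correct `hostSshTarget`, and `unideskCapabilities` containing `host.ssh`; the probe must return `UNIDESK_SSH_TEST`; and `ssh <PROVIDER_ID> hostname` must print the target WSL/host hostname with exit code 0. If a WSL node such as D518 has no public SSH entry point, verification must still go through this provider-gateway self-connecting maintenance bridge rather than requiring the main server to connect directly to the node's public port 22; an older provider that does not declare `host.ssh` must upgrade provider-gateway first, otherwise core rejects SSH passthrough.
+To verify the WSL SSH bridge, first start sshd in the target WSL and make sure the maintenance public key is written to the target user's `authorized_keys`, then confirm that `unideskCapabilities` in the target provider's registration labels contains `host.ssh`. After running `bun scripts/cli.ts debug dispatch <PROVIDER_ID> host.ssh --wait-ms 15000`, the result in `debug task latest` or the frontend task history should show `status: succeeded`, a `probeLine` containing `UNIDESK_SSH_TEST`, and `exitCode: 0`, and `hostSshKeyPresent` in the target node labels should be true; then run `bun scripts/cli.ts ssh <PROVIDER_ID> argv true` to verify the non-interactive argv maintenance command, followed by `bun scripts/cli.ts ssh <PROVIDER_ID> hostname` to verify the near-native ssh remote command experience. When self-testing on the compute node itself, pass the same set of commands through the remote CLI: `bun scripts/cli.ts --main-server-ip 74.48.78.17 debug health`, `bun scripts/cli.ts --main-server-ip 74.48.78.17 debug dispatch <PROVIDER_ID> host.ssh --wait-ms 15000`, `bun scripts/cli.ts --main-server-ip 74.48.78.17 ssh <PROVIDER_ID> argv true`, and `bun scripts/cli.ts --main-server-ip 74.48.78.17 ssh <PROVIDER_ID> hostname`; the default remote CLI uses the public frontend login session and does not need the main server SSH key. The health check must show the Provider online, `hostSshConfigured=true`, `hostSshKeyPresent=true`, a correct `hostSshTarget`, and `unideskCapabilities` containing `host.ssh`; the probe must return `UNIDESK_SSH_TEST`; and both `ssh <PROVIDER_ID> argv true` and `ssh <PROVIDER_ID> hostname` must exit with code 0. If a WSL node such as D518 has no public SSH entry point, verification must still go through this provider-gateway self-connecting maintenance bridge rather than requiring the main server to connect directly to the node's public port 22; an older provider that does not declare `host.ssh` must upgrade provider-gateway first, otherwise core rejects SSH passthrough.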
@@ -1,4 +1,4 @@
-import { buildProviderTriageResult, type ProviderTriageSignal } from "./src/provider-triage";
+import { buildProviderTriageResult, providerTriageRecommendedCrossChecks, type ProviderTriageSignal } from "./src/provider-triage";
 import { codexTaskQuery } from "./src/code-queue";
 import { classifyRunnerError } from "../src/components/microservices/code-queue/src/runner-error-classifier";

@@ -58,6 +58,11 @@ export function runProviderRunnerTriageContract(): JsonRecord {
  assertCondition(result.decision === "retryable-transient", "single path provider offline should be retryable transient", result);
  assertCondition(result.retryable === true, "single path provider offline should be retryable", result);
  assertCondition(result.contract.singlePathProviderOfflineIsGlobalBlocker === false, "triage contract should reject single-path global blocker", result);
+  assertCondition(result.recommendedCrossChecks.includes("bun scripts/cli.ts ssh D601 argv true"), "provider triage result must recommend argv Host SSH cross-check", result.recommendedCrossChecks);
+
+  const crossChecks = providerTriageRecommendedCrossChecks("D601");
+  assertCondition(crossChecks.includes("bun scripts/cli.ts ssh D601 argv true"), "provider triage recommendedCrossChecks must keep ssh argv true", crossChecks);
+  assertCondition(crossChecks.includes("bun scripts/cli.ts debug dispatch D601 host.ssh --wait-ms 15000"), "provider triage recommendedCrossChecks must keep host.ssh dispatch probe", crossChecks);

  const rateLimitTriage = buildProviderTriageResult("D601", [
    signal("observed-runner-429", "external-provider", "failed", false),
@@ -102,6 +107,7 @@ export function runProviderRunnerTriageContract(): JsonRecord {
      "runner error classifier separates runner-local/provider-gateway/registry/k3s/scheduler/unknown",
      "each single runner error classification has globalBlocker=false",
      "provider triage keeps single provider is not online as retryable-transient, not global-blocker",
+      "provider triage recommendedCrossChecks keeps host.ssh dispatch and ssh argv true probes",
      "external OpenAI/model provider 429 is explicit retryable backoff evidence, not Code Queue infra outage",
      "codex task --detail preserves runnerErrorClassification in compact attempt output",
    ],
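
The contract assertions above only require that two specific commands stay in the recommended list. As a rough picture of what such a list builder could look like, here is a minimal sketch based on the cross-check commands named in the CLI reference. `sketchRecommendedCrossChecks` is a hypothetical name, and the `bun scripts/cli.ts` prefix on every entry is an assumption (the contract test confirms it only for the two asserted entries); the real `providerTriageRecommendedCrossChecks` in `scripts/src/provider-triage.ts` is not shown in this diff and may differ.

```typescript
// Hypothetical sketch; not the actual providerTriageRecommendedCrossChecks.
function sketchRecommendedCrossChecks(providerId: string): string[] {
  const cli = "bun scripts/cli.ts"; // assumed prefix for every entry
  return [
    `${cli} debug health`,
    `${cli} debug dispatch ${providerId} host.ssh --wait-ms 15000`,
    // argv-form Host SSH self-check that the contract must keep.
    `${cli} ssh ${providerId} argv true`,
    `${cli} artifact-registry health --provider-id ${providerId}`,
    `${cli} microservice health k3sctl-adapter`,
    `${cli} microservice health code-queue`,
    `${cli} codex tasks --view supervisor --limit 20`,
  ];
}

const checks = sketchRecommendedCrossChecks("D601");
console.log(checks.join("\n"));
```

Keeping both the `host.ssh` dispatch probe and the `argv true` command in the list lets a reader tell a failed bridge apart from ssh-like parsing friction.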
@@ -45,6 +45,7 @@ const syntaxFiles = [
  "scripts/microservice-health-output-contract-test.ts",
  "scripts/code-queue-supervisor-disclosure-contract-test.ts",
  "scripts/code-queue-commander-view-contract-test.ts",
+  "scripts/ssh-argv-guidance-contract-test.ts",
  "src/components/frontend/src/index.ts",
  "src/components/frontend/src/app.tsx",
  "src/components/frontend/src/decision-center.tsx",
@@ -334,6 +335,7 @@ export function runChecks(config: UniDeskConfig, options: CheckOptions = default
      fileItem("scripts/host-codex-commander-skeleton-contract-test.ts"),
      fileItem("scripts/host-codex-commander-no-daemon-smoke-contract-test.ts"),
      fileItem("scripts/provider-runner-triage-contract-test.ts"),
+      fileItem("scripts/ssh-argv-guidance-contract-test.ts"),
      fileItem("scripts/src/provider-triage.ts"),
      fileItem("src/components/microservices/code-queue/src/runner-error-classifier.ts"),
      fileItem("scripts/src/ci.ts"),
@@ -384,6 +386,7 @@ export function runChecks(config: UniDeskConfig, options: CheckOptions = default
    items.push(commandItem("host-codex-commander:skeleton-contract", ["bun", "scripts/host-codex-commander-skeleton-contract-test.ts"], 30_000));
    items.push(commandItem("host-codex-commander:no-daemon-smoke-contract", ["bun", "scripts/host-codex-commander-no-daemon-smoke-contract-test.ts"], 30_000));
    items.push(commandItem("provider:runner-triage-contract", ["bun", "scripts/provider-runner-triage-contract-test.ts"], 30_000));
+    items.push(commandItem("ssh:argv-guidance-contract", ["bun", "scripts/ssh-argv-guidance-contract-test.ts"], 30_000));
    items.push(commandItem("deploy:artifact-matrix-contract", ["bun", "scripts/deploy-artifact-matrix-contract-test.ts"], 30_000));
    items.push(commandItem("decision-center:desired-state-contract", ["bun", "scripts/decision-center-desired-state-contract-test.ts"], 30_000));
    items.push(commandItem("code-queue:active-run-heartbeat-visible", ["bun", "scripts/code-queue-liveness-diagnostics-test.ts", "--only", "code-queue:active-run-heartbeat-visible"], 30_000));
@@ -423,6 +426,7 @@ export function runChecks(config: UniDeskConfig, options: CheckOptions = default
    items.push(skippedItem("host-codex-commander:skeleton-contract", "host Codex commander skeleton contract is opt-in with script checks", "--scripts-typecheck or --full"));
    items.push(skippedItem("host-codex-commander:no-daemon-smoke-contract", "host Codex commander no-daemon smoke contract is opt-in with script checks", "--scripts-typecheck or --full"));
    items.push(skippedItem("provider:runner-triage-contract", "Provider runner triage contract is opt-in with script checks", "--scripts-typecheck or --full"));
+    items.push(skippedItem("ssh:argv-guidance-contract", "SSH argv guidance and failure hint contract is opt-in with script checks", "--scripts-typecheck or --full"));
    items.push(skippedItem("deploy:artifact-matrix-contract", "deploy artifact matrix contract is opt-in with script checks", "--scripts-typecheck or --full"));
    items.push(skippedItem("decision-center:desired-state-contract", "Decision Center desired-state drift contract is opt-in with script checks", "--scripts-typecheck or --full"));
    items.push(skippedItem("code-queue:liveness-diagnostics-fixtures", "Code Queue liveness diagnostics fixtures are opt-in with script checks", "--scripts-typecheck or --full"));
@@ -18,13 +18,13 @@ export function rootHelp(): unknown {
      { command: "server cleanup plan [--min-age-hours N] [--limit N]", description: "Dry-run Docker image cleanup plan only: list active/protected images, stale candidates older than the default 24h threshold, risk, estimated reclaim, and manual review commands without deleting anything." },
      { command: "server rebuild <backend-core|frontend|dev-frontend-proxy|provider-gateway|todo-note|code-queue-mgr|project-manager|baidu-netdisk|oa-event-flow>", description: "Maintenance-only local Compose rebuild for reviewed main-server services; frontend standard release must use CI artifact plus deploy apply dev/prod artifact consumers." },
      { command: "provider attach <providerId> [--master-server URL] [--up] [--force] | provider triage <providerId> [--observed-error text] [--observed-scope scope] [--microservice id ...] [--full|--raw]", description: "Generate the minimal external provider-gateway env/compose bundle or run the low-noise read-only provider health triage contract." },
-      { command: "ssh <providerId> [ssh-like args...]", description: "Open a Host SSH / WSL SSH maintenance session through the provider-gateway bridge with built-in remote helper tools in PATH." },
+      { command: "ssh <providerId> [ssh-like args...]", description: "Open a Host SSH / WSL SSH maintenance session through the provider-gateway bridge; prefer `ssh <providerId> argv ...` for non-interactive remote commands." },
      { command: "ssh <providerId> apply-patch [tool args...] < patch.diff", description: "Invoke the injected remote apply_patch helper directly over SSH passthrough and stream the patch from local stdin." },
      { command: "ssh <providerId> py [script-args...] < script.py", description: "Run remote Python from local stdin through SSH passthrough without nested shell quoting; extra args become script argv." },
      { command: "ssh <providerId> skills [--scope all|wsl|windows] [--limit N]", description: "Discover WSL/Linux and, for WSL providers, Windows skill directories in one SSH passthrough call." },
      { command: "ssh <providerId> find <path...> [--max-depth N] [--type d|f|l] [--contains TEXT] [--iname PATTERN] [--limit N] [--sort]", description: "Run a structured remote find command without nested shell quoting or parentheses." },
      { command: "ssh <providerId> glob [--root DIR] [--pattern PATTERN] [--contains TEXT] [--type any|f|d] [--limit N] [--sort]", description: "Run remote glob matching through the injected helper without shell glob expansion." },
-      { command: "ssh <providerId> argv <command> [args...]", description: "Run a remote command with each argv token shell-quoted by UniDesk before SSH passthrough." },
+      { command: "ssh <providerId> argv <command> [args...]", description: "Run a non-interactive remote command with each argv token shell-quoted by UniDesk before SSH passthrough; use `argv bash -lc '<command>'` when shell features are required." },
      { command: "microservice list", description: "List UniDesk-managed user services and their provider/runtime mapping." },
      { command: "microservice status <id>", description: "Show one user service config, repository reference, backend mapping, and runtime status." },
      { command: "microservice health <id> [--compact|--raw|--full]", description: "Probe one user service through backend-core -> provider-gateway HTTP proxy; default output is compact, and code-queue uses a commander-safe liveness summary unless raw/full is requested." },
@@ -142,6 +142,7 @@ export function sshHelp(): unknown {
    usage: [
      "bun scripts/cli.ts ssh <providerId>",
      "bun scripts/cli.ts ssh <providerId> argv <command> [args...]",
+      "bun scripts/cli.ts ssh D601 argv bash -lc '<command>'",
      "bun scripts/cli.ts ssh <providerId> apply-patch < patch.diff",
      "bun scripts/cli.ts ssh <providerId> py [script-args...] < script.py",
      "bun scripts/cli.ts ssh <providerId> skills [--scope all|wsl|windows] [--limit N]",
@@ -150,7 +151,8 @@ export function sshHelp(): unknown {
    ],
    notes: [
      "ssh --help and ssh <providerId> --help print this JSON help and never open an interactive session.",
-      "Use argv when nested shell quoting would be fragile.",
+      "For non-interactive remote commands, prefer argv: bun scripts/cli.ts ssh D601 argv bash -lc '<command>'.",
+      "If an ssh-like remote command fails with timeout/kex/exit-255 friction, stderr includes one low-noise UNIDESK_SSH_HINT JSON line with the argv retry command.",
      "Use -- before a remote command that intentionally starts with a dash.",
    ],
  };
@@ -3,7 +3,7 @@ import { type UniDeskConfig } from "./config";
 import { type DebugDispatchCommand, isDebugDispatchCommand } from "./debug";
 import { summarizeMicroserviceHealthResponse, summarizeMicroserviceObservation, summarizeMicroserviceProxyResponse } from "./microservices";
 import { parseNetworkPerfOptions, runNetworkPerf } from "./network-perf";
-import { isSshSkillDiscoveryArgs, parseSshArgs } from "./ssh";
+import { formatSshFailureHint, isSshSkillDiscoveryArgs, parseSshArgs, sshFailureHint } from "./ssh";
 import { codexJudgeQueryAsync, codexOutputQueryAsync, codexPrPreflightQueryAsync, codexQueuesQueryAsync, codexTaskQueryAsync, codexTasksQueryAsync, codexUnreadTriageAsync } from "./code-queue";
 import { runDecisionCenterCommandAsync } from "./decision-center";
 import {
@@ -862,7 +862,10 @@ async function runRemoteSshOverFrontend(session: FrontendSession, providerId: st
  if (stderr.length > 0) process.stderr.write(stderr);
  if (task?.status !== "succeeded") {
    if (stdout.length === 0 && stderr.length === 0) process.stderr.write(`${JSON.stringify({ taskId, task }, null, 2)}\n`);
-    return typeof result.exitCode === "number" ? result.exitCode : 255;
+    const exitCode = typeof result.exitCode === "number" ? result.exitCode : 255;
+    const hint = sshFailureHint(providerId, parsed, exitCode, stderr.length > 0 ? stderr : String(task?.message ?? ""));
+    if (hint !== null) process.stderr.write(formatSshFailureHint(hint));
+    return exitCode;
  }
  return typeof result.exitCode === "number" ? result.exitCode : 0;
 }
@@ -4,6 +4,20 @@ import { type UniDeskConfig, repoRoot } from "./config";
 export interface ParsedSshArgs {
  remoteCommand: string | null;
  requiresStdin: boolean;
+  invocationKind: SshInvocationKind;
+}
+
+export type SshInvocationKind = "interactive" | "argv" | "helper" | "ssh-like";
+
+export interface SshFailureHint {
+  code: "ssh-like-command-friction";
+  providerId: string;
+  trigger: "timeout-or-kex" | "exit-255";
+  exitCode: number;
+  message: string;
+  try: string;
+  triage: string;
+  note: string;
 }

 const argvQuotedSshSubcommands = new Set(["rg", "grep", "sed", "nl", "stat", "du", "ls", "cat", "head", "tail", "wc", "pwd"]);
@@ -479,28 +493,28 @@ export function parseSshArgs(args: string[]): ParsedSshArgs {
  const subcommand = args[0] ?? "";
  if (isSshSkillDiscoveryArgs(args)) {
    const toolArgs = subcommand === "skill" ? ["skill-discover", ...args.slice(2)] : ["skill-discover", ...args.slice(1)];
-    return { remoteCommand: shellArgv(toolArgs), requiresStdin: false };
+    return { remoteCommand: shellArgv(toolArgs), requiresStdin: false, invocationKind: "helper" };
  }
  if (subcommand === "apply-patch" || subcommand === "patch") {
    const toolArgs = ["apply_patch", ...args.slice(1)];
-    return { remoteCommand: shellArgv(toolArgs), requiresStdin: true };
+    return { remoteCommand: shellArgv(toolArgs), requiresStdin: true, invocationKind: "helper" };
  }
  if (subcommand === "py") {
-    return { remoteCommand: buildPythonStdinCommand(args.slice(1)), requiresStdin: true };
+    return { remoteCommand: buildPythonStdinCommand(args.slice(1)), requiresStdin: true, invocationKind: "helper" };
  }
  if (subcommand === "argv" || subcommand === "exec") {
    const toolArgs = args.slice(1);
    if (toolArgs.length === 0) throw new Error(`ssh ${subcommand} requires a command`);
-    return { remoteCommand: shellArgv(toolArgs), requiresStdin: false };
+    return { remoteCommand: shellArgv(toolArgs), requiresStdin: false, invocationKind: "argv" };
  }
  if (subcommand === "find") {
-    return { remoteCommand: buildFindCommand(args.slice(1)), requiresStdin: false };
+    return { remoteCommand: buildFindCommand(args.slice(1)), requiresStdin: false, invocationKind: "helper" };
  }
  if (subcommand === "glob") {
-    return { remoteCommand: shellArgv(["glob", ...args.slice(1)]), requiresStdin: false };
+    return { remoteCommand: shellArgv(["glob", ...args.slice(1)]), requiresStdin: false, invocationKind: "helper" };
  }
  if (argvQuotedSshSubcommands.has(subcommand)) {
-    return { remoteCommand: shellArgv(args), requiresStdin: false };
+    return { remoteCommand: shellArgv(args), requiresStdin: false, invocationKind: "argv" };
  }
  const remote: string[] = [];
  let remoteStarted = false;
@@ -521,7 +535,11 @@ export function parseSshArgs(args: string[]): ParsedSshArgs {
    remoteStarted = true;
    remote.push(arg);
  }
-  return { remoteCommand: remote.length === 0 ? null : remote.join(" "), requiresStdin: false };
+  return {
+    remoteCommand: remote.length === 0 ? null : remote.join(" "),
+    requiresStdin: false,
+    invocationKind: remote.length === 0 ? "interactive" : "ssh-like",
+  };
 }

 function shellArgv(args: string[]): string {
@@ -658,6 +676,48 @@ export function wrapSshRemoteCommand(command: string | null): string {
  return `${bootstrap}; stty -echo 2>/dev/null || true; ${command}`;
 }

+function safeProviderId(providerId: string): string {
+  return /^[A-Za-z0-9_.-]{1,64}$/u.test(providerId) ? providerId : "<provider>";
+}
+
+function classifySshLikeFailure(exitCode: number, stderrText: string): SshFailureHint["trigger"] | null {
+  const normalized = stderrText.toLowerCase();
+  if (
+    normalized.includes("kex_exchange_identification")
+    || normalized.includes("ssh_exchange_identification")
+    || normalized.includes("connection closed by remote host")
+    || normalized.includes("connection reset by peer")
+    || normalized.includes("connection timed out")
+    || normalized.includes("operation timed out")
+    || normalized.includes("timed out waiting for provider session")
+    || normalized.includes("the operation was aborted")
+  ) {
+    return "timeout-or-kex";
+  }
+  return exitCode === 255 ? "exit-255" : null;
+}
+
+export function sshFailureHint(providerId: string, parsed: ParsedSshArgs, exitCode: number, stderrText: string): SshFailureHint | null {
+  if (parsed.invocationKind !== "ssh-like") return null;
+  const trigger = classifySshLikeFailure(exitCode, stderrText);
+  if (trigger === null) return null;
+  const shownProviderId = safeProviderId(providerId);
+  return {
+    code: "ssh-like-command-friction",
+    providerId: shownProviderId,
+    trigger,
+    exitCode,
+    message: "ssh-like remote command failed before proving Host SSH is globally unavailable; prefer argv form for non-interactive commands.",
+    try: `bun scripts/cli.ts ssh ${shownProviderId} argv bash -lc '<command>'`,
+    triage: `bun scripts/cli.ts provider triage ${shownProviderId} --observed-scope ssh --observed-error '<ssh-like timeout or kex failure>'`,
+    note: "This hint intentionally does not echo the original remote command.",
+  };
+}
+
+export function formatSshFailureHint(hint: SshFailureHint): string {
+  return `UNIDESK_SSH_HINT ${JSON.stringify(hint)}\n`;
+}
+
 function brokerSource(): string {
  return String.raw`
 const open = JSON.parse(process.argv[2] || process.argv[1] || "{}");
@@ -831,8 +891,16 @@ export async function runSsh(config: UniDeskConfig, providerId: string, args: st
  if (rawMode) process.stdin.setRawMode(true);
  process.stdin.resume();
  process.stdin.pipe(child.stdin);
+  let stderrTail = "";
+  const appendStderrTail = (chunk: Buffer | string): void => {
+    const text = Buffer.isBuffer(chunk) ? chunk.toString("utf8") : chunk;
+    stderrTail = (stderrTail + text).slice(-16_384);
+  };
  child.stdout.pipe(process.stdout);
-  child.stderr.pipe(process.stderr);
+  child.stderr.on("data", (chunk: Buffer) => {
+    appendStderrTail(chunk);
+    process.stderr.write(chunk);
+  });

  return await new Promise<number>((resolve) => {
    const restore = (): void => {
@@ -841,12 +909,19 @@ export async function runSsh(config: UniDeskConfig, providerId: string, args: st
    };
    child.on("error", (error) => {
      restore();
-      process.stderr.write(`unidesk ssh failed to start broker: ${error.message}\n`);
+      const message = `unidesk ssh failed to start broker: ${error.message}\n`;
+      appendStderrTail(message);
+      process.stderr.write(message);
+      const hint = sshFailureHint(providerId, parsed, 255, stderrTail);
+      if (hint !== null) process.stderr.write(formatSshFailureHint(hint));
      resolve(255);
    });
    child.on("close", (code) => {
      restore();
-      resolve(code ?? 255);
+      const exitCode = code ?? 255;
+      const hint = sshFailureHint(providerId, parsed, exitCode, stderrTail);
+      if (hint !== null) process.stderr.write(formatSshFailureHint(hint));
+      resolve(exitCode);
    });
  });
 }
@@ -0,0 +1,54 @@
+import { sshHelp } from "./src/help";
+import { providerTriageRecommendedCrossChecks } from "./src/provider-triage";
+import { formatSshFailureHint, parseSshArgs, sshFailureHint } from "./src/ssh";
+
+type JsonRecord = Record<string, unknown>;
+
+function assertCondition(condition: unknown, message: string, detail: unknown = {}): void {
+  if (!condition) throw new Error(`${message}: ${JSON.stringify(detail)}`);
+}
+
+export function runSshArgvGuidanceContract(): JsonRecord {
+  const argv = parseSshArgs(["argv", "bash", "-lc", "echo ok"]);
+  assertCondition(argv.invocationKind === "argv", "argv subcommand must be classified as argv", argv);
+  assertCondition(argv.remoteCommand === "'bash' '-lc' 'echo ok'", "argv command must shell-quote each token", argv);
+  assertCondition(argv.requiresStdin === false, "argv command must not require stdin", argv);
+  assertCondition(sshFailureHint("D601", argv, 255, "kex_exchange_identification: Connection closed by remote host") === null, "argv failures must not produce ssh-like friction hint", argv);
+
+  const shortcut = parseSshArgs(["pwd"]);
+  assertCondition(shortcut.invocationKind === "argv", "safe command shortcuts must use argv quoting", shortcut);
+  assertCondition(shortcut.remoteCommand === "'pwd'", "safe command shortcut should be shell-quoted", shortcut);
+
+  const sshLike = parseSshArgs(["echo hello"]);
+  const hint = sshFailureHint("D601", sshLike, 255, "kex_exchange_identification: Connection closed by remote host");
+  assertCondition(hint !== null, "ssh-like kex failure must produce a hint", sshLike);
+  assertCondition(hint?.try === "bun scripts/cli.ts ssh D601 argv bash -lc '<command>'", "hint must provide canonical argv retry", hint);
+  assertCondition(hint?.triage.includes("provider triage D601"), "hint must provide provider triage command", hint);
+  const formatted = formatSshFailureHint(hint!);
+  assertCondition(formatted.startsWith("UNIDESK_SSH_HINT "), "formatted hint must have structured prefix", formatted);
+  assertCondition(!formatted.includes("echo hello"), "formatted hint must not echo the original remote command", formatted);
+
+  const timeoutHint = sshFailureHint("D601", sshLike, 255, "unidesk ssh bridge timed out waiting for provider session");
+  assertCondition(timeoutHint?.trigger === "timeout-or-kex", "provider session timeout must map to timeout-or-kex", timeoutHint);
+
+  const helpText = JSON.stringify(sshHelp());
+  assertCondition(helpText.includes("ssh D601 argv bash -lc '<command>'"), "ssh help must recommend argv bash -lc for non-interactive commands", helpText);
+  assertCondition(helpText.includes("UNIDESK_SSH_HINT"), "ssh help must document structured failure hint", helpText);
+
+  const crossChecks = providerTriageRecommendedCrossChecks("D601");
+  assertCondition(crossChecks.includes("bun scripts/cli.ts ssh D601 argv true"), "provider triage cross-checks must keep argv true", crossChecks);
+
+  return {
+    ok: true,
+    checks: [
+      "argv form is classified and quoted as the success path for non-interactive commands",
+      "ssh-like timeout/kex failures emit one structured argv retry hint",
+      "help text documents argv bash -lc and UNIDESK_SSH_HINT",
+      "provider triage recommendedCrossChecks keeps ssh D601 argv true",
+    ],
+  };
+}
+
+if (import.meta.main) {
+  process.stdout.write(`${JSON.stringify(runSshArgvGuidanceContract(), null, 2)}\n`);
+}
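Reviewer note: the hunks above write the hint to stderr as a single line, `UNIDESK_SSH_HINT ` followed by a JSON object. A minimal sketch of how a calling script might recover that object from captured stderr is given below. The names `extractSshHints`, `ParsedSshHint` and `isParsedSshHint` are hypothetical and not part of this commit; the prefix and field names are taken from `formatSshFailureHint` and `sshFailureHint` as shown in the diff.

```typescript
// Hypothetical consumer-side helper; NOT part of this commit.
// Assumes the hint is emitted as one line: "UNIDESK_SSH_HINT <json>\n".
const HINT_PREFIX = "UNIDESK_SSH_HINT ";

// Subset of the hint fields seen in the diff that a caller is likely to act on.
interface ParsedSshHint {
  code: string;
  providerId: string;
  trigger: string;
  exitCode: number;
  try: string;
  triage: string;
}

// Runtime shape check, so malformed or unrelated JSON is rejected.
function isParsedSshHint(value: unknown): value is ParsedSshHint {
  if (typeof value !== "object" || value === null) return false;
  const record = value as Record<string, unknown>;
  return typeof record.code === "string"
    && typeof record.providerId === "string"
    && typeof record.trigger === "string"
    && typeof record.exitCode === "number"
    && typeof record.try === "string"
    && typeof record.triage === "string";
}

// Scans captured stderr text line by line and returns every well-formed hint.
function extractSshHints(stderrText: string): ParsedSshHint[] {
  const hints: ParsedSshHint[] = [];
  for (const line of stderrText.split("\n")) {
    if (!line.startsWith(HINT_PREFIX)) continue;
    try {
      const parsed: unknown = JSON.parse(line.slice(HINT_PREFIX.length));
      if (isParsedSshHint(parsed)) hints.push(parsed);
    } catch {
      // Lines that carry the prefix but not valid JSON are skipped.
    }
  }
  return hints;
}
```

A caller could run `extractSshHints` over the stderr of a failed ssh-like invocation and, if a hint is present, retry with the command in its `try` field or run the one in `triage`. Returning an array rather than a single value is a deliberate choice in this sketch, since nothing in the diff guarantees that the `error` and `close` handlers cannot both emit a hint for the same run.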