docs: converge trans shell examples

2026-06-15 05:25:59 +00:00
parent e504b7b3b4
commit ceb3fb4627
15 changed files with 65 additions and 65 deletions
@@ -118,7 +118,7 @@ dry-run output exposes the registry probe URL, required labels, target image,
 `status` and `health` go through:

 ```bash
-trans D601 argv bash -lc '<readonly script>'
+trans D601 sh -- '<readonly shell command>'
 ```

 to check D601 state read-only. The checks include:
@@ -62,14 +62,14 @@ If a manual repair is needed to unblock the platform, the durable fix must be co

 Distributed runtime work should prefer structured CLI passthrough over ad-hoc nested shell strings. The standard escalation order is:

-1. Use a purpose-built UniDesk route plus operation or helper such as `trans D601:k3s kubectl ...`, `trans D601:k3s script`, `trans D601:k3s:<namespace>:<workload> logs`, `trans D601:k3s:<namespace>:<workload> script`, `trans D601:k3s:<namespace>:<workload>[:<container>] apply-patch --cwd /workspace`, `trans <providerId>:/absolute/workspace apply-patch`, `trans <providerId> py`, `trans <providerId> find`, `trans <providerId> glob` or `trans <providerId> skills`. Use legacy `apply-patch-v1` only when the old remote helper is explicitly required.
+1. Use a purpose-built UniDesk route plus operation or helper such as `trans D601:k3s kubectl ...`, `trans D601:k3s sh`, `trans D601:k3s:<namespace>:<workload> logs`, `trans D601:k3s:<namespace>:<workload> sh`, `trans D601:k3s:<namespace>:<workload>[:<container>] apply-patch --cwd /workspace`, `trans <providerId>:/absolute/workspace apply-patch`, `trans <providerId> py`, `trans <providerId> find`, `trans <providerId> glob` or `trans <providerId> skills`. Use legacy `apply-patch-v1` only when the old remote helper is explicitly required.
 2. If no helper exists, use `trans <providerId> argv <command> [args...]` so the CLI quotes each argv token once.
-3. If shell features such as pipes, redirects, loops or variable expansion are required, use a single quoted heredoc with `trans <providerId> script` or `trans D601:k3s:<namespace>:<workload> script` so the script body travels over stdin instead of through shell command-string arguments.
+3. If shell features such as pipes, redirects, loops or variable expansion are required, use a single quoted heredoc with explicit `trans <providerId> sh|bash` or `trans D601:k3s:<namespace>:<workload> sh|bash` so the script body travels over stdin instead of through shell command-string arguments.
 4. Treat free-form ssh-like command strings as an interactive compatibility path, not as the default automation surface.

-For D601 Kubernetes work, route syntax is preferred over positional shell recipes, but the route must stay a pure locator. `D601:k3s` means the native k3s control plane, and `D601:k3s:<namespace>:<workload>[:container]` means a namespaced workload or pod/container. `:` is the distributed route separator; `/` is only an in-container filesystem cwd, so container selection must use `:<container>` or `--container <container>`, not `pod/<pod>/<container>`. Operations come after the route: `kubectl` runs on the control plane, `logs` reads bounded workload logs, `script` streams a local heredoc/stdin script into the host or target pod, and `apply-patch --cwd /workspace` is the default remote text patch operation for pod workspaces. The route-operation split keeps distributed location and execution behavior independently extensible, fixes `KUBECONFIG=/etc/rancher/k3s/k3s.yaml`, refuses long-follow logs, and assembles common `kubectl exec` / `kubectl logs` / stdin script / pod patch target arguments without adding a provider-gateway protocol change. This prevents the common failure mode where a command crosses local shell, UniDesk SSH broker, remote shell command strings, `kubectl exec`, and container shell quoting layers before reaching the process that should run it.
+For D601 Kubernetes work, route syntax is preferred over positional shell recipes, but the route must stay a pure locator. `D601:k3s` means the native k3s control plane, and `D601:k3s:<namespace>:<workload>[:container]` means a namespaced workload or pod/container. `:` is the distributed route separator; `/` is only an in-container filesystem cwd, so container selection must use `:<container>` or `--container <container>`, not `pod/<pod>/<container>`. Operations come after the route: `kubectl` runs on the control plane, `logs` reads bounded workload logs, `sh`/`bash` stream a local heredoc/stdin script into the host or target pod with an explicit shell dialect, and `apply-patch --cwd /workspace` is the default remote text patch operation for pod workspaces. The route-operation split keeps distributed location and execution behavior independently extensible, fixes `KUBECONFIG=/etc/rancher/k3s/k3s.yaml`, refuses long-follow logs, and assembles common `kubectl exec` / `kubectl logs` / stdin shell / pod patch target arguments without adding a provider-gateway protocol change. This prevents the common failure mode where a command crosses local shell, UniDesk SSH broker, remote shell command strings, `kubectl exec`, and container shell quoting layers before reaching the process that should run it.

-Longer scripts should move across stdin (`trans py`, `trans script` or k3s `script` operation), and remote text patches should default to `apply-patch` with a host or pod workspace route. Legacy `apply-patch-v1` remains available as the explicit fallback and uses the injected `sh` helper path instead of assuming target containers have `python3`, `node` or repository-local tools. Avoid heredocs nested inside remote command strings, `python - <<EOF` inside SSH strings, or JSON/Markdown bodies passed through shell arguments. These patterns often bind stdin to the wrong process, strip quotes, or leave a half-open provider SSH session that looks like a platform outage.
+Longer scripts should move across stdin (`trans py`, explicit `trans sh|bash`, or k3s `sh|bash` operation), and remote text patches should default to `apply-patch` with a host or pod workspace route. Legacy `apply-patch-v1` remains available as the explicit fallback and uses the injected `sh` helper path instead of assuming target containers have `python3`, `node` or repository-local tools. Avoid heredocs nested inside remote command strings, `python - <<EOF` inside SSH strings, or JSON/Markdown bodies passed through shell arguments. These patterns often bind stdin to the wrong process, strip quotes, or leave a half-open provider SSH session that looks like a platform outage.

 When structured passthrough is missing for a recurring workflow, fix the CLI first and then document the durable helper. Do not preserve a growing collection of one-off shell recipes as the long-term runbook.
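
As a rough illustration of steps 2 and 3 in the escalation order, the sketch below defines a hypothetical local stand-in for `trans`. The stand-in is an assumption made purely for illustration: it ignores the route and runs every operation on the local machine, so it demonstrates only the quoting behavior of the `argv`, `sh -- '<command>'` and heredoc forms, not the real CLI or its broker.

```bash
# Hypothetical local stand-in for the real `trans` CLI (illustration only).
# It ignores the route argument and executes on this machine.
trans() {
  route=$1; op=$2; shift 2
  case "$op" in
    argv) "$@" ;;                                  # tokens pass through as separate argv entries
    sh)   if [ "${1:-}" = "--" ]; then sh -c "$2"  # one-line shell command
          else sh -s                               # script body arrives on stdin
          fi ;;
    *)    echo "unsupported op: $op" >&2; return 2 ;;
  esac
}

# Step 2: no shell features needed, so pass plain argv tokens.
argv_out=$(trans PK01 argv printf '%s|' 'a b' 'c')

# Step 3: pipes and variables needed, so stream a single-quoted heredoc.
# The quoted delimiter stops the local shell from expanding $greeting.
greeting=local
heredoc_out=$(trans PK01 sh <<'SH'
greeting=remote
echo "$greeting side" | tr 'a-z' 'A-Z'
SH
)

echo "$argv_out"      # a b|c|
echo "$heredoc_out"   # REMOTE SIDE
```

Because the heredoc delimiter is quoted, the variable is resolved by the shell that receives the script body rather than by the calling shell, which is the property step 3 relies on.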

@@ -15,8 +15,8 @@ G14 platform DB is a native PostgreSQL on the G14 host OS, not a k3s workload,
 The G14 platform DB is always managed by systemd:

 ```bash
-trans G14 script -- 'systemctl status postgresql'
-trans G14 script -- '/usr/local/sbin/g14-platform-db-health'
+trans G14 sh -- 'systemctl status postgresql'
+trans G14 sh -- '/usr/local/sbin/g14-platform-db-health'
 ```

 PostgreSQL listens only on the G14 host loopback and the node gateway address reachable from k3s pods:
@@ -76,8 +76,8 @@ The source truth for HWLAB v0.3 is `G14:/root/hwlab-v03`, branch `v0.3`. `deploy/
 Standard verification:

 ```bash
-trans G14:/root/hwlab-v03 script -- 'npm run gitops:ts:check'
-trans G14:/root/hwlab-v03 script -- 'npm run gitops:render -- --lane v03 --out /tmp/hwlab-v03-render-check'
+trans G14:/root/hwlab-v03 sh -- 'npm run gitops:ts:check'
+trans G14:/root/hwlab-v03 sh -- 'npm run gitops:render -- --lane v03 --out /tmp/hwlab-v03-render-check'
 bun scripts/cli.ts hwlab nodes control-plane trigger-current --node G14 --lane v03 --confirm
 bun scripts/cli.ts hwlab nodes control-plane status --node G14 --lane v03 --pipeline-run <pipeline-run>
 ```
@@ -132,8 +132,8 @@ bun scripts/cli.ts hwlab nodes secret cleanup-obsolete --node G14 --lane v03 --n
 The backup script lives at a fixed location on the G14 host:

 ```bash
-trans G14 script -- 'systemctl status g14-platform-db-backup.timer'
-trans G14 script -- '/usr/local/sbin/g14-platform-db-backup'
+trans G14 sh -- 'systemctl status g14-platform-db-backup.timer'
+trans G14 sh -- '/usr/local/sbin/g14-platform-db-backup'
 ```

 Backup directory:
@@ -25,7 +25,7 @@ The old `/root/hwlab` workspace on branch `G14` is no longer a default source tr
 The standard entry forms are:

 ```bash
-trans G14:/root/hwlab script -- 'git fetch origin G14 && git pull --ff-only origin G14 && git status --short --branch && git remote -v'
+trans G14:/root/hwlab sh -- 'git fetch origin G14 && git pull --ff-only origin G14 && git status --short --branch && git remote -v'
 trans G14:/root/hwlab apply-patch < patch.diff
 trans G14:k3s kubectl get pods -n hwlab-v02
 ```
@@ -55,7 +55,7 @@ HWLAB `v0.2` is the supported G14 runtime lane for the v0.2 branch. It must not
 The fixed `v0.2` source branch is `v0.2`, forked from the current `G14` branch after the G14 long-term reference docs record this decision. The fixed G14 development workspace for that branch is:

 ```bash
-trans G14:/root/hwlab-v02 script -- 'git status --short --branch && git remote -v'
+trans G14:/root/hwlab-v02 sh -- 'git status --short --branch && git remote -v'
 ```

 `/root/hwlab-v02` is the long-lived `v0.2` development workspace, not a scratch clone or CI/CD source selector. It must track `origin/v0.2` with `origin git@github.com:pikasTech/HWLAB.git`; local dirty state, stale `HEAD`, and untracked `.worktree/` only affect human development. Do not reuse retired `/root/hwlab` or `/root/hwlab/.worktree/*` as the `v0.2` fixed workspace.
@@ -122,13 +122,13 @@ The generic P2/P3/P4 flow is owned by `$dad-dev`; this section fixes the G14/v0.
 Direct-lightweight precheck:

 ```bash
-trans G14:/root/hwlab-v02 script -- 'git fetch origin v0.2 && git pull --ff-only origin v0.2 && git status --short --branch'
+trans G14:/root/hwlab-v02 sh -- 'git fetch origin v0.2 && git pull --ff-only origin v0.2 && git status --short --branch'
 ```

 Service workflow setup:

 ```bash
-trans G14:/root/hwlab-v02 script -- 'git worktree add .worktree/<task> -b fix/issue<N>-<short-name> origin/v0.2'
+trans G14:/root/hwlab-v02 sh -- 'git worktree add .worktree/<task> -b fix/issue<N>-<short-name> origin/v0.2'
 ```

 The fixed repo at `/root/hwlab-v02` is not a scratch area for service/runtime work, but it is the direct-lightweight source workspace. When a direct-lightweight task sees parallel dirty state in the fixed repo, inspect and include or separate it according to the current user instruction and project Git rules; never discard it silently. Worktree branches for service workflow should follow the `fix/issue<N>-<short-name>` naming so PR titles and merge commits stay scannable. GitHub PR writes, merge, rollout trigger and final original-entry validation follow `$dad-dev` plus the UniDesk CLI control rules in `AGENTS.md`.
@@ -137,17 +137,17 @@ The fixed repo at `/root/hwlab-v02` is not a scratch area for service/runtime wo

 Direct-lightweight commits are allowed and do not need recovery. A direct commit on `v0.2` only needs recovery when it changed service/runtime/GitOps/CI/CD/public behavior that should have used the PR/rollout workflow. The recovery is bounded and audit-friendly, but it is also a `git push --force-with-lease` against the protected branch, so it is only acceptable when the unapproved direct commit is the only new content on `v0.2` since the last merged PR:

-1. Confirm no parallel worktree was in flight and the commit is the only delta. `trans G14:/root/hwlab-v02 script -- 'git log origin/v0.2..HEAD'` and `git log HEAD..origin/v0.2` must show the direct commit as a single fast-forward candidate.
+1. Confirm no parallel worktree was in flight and the commit is the only delta. `trans G14:/root/hwlab-v02 sh -- 'git log origin/v0.2..HEAD'` and `git log HEAD..origin/v0.2` must show the direct commit as a single fast-forward candidate.
 2. Capture the commit identity and patch for the recovery record:
   ```bash
-   trans G14:/root/hwlab-v02 script -- 'git show <direct-commit-sha> > /tmp/v0.2-recovery.patch'
+   trans G14:/root/hwlab-v02 sh -- 'git show <direct-commit-sha> > /tmp/v0.2-recovery.patch'
   ```
 3. Roll the fixed repo back to the previous merged PR head. Use `git reset --hard <previous-pr-sha>`; this preserves any autostash (e.g. from a parallel `git checkout` snapshot in another worktree) on the stash list and does not touch the other worktree's working tree.
-4. In the pre-existing worktree (e.g. `.worktree/<task>` on `fix/issue<N>-<short-name>`) bring the branch up to the previous PR head with `trans G14:/root/hwlab-v02/.worktree/<task> script -- 'git reset --hard <previous-pr-sha>'`, then `git cherry-pick <direct-commit-sha>` to replay the direct commit on the feature branch. If the worktree branch was already a clean clone of `origin/v0.2` at the previous PR head, the reset is a no-op.
+4. In the pre-existing worktree (e.g. `.worktree/<task>` on `fix/issue<N>-<short-name>`) bring the branch up to the previous PR head with `trans G14:/root/hwlab-v02/.worktree/<task> sh -- 'git reset --hard <previous-pr-sha>'`, then `git cherry-pick <direct-commit-sha>` to replay the direct commit on the feature branch. If the worktree branch was already a clean clone of `origin/v0.2` at the previous PR head, the reset is a no-op.
 5. Push the feature branch and force-push `v0.2` back to the rolled-back head with `--force-with-lease` (refuses to clobber a concurrent push):
   ```bash
-   trans G14:/root/hwlab-v02/.worktree/<task> script -- 'git push -u origin fix/issue<N>-<short-name>'
-   trans G14:/root/hwlab-v02 script -- 'git push --force-with-lease origin v0.2'
+   trans G14:/root/hwlab-v02/.worktree/<task> sh -- 'git push -u origin fix/issue<N>-<short-name>'
+   trans G14:/root/hwlab-v02 sh -- 'git push --force-with-lease origin v0.2'
   ```
 6. Open the PR through UniDesk CLI, squash-merge, then `git pull --ff-only origin v0.2` to bring the fixed repo back in sync. The previous PR's merge commit will not be in the new PR's history; the new PR's diff equals the original direct commit's diff, so the PR trail still contains the exact same bytes.
 7. `bun scripts/cli.ts hwlab g14 control-plane status --lane v02` will read the new merge commit; the previously-staged PipelineRun for the direct commit was created on the v0.2 head and `trigger-current` will delete + recreate it for the post-merge head, so no manual PipelineRun cleanup is required.
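
The "only delta" precondition in step 1 can be checked mechanically. The following is a minimal sketch under stated assumptions: it builds a throwaway repository, fakes `origin/v0.2` with a local ref, and swaps the `git log` inspection for `git rev-list --count` so the result is a number. The helper `g`, the temp repository and the verdict strings are hypothetical and not part of the documented workflow.

```bash
# Throwaway repository standing in for /root/hwlab-v02 (assumption for the demo).
repo=$(mktemp -d)
g() { git -C "$repo" -c user.name=demo -c user.email=demo@example.invalid "$@"; }

g init -q
g commit -q --allow-empty -m "previous merged PR head"
g update-ref refs/remotes/origin/v0.2 HEAD      # pretend this is what origin has
g commit -q --allow-empty -m "direct commit"    # the unapproved direct commit

# Exactly one commit ahead and none behind means the direct commit is the only delta.
ahead=$(g rev-list --count origin/v0.2..HEAD)
behind=$(g rev-list --count HEAD..origin/v0.2)

if [ "$ahead" -eq 1 ] && [ "$behind" -eq 0 ]; then
  verdict="single-delta"
else
  verdict="not-recoverable"
fi
echo "$verdict"
```

Any other combination of counts suggests parallel work landed on the branch, in which case the force-push recovery described above should not be used.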
@@ -160,7 +160,7 @@ Cloud Web layout, status-panel, collapsed-control, and modal issues on `v0.2` ne

 Use these surfaces together:

-- `trans G14:/root/hwlab-v02/.worktree/<task>/web/hwlab-cloud-web script -- 'bun run check'` for approved static source/layout checks and dist freshness.
+- `trans G14:/root/hwlab-v02/.worktree/<task>/web/hwlab-cloud-web sh -- 'bun run check'` for approved static source/layout checks and dist freshness.
 - `bun scripts/cli.ts hwlab g14 control-plane status --lane v02` for runtime, Argo, public endpoint, and GitOps alignment. If `origin/v0.2` moved through a parallel PR, use `--pipeline-run` or `--source-commit` and treat same-branch supersession as context rather than failure.
 - Public API probes for both `/health/live` and `/v1/live-builds`. `/health/live` proves live service health/revision, but Cloud Web build time, image tag/digest, source metadata, and actual runtime commit/revision should be read from `/v1/live-builds`.
 - A bounded browser/DOM probe against `http://74.48.78.17:19666/` that asserts the deployed page state relevant to the issue.
@@ -183,9 +183,9 @@ When a `v0.2` Cloud Web fix removes a button from `index.html` or a field from t

 ```bash
 # 1. Web assets rebuild and the orphan is gone from the dist
-trans G14:/root/hwlab-v02/.worktree/<task> script -- 'cd web/hwlab-cloud-web && bun run build'
-trans G14:/root/hwlab-v02/.worktree/<task> script -- "grep -c '<removed-field>' web/hwlab-cloud-web/dist/app.js"   # must be 0
-trans G14:/root/hwlab-v02/.worktree/<task> script -- "grep -c 'id=\"<removed-id>\"' web/hwlab-cloud-web/index.html" # must be 0
+trans G14:/root/hwlab-v02/.worktree/<task> sh -- 'cd web/hwlab-cloud-web && bun run build'
+trans G14:/root/hwlab-v02/.worktree/<task> sh -- "grep -c '<removed-field>' web/hwlab-cloud-web/dist/app.js"   # must be 0
+trans G14:/root/hwlab-v02/.worktree/<task> sh -- "grep -c 'id=\"<removed-id>\"' web/hwlab-cloud-web/index.html" # must be 0

 # 2. Live 19666/19667 confirms the deployed bundle is the new build
 curl -fsS http://74.48.78.17:19666/ | grep -c '<removed-id>'                                          # must be 0
@@ -196,7 +196,7 @@ bun scripts/cli.ts hwlab g14 control-plane status --lane v02
 While the PR is open, the author can also run a one-liner to surface any orphan `el.<field>.addEventListener` whose field is not declared in the `el` literal of `app.ts`:

 ```bash
-trans G14:/root/hwlab-v02/.worktree/<task> script -- 'awk "/^const el = /,/^};/" web/hwlab-cloud-web/app.ts | tr -d "," | awk "{print \$1}" | grep -E "^[a-zA-Z]" | sort -u > /tmp/el-fields.txt; grep -nEo "el\\.([A-Za-z_$][A-Za-z0-9_$]*)\\.addEventListener" web/hwlab-cloud-web/*.ts | while read m; do field=$(echo "$m" | sed -E "s/.*el\\.([A-Za-z_$][A-Za-z0-9_$]*)\\.addEventListener.*/\\1/"); if ! grep -q "^$field$" /tmp/el-fields.txt; then echo "ORPHAN: el.$field.addEventListener"; fi; done'
+trans G14:/root/hwlab-v02/.worktree/<task> sh -- 'awk "/^const el = /,/^};/" web/hwlab-cloud-web/app.ts | tr -d "," | awk "{print \$1}" | grep -E "^[a-zA-Z]" | sort -u > /tmp/el-fields.txt; grep -nEo "el\\.([A-Za-z_$][A-Za-z0-9_$]*)\\.addEventListener" web/hwlab-cloud-web/*.ts | while read m; do field=$(echo "$m" | sed -E "s/.*el\\.([A-Za-z_$][A-Za-z0-9_$]*)\\.addEventListener.*/\\1/"); if ! grep -q "^$field$" /tmp/el-fields.txt; then echo "ORPHAN: el.$field.addEventListener"; fi; done'
 ```

 Document the explicit `grep` / curl evidence in the issue closeout comment. Tightening the `el` literal with proper TypeScript types is tracked separately and must not be done as part of a runtime fix PR.
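
For readers who find the one-liner hard to follow, here is a multi-line variation of the same idea, run locally against a small sample file. It is a sketch, not the author's command: the sample `app.ts`, the temp directory and the variable names are hypothetical, and unlike the one-liner it also strips a trailing `:` so that `name: value` properties are recognized as declared fields.

```bash
workdir=$(mktemp -d)

# Hypothetical sample source: one declared listener target, one orphan.
cat > "$workdir/app.ts" <<'TS'
const el = {
  saveBtn: document.getElementById("save"),
  status: document.getElementById("status"),
};
el.saveBtn.addEventListener("click", () => {});
el.removedBtn.addEventListener("click", () => {});
TS

# Declared fields: first token of each line inside the `el` literal,
# with everything from the first ":" or "," removed.
declared=$(awk '/^const el = /,/^};/' "$workdir/app.ts" \
  | awk 'NR > 1 { sub(/[:,].*/, "", $1); print $1 }' \
  | grep -E '^[A-Za-z_$]' | sort -u)

# Fields that have an addEventListener call attached.
used=$(grep -Eo 'el\.[A-Za-z_$][A-Za-z0-9_$]*\.addEventListener' "$workdir/app.ts" \
  | sed -E 's/^el\.(.*)\.addEventListener$/\1/' | sort -u)

# Report every used field that is not declared (fixed-string, whole-line match).
orphans=$(printf '%s\n' "$used" | while read -r field; do
  printf '%s\n' "$declared" | grep -qxF -- "$field" \
    || echo "ORPHAN: el.$field.addEventListener"
done)

echo "$orphans"
```

A script of this shape has enough pipes and variables that, following the escalation order in these docs, it would presumably travel as a quoted heredoc on stdin rather than as a nested command string.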
@@ -276,10 +276,10 @@ A live `workspace.evidence` / `debug.evidence` / `download evidence` selector th
 Device-pod fixes still follow `$dad-dev` and the service/runtime side of the `## v0.2 Source Workflow` route above. The device-pod-specific closeout is the three-layer runtime matrix below; keep these checks because they prove the cloud-api -> executor -> D601 host chain, while generic PR/CI/CD and worktree mechanics stay in `$dad-dev`.

 ```bash
-trans G14:/root/hwlab-v02/.worktree/<task> script -- 'cd tools && bun test device-pod-cli.test.ts'
-trans G14:/root/hwlab-v02/.worktree/<task> script -- 'cd cmd/hwlab-device-pod && bun test main.test.ts'
-trans G14:/root/hwlab-v02/.worktree/<task> script -- 'cd internal/cloud && bun test access-control.test.ts'
-trans G14:/root/hwlab-v02/.worktree/<task> script -- 'node --check skills/device-pod-cli/assets/device-host-cli.mjs'
+trans G14:/root/hwlab-v02/.worktree/<task> sh -- 'cd tools && bun test device-pod-cli.test.ts'
+trans G14:/root/hwlab-v02/.worktree/<task> sh -- 'cd cmd/hwlab-device-pod && bun test main.test.ts'
+trans G14:/root/hwlab-v02/.worktree/<task> sh -- 'cd internal/cloud && bun test access-control.test.ts'
+trans G14:/root/hwlab-v02/.worktree/<task> sh -- 'node --check skills/device-pod-cli/assets/device-host-cli.mjs'
 ```

 Treat `access-control.test.ts` workbench failures as pre-existing on the v0.2 base unless the new test list explicitly covers them. After PR merge and `trigger-current --lane v02 --confirm`, the live `http://74.48.78.17:19667/` CLI acceptance check must hit all three layers:
@@ -214,11 +214,11 @@ Registry reports must distinguish `uniqueBlobBytes` from `sharedBlobBytes`. Multiple rep
 G14 space audits are read-only by default. When a report is needed, collect the following summaries first and avoid dumping large JSON in full:

 ```bash
-trans G14 script -- 'df -h / | tail -1'
-trans G14 script -- 'du -xh -d 1 / /var /var/lib /root 2>/dev/null | sort -h | tail -40'
-trans G14 script -- 'du -xh -d 2 /var/lib/rancher/k3s /var/lib/containerd /var/log 2>/dev/null | sort -h | tail -80'
-trans G14 script -- 'KUBECONFIG=/etc/rancher/k3s/k3s.yaml kubectl get pv,pvc,pod -A -o wide'
-trans G14 script -- 'find /var/lib/hwlab/registry/docker/registry/v2/repositories -path "*/_manifests/tags/*/current/link" -type f | wc -l'
+trans G14 sh -- 'df -h / | tail -1'
+trans G14 sh -- 'du -xh -d 1 / /var /var/lib /root 2>/dev/null | sort -h | tail -40'
+trans G14 sh -- 'du -xh -d 2 /var/lib/rancher/k3s /var/lib/containerd /var/log 2>/dev/null | sort -h | tail -80'
+trans G14:k3s kubectl get pv,pvc,pod -A -o wide
+trans G14 sh -- 'find /var/lib/hwlab/registry/docker/registry/v2/repositories -path "*/_manifests/tags/*/current/link" -type f | wc -l'
 ```

 When digging into the registry, the report must include at least repo, tag count, manifest revision count, latest tags, protected digest closure, unique blob bytes and shared blob bytes. When digging into the k3s runtime, the report must include at least namespace/PVC, PV host path, owner workload, actual PVC usage, and the total size of k3s containerd snapshots/blobs. Do not simply add `/var/lib/kubelet/pods` and `/var/lib/rancher/k3s/storage` together: kubelet pod directories may contain PVC bind mounts or runtime metadata, which risks double counting.
@@ -226,8 +226,8 @@ trans G14 script -- 'find /var/lib/hwlab/registry/docker/registry/v2/repositorie
 When digging into logs and worktrees, default to read-only reporting and do not clean up directly:

 ```bash
-trans G14 script -- 'du -xh -d 1 /var/log 2>/dev/null | sort -h | tail -40'
-trans G14 script -- 'du -xh -d 2 /root/hwlab-v02/.worktree 2>/dev/null | sort -h | tail -60'
+trans G14 sh -- 'du -xh -d 1 /var/log 2>/dev/null | sort -h | tail -40'
+trans G14 sh -- 'du -xh -d 2 /root/hwlab-v02/.worktree 2>/dev/null | sort -h | tail -60'
 ```

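 rsyslog file logs are not among the objects `gc remote` may modify by default. If `/var/log/syslog*`, `/var/log/kern.log*` or similar files become the last gap to the 50% target, first add a controlled logrotate/compress/truncate CLI whose output discloses the retained tail, the compressed objects, the estimated space freed and the failure recovery path; directly running `truncate` or deleting log files is forbidden as a long-term procedure. `/root/hwlab-v02/.worktree` may only be cleaned after owner, branch, dirty state and rebuildability are established; it must not be deleted by directory size alone.
@@ -237,10 +237,10 @@ rsyslog file logs are not among the objects `gc remote` may modify by default. If `/va
 After G14 GC, the following must be verified: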

 ```bash
-trans G14 script -- 'df -h / | tail -1'
-trans G14 script -- 'curl -fsS http://127.0.0.1:5000/v2/ >/dev/null && echo ok'
-trans G14 script -- 'KUBECONFIG=/etc/rancher/k3s/k3s.yaml kubectl -n hwlab-ci get deploy hwlab-registry'
-trans G14 script -- 'KUBECONFIG=/etc/rancher/k3s/k3s.yaml kubectl -n hwlab-ci get cronjob hwlab-g14-branch-poller -o custom-columns=NAME:.metadata.name,SUSPEND:.spec.suspend --no-headers && ! KUBECONFIG=/etc/rancher/k3s/k3s.yaml kubectl -n hwlab-ci get cronjob hwlab-v02-branch-poller >/dev/null 2>&1'
+trans G14 sh -- 'df -h / | tail -1'
+trans G14 sh -- 'curl -fsS http://127.0.0.1:5000/v2/ >/dev/null && echo ok'
+trans G14:k3s kubectl -n hwlab-ci get deploy hwlab-registry
+trans G14:k3s sh -- 'kubectl -n hwlab-ci get cronjob hwlab-g14-branch-poller -o custom-columns=NAME:.metadata.name,SUSPEND:.spec.suspend --no-headers && ! kubectl -n hwlab-ci get cronjob hwlab-v02-branch-poller >/dev/null 2>&1'
 ```

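 DEV workload verification should check that workloads with non-zero replicas are ready; an explicitly disabled `0/0` deployment must not be misreported as an incident. The registry tag count is supporting evidence only and cannot replace workload ref protection and registry API health.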
@@ -86,7 +86,7 @@ Sanitizer rules: recursively scans `ResponsesRequest.input`, repairs tool-call `

 ## MiniMax Apply-Patch Operations

-MiniMax-backed sessions must use the same UniDesk remote text patch contract as other agents: route first, operation second, and `apply-patch` v2 by default. The stable write shape is `trans <provider>:/absolute/workspace apply-patch < patch.diff`; read-only inspection may use `trans <provider>:/absolute/workspace script -- 'nl -ba file'` or equivalent bounded commands.
+MiniMax-backed sessions must use the same UniDesk remote text patch contract as other agents: route first, operation second, and `apply-patch` v2 by default. The stable write shape is `trans <provider>:/absolute/workspace apply-patch < patch.diff`; read-only inspection may use `trans <provider>:/absolute/workspace nl -ba file` or equivalent bounded commands.

 - If `apply-patch` reports `failed to find expected lines`, first read the exact current target block, then retry with a smaller `Update File` hunk, an `@@ <unique anchor>` hint, or multiple small hunks. This is normal stale-context recovery, not a reason to switch tools.
 - Do not recover text patch failures by using `download` / `upload`, remote Python/Perl/sed heredocs, `cat >` / `tee` whole-file rewrites, or `apply-patch-v1`, unless `apply-patch` itself is unavailable or the target is non-text / bulk mechanical generated content.
@@ -42,7 +42,7 @@ UniDesk user services are mounted onto the UniDesk core services and are user-facing

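 Business repositories are maintained by the business systems themselves, including source code, Dockerfile, docker-compose, config templates and business tests. UniDesk only references the business repository URL, commit id, Dockerfile/docker-compose path and running container name; the full business code must not be copied into `src/components/microservices/`, which would create dual maintenance. `src/components/microservices/` may only hold generic examples or UniDesk's own examples and is not a mirror of business repositories.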

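-The Code Queue runner is also a distributed development execution surface. The runner image must ship `tran` built in, so that while executing tasks the runner can reach D601, G14, host workspaces, the k3s control plane and target pods through the public frontend control plane. Inside the runner, prefer structured commands such as `tran <provider> argv ...`, `tran <provider>:k3s kubectl ...` and `tran <provider>:k3s:<namespace>:<workload> argv ...`; the stdin-based `script`, `apply-patch` and `py` operations likewise run over the frontend `/ws/ssh` streaming channel and should not fall back to `/api/dispatch` task polling. This boundary avoids making provider tokens, backend-core internal DNS or multi-layer quoting of long commands a precondition for runner availability, and also avoids large stdout being truncated by task JSON compact.
+The Code Queue runner is also a distributed development execution surface. The runner image must ship `tran` built in, so that while executing tasks the runner can reach D601, G14, host workspaces, the k3s control plane and target pods through the public frontend control plane. Inside the runner, prefer structured commands such as `tran <provider> argv ...`, `tran <provider>:k3s kubectl ...` and `tran <provider>:k3s:<namespace>:<workload> argv ...`; the stdin-based `sh`/`bash`, `apply-patch` and `py` operations likewise run over the frontend `/ws/ssh` streaming channel and should not fall back to `/api/dispatch` task polling. This boundary avoids making provider tokens, backend-core internal DNS or multi-layer quoting of long commands a precondition for runner availability, and also avoids large stdout being truncated by task JSON compact.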

 ## Main Server User Services

@@ -227,7 +227,7 @@ On D601 the native k3s kubeconfig must be used explicitly: `KUBECONFIG=/etc/rancher/k3s/k
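 - Skill injection boundary: DEV Code Queue scheduler/read/write Pods must mount the host's `/home/ubuntu/.agents/skills` read-only at `/root/.agents/skills` in the container and set `UNIDESK_SKILLS_PATH=/root/.agents/skills`, so that executing tasks can read skills such as `cli-spec`. Only the skill directory itself may be mounted; the host's `~/.agents`, `~/.codex`, tokens, auth JSON or other private configuration must not be exposed wholesale to task containers. `/health` and `/api/dev-ready` must expose non-sensitive `skills` status: path, exists, available, readonly, skillCount, `cliSpecAvailable` and repair suggestions; the CLI `codex dev-ready` can read this summary. The current delivery only requires read-only skill injection for the DEV manifest and the old direct Compose diagnostic path; before PROD Code Queue is released, the isolation level must be reviewed separately, and the DEV bridge mode must not be promoted directly to the production default.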
 - Develop-ready image: the Code Queue image must preinstall, before startup, the tools needed for UniDesk/Pipeline debugging, including at least `codex`, `bun`, `node`, `npm`/`npx`, `git`, `rg`, `curl`, `python3`/`pip3`, `docker`, `docker compose`, `docker-compose`, `jq`, `ssh`, `rsync`, `make`, `gcc`/`g++`, `iptables`, `tar`, `gzip` and `unzip`; it must not rely on running `apt-get install` for these base tools during Codex task runtime.
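 - Remote development containers and task execution Provider: Code Queue must be able to bring up development containers on compute nodes such as D601 through the live API; the entry point is `POST /api/dev-containers/<providerId>/start`, and the default Provider is `D601`. In this flow Code Queue calls the UniDesk SSH maintenance bridge to create `unidesk-codex-dev-<providerId>` on the target node; the human entry point is written `trans <providerId>`, and internal service calls still reuse the same route parser and broker. An `ssh -w` TUN point-to-point link is established between the node hosting Code Queue and the development container; the node hosting the service applies NAT/MASQUERADE to the development container's TUN source address, and the development container's default route and DNS are switched to that TUN, so that outbound traffic such as `ping google.com`, DNS and HTTP(S) all goes through the main server's global proxy instead of relying on D601's local network. Submitting a Code Queue task must support choosing the execution Provider: `D601` executes locally in the active Code Queue scheduler/runner Pod on D601's native k3s, with default working directory `/workspace`, and `/workspace` must map to `/home/ubuntu` on the D601 WSL host; the same hostPath must also be mounted at `/home/ubuntu` inside the container so that absolute symlinks in the WSL home (for example `/workspace/cq-deploy -> /home/ubuntu/unidesk-code-queue-deploy`) resolve inside tasks, rather than the task seeing only the symlink name without being able to enter the target directory. `/root/unidesk` and `/app` must separately map `/home/ubuntu/cq-deploy` as the service deployment repository; other Providers execute in the corresponding `unidesk-codex-dev-<providerId>` container, with default working directory `/home/ubuntu`, and `cwd` can be overridden per task. Before a remote task starts, the Provider's development container must be automatically reused or brought up, the Codex configuration and the permitted runtime provider environment variables synced, and outbound traffic routed through the same master TUN/NAT link; when `/mnt` exists on the target host, the development container must mount host `/mnt:/mnt`, so that Windows drive paths on WSL nodes like D601, such as `/mnt/f/Work/ConStart`, are visible inside the task container and the agent does not end up searching unrelated projects for lack of the real workspace. TUN setup must handle stale state idempotently: before startup, clean up the old `tun<id>`, the default route, old tunnel SSH processes and old OUTPUT jumps; a missing old device must not cause failure, and cold-start runtime preparation needs a bounded but sufficient timeout. After the TUN is established, a `UD-CQ-EGRESS-<provider>` OUTPUT chain must be created whose rules allow only loopback, established connections, egress via `tun<id>` and the SSH tunnel control connection to the master server, and then reject all other outbound IPv4/IPv6 packets; this network-layer seal is the authoritative external-network boundary for development/execution containers and cannot be replaced by `HTTP_PROXY`/`NO_PROXY` environment variables. The container image must likewise use the single resolved `unidesk-code-queue:<provider>` or an explicit `image`, fail immediately when it is missing, and never fall back to the provider-gateway image, `latest` or any other implicit image. Acceptance must retain three kinds of logs: `ping google.com` succeeding after the container builds the tunnel; a direct external connection forced through the original Docker interface being blocked with `sealed_direct_ping=blocked_expected`; and the counters of the corresponding `UNIDESK-CODEX-DEV-<providerId>` NAT chain or `tun<id>` on the node hosting the service increasing across the ping. For tasks involving a WSL workspace, the target `/mnt/...` path must also be verified readable inside the development container. `GET /api/dev-containers/<providerId>/status` must show the default route, `route_8_8_8_8`, `egressFirewallChain` and the OUTPUT chain jump. Development container proxy keys are generated only into `.state/code-queue/dev-proxy/` and the target node's user directory and must not be committed to the repository.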
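-- Remote maintenance bridge calls: now that Code Queue has migrated to D601, the Code Queue backend Pod no longer contains the main server's `unidesk-backend-core` container, so `trans ...` can no longer be implemented as a local `docker exec unidesk-backend-core`. Provider maintenance commands issued by the Code Queue runner must enter the backend-core SSH bridge through the main server frontend's authenticated `/ws/ssh` streaming proxy, and the target provider-gateway then performs Host SSH/WSL SSH; stdout/stderr stream straight back to the runner and must not pass through `/api/dispatch` task polling or JSON compact. When a script, `py` or `apply-patch` needs to be passed, use the same stdin streaming channel, to avoid regressing to a local Docker broker, manual base64 chunked upload, interactive shell fallback or multi-layer quoting.
+- Remote maintenance bridge calls: now that Code Queue has migrated to D601, the Code Queue backend Pod no longer contains the main server's `unidesk-backend-core` container, so `trans ...` can no longer be implemented as a local `docker exec unidesk-backend-core`. Provider maintenance commands issued by the Code Queue runner must enter the backend-core SSH bridge through the main server frontend's authenticated `/ws/ssh` streaming proxy, and the target provider-gateway then performs Host SSH/WSL SSH; stdout/stderr stream straight back to the runner and must not pass through `/api/dispatch` task polling or JSON compact. When a `sh`/`bash` stdin shell body, `py` or `apply-patch` needs to be passed, use the same stdin streaming channel, to avoid regressing to a local Docker broker, manual base64 chunked upload, interactive shell fallback or multi-layer quoting.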
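 - Remote Provider preparation must not block the control plane: in the request handling, queue scheduling, remote development container preparation, Host SSH/WSL SSH passthrough, Codex/OpenCode startup and log export paths, Code Queue must not use synchronous child-process calls that occupy the Bun event loop for long periods, such as `spawnSync`, `execSync` or `execFileSync` against a remote Provider. Remote commands must run through asynchronous child processes with an explicit timeout, kill on timeout, stdout/stderr limits and task output progress records; a remote preparation failure may only move the corresponding task to failed or retry, and must not make control-plane requests such as `POST /api/tasks`, SSE `/api/events`, `/health`, overview or the frontend/core user service proxy wait for remote SSH to finish. For any change to paths such as D601/remote Provider preparation, `api/dev-containers/*`, task enqueue/startup or `runCodeQueueSsh`, acceptance must, while a remote SSH/status/start probe is running, concurrently verify that direct container requests to `/health` and `/api/tasks/overview` still return within 1s, proving that a remote timeout cannot recur as a site-wide refresh hang.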
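 - OpenCode remote execution: when the two parallel configurations `minimax-m3` and `minimax-m2.7` go through the OpenCode JSON event port, both local and remote commands must explicitly run `opencode run ...`; a remote Docker exec must not degrade into `exec run ...`, which becomes `bash: exec: run: not found` inside the target container. The terminal state of an OpenCode JSON stream is decided by "current process exit code + the current attempt's final assistant response": with `exit=0` and a non-empty final reply from the current attempt, the run counts as a normal terminal even if upstream never sent a `step_finish` event; only a non-zero exit, no current final reply or a closed transport leads to retry. Each attempt's `finalResponse` must come only from the current OpenCode/Codex turn; falling back to the task's previous `finalResponse` when the current turn produced no final reply is forbidden, because it would misjudge old task content as completion of this round.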
 - Codex control: the service internally starts `codex app-server --listen stdio://`, calls `thread/start`, `turn/start`, `turn/steer` and `turn/interrupt` over JSON-RPC, and listens for notifications such as `turn/completed`, assistant delta, reasoning delta, command output delta and file diff delta to produce a transcript the frontend can poll.
@@ -8,18 +8,18 @@ Use UniDesk SSH passthrough for PK01 host operations:

 ```bash
 trans PK01 argv hostname
-trans PK01 script <<'SCRIPT'
+trans PK01 sh <<'SH'
 df -h /
 docker ps --format 'table {{.Names}}\t{{.Image}}\t{{.Status}}'
-SCRIPT
+SH
 ```

 Before closing an operation, verify both the provider channel and host workload state:

 ```bash
 bun scripts/cli.ts debug health
-trans PK01 argv bash -lc 'docker inspect --format "name={{.Name}} restart={{.HostConfig.RestartPolicy.Name}} pid={{.HostConfig.PidMode}} state={{.State.Status}} image={{.Config.Image}}" unidesk-provider-gateway-pk01'
-trans PK01 argv bash -lc 'docker ps --format "table {{.Names}}\t{{.Image}}\t{{.Status}}"'
+trans PK01 sh -- 'docker inspect --format "name={{.Name}} restart={{.HostConfig.RestartPolicy.Name}} pid={{.HostConfig.PidMode}} state={{.State.Status}} image={{.Config.Image}}" unidesk-provider-gateway-pk01'
+trans PK01 sh -- 'docker ps --format "table {{.Names}}\t{{.Image}}\t{{.Status}}"'
 ```

 PK01 has no k3s control plane. `trans PK01:k3s ...` is not an operating truth. If a future PK01 k3s lane is introduced, it must get a separate runtime-lane reference and must not reuse the current pikanode host-data policy as a Kubernetes retention policy.
@@ -130,9 +130,9 @@ PK01 has node-local retention controls installed so that pikanode temp output an
 Operational checks:

 ```bash
-trans PK01 argv bash -lc 'systemctl status unidesk-pk01-pikanode-temp-gc.timer --no-pager'
-trans PK01 argv bash -lc 'sudo systemctl start unidesk-pk01-pikanode-temp-gc.service && tail -n 40 /var/log/unidesk-pk01/pikanode-temp-gc.log'
-trans PK01 argv bash -lc 'sudo logrotate -d /etc/logrotate.d/unidesk-pk01-pikanode'
+trans PK01 sh -- 'systemctl status unidesk-pk01-pikanode-temp-gc.timer --no-pager'
+trans PK01 sh -- 'sudo systemctl start unidesk-pk01-pikanode-temp-gc.service && tail -n 40 /var/log/unidesk-pk01/pikanode-temp-gc.log'
+trans PK01 sh -- 'sudo logrotate -d /etc/logrotate.d/unidesk-pk01-pikanode'
 ```

 The timer and logrotate configuration are node-local operational state. If a future UniDesk CLI subcommand manages PK01 retention centrally, it must first render a dry-run plan, show the same protected paths, and then install/update these node-local files through a confirmed operation.
@@ -142,10 +142,10 @@ The timer and logrotate configuration are node-local operational state. If a fut
 PK01 space attribution should use short, bounded commands. Recommended probes:

 ```bash
-trans PK01 argv bash -lc 'df -h / && df -i /'
-trans PK01 argv bash -lc 'sudo timeout 20 du -xhd1 /var /home/ubuntu/pikanode /home/ubuntu/.vscode-server /var/lib/docker /var/log 2>/dev/null | sort -h | tail -80'
-trans PK01 argv bash -lc 'docker system df -v | sed -n "1,220p"'
-trans PK01 argv bash -lc 'sudo find /home/ubuntu/pikanode/html/temp -xdev -mindepth 1 -maxdepth 1 -printf "%TY-%Tm-%Td %TH:%TM %p\n" | sort | tail -40'
+trans PK01 sh -- 'df -h / && df -i /'
+trans PK01 sh -- 'sudo timeout 20 du -xhd1 /var /home/ubuntu/pikanode /home/ubuntu/.vscode-server /var/lib/docker /var/log 2>/dev/null | sort -h | tail -80'
+trans PK01 sh -- 'docker system df -v | sed -n "1,220p"'
+trans PK01 sh -- 'sudo find /home/ubuntu/pikanode/html/temp -xdev -mindepth 1 -maxdepth 1 -printf "%TY-%Tm-%Td %TH:%TM %p\n" | sort | tail -40'
 ```

 Interpretation guide:
@@ -167,7 +167,7 @@ When target-level `egressProxy.enabled=true`, the D601 target renders an in-clus

 Adding, removing, exposing, validating, and configuring local Codex consumers are daily operations covered by `$unidesk-sub2api`. The development rule is that ordinary pool membership changes stay YAML-only and do not add code or CI/CD. Code changes are only appropriate when UniDesk needs to render or validate a Sub2API capability that already exists upstream, such as account-level WebSocket mode or per-account upstream User-Agent. If Sub2API itself does not support a desired behavior, do not magic-patch it through UniDesk scripts, Kubernetes hotfixes, local forks, or hidden compatibility paths; either leave the behavior unsupported or pursue it upstream as an explicit Sub2API feature.

-`codex-pool sync --confirm` and `codex-pool validate` are runtime operations that may need more than one SSH short-connection window because they log in to Sub2API, reconcile accounts, inspect recent logs, and run gateway smoke requests. The formal entry remains the UniDesk CLI, which must use a submit-and-short-poll control shape or an equivalent remote job wrapper instead of one long `trans G14:k3s script` call. If these commands fail with `UNIDESK_SSH_RUNTIME_TIMEOUT` while the remote operation may still be running, treat it as a control-plane visibility gap first: improve or use the CLI's job/poll path, then rerun `sync` or `validate`. Do not replace it with raw `kubectl`, manual Sub2API admin API patches, repeated blind full loops, or Sub2API source modifications.
+`codex-pool sync --confirm` and `codex-pool validate` are runtime operations that may need more than one SSH short-connection window because they log in to Sub2API, reconcile accounts, inspect recent logs, and run gateway smoke requests. The formal entry remains the UniDesk CLI, which must use a submit-and-short-poll control shape or an equivalent remote job wrapper instead of one long `trans G14:k3s sh` call. If these commands fail with `UNIDESK_SSH_RUNTIME_TIMEOUT` while the remote operation may still be running, treat it as a control-plane visibility gap first: improve or use the CLI's job/poll path, then rerun `sync` or `validate`. Do not replace it with raw `kubectl`, manual Sub2API admin API patches, repeated blind full loops, or Sub2API source modifications.

 After `codex-pool configure-local --confirm`, the default `~/.codex/config.toml` / `auth.json` pair must remain the unified Sub2API consumer and must not be reused as an upstream account profile. Keep every upstream source profile in suffixed files such as `config.toml.<profile>` / `auth.json.<profile>` and register it through YAML `profiles.entries`.

@@ -46,7 +46,7 @@ Provider WebSocket 是注册、heartbeat、dispatch、`provider.upgrade` 和短

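 The long-term direction of the TCP pool is to make `trans`/`tran` a truly concurrent short-connection tool instead of stacking more queues on a single Provider WebSocket. backend-core uses the provider WebSocket only to send control frames such as open/dispatch/exit; stdin/stdout/stderr data frames must travel over a pre-warmed TCP channel. Each SSH session, script, file transfer, or Windows passthrough command owns one channel exclusively and releases it back to the pool when done. Pool exhaustion, a lost channel, an unreachable data port, and an outdated provider version must all be structured fast failures; queuing a request at the application layer and then not returning for a long time is forbidden.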

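-The current default pool size is 10 channels, designed first to cover frequent short SSH calls, concurrent small files, and a single large file that does not block other requests. The verified target state is: on a WSL provider such as D601, 10 concurrent `trans ... argv bash -lc 'sleep 2'` calls no longer hit `provider ssh tcp data pool has no idle channel`, stderr is empty, every lane's stdout contains the command's start and end output, and afterwards the labels return to `ready=desired` and `claimed=0`. A fixed end-to-end overhead remains, so the wall-clock time of 10 concurrent short commands can be noticeably higher than the remote command's own runtime; that belongs to later performance work on connection setup, broker scheduling, WSL SSH spawn, or the provider startup path, and must not be masked with queues, gates, or hidden retries.
+The current default pool size is 10 channels, designed first to cover frequent short SSH calls, concurrent small files, and a single large file that does not block other requests. The verified target state is: on a WSL provider such as D601, 10 concurrent `trans ... sh -- 'sleep 2'` calls no longer hit `provider ssh tcp data pool has no idle channel`, stderr is empty, every lane's stdout contains the command's start and end output, and afterwards the labels return to `ready=desired` and `claimed=0`. A fixed end-to-end overhead remains, so the wall-clock time of 10 concurrent short commands can be noticeably higher than the remote command's own runtime; that belongs to later performance work on connection setup, broker scheduling, WSL SSH spawn, or the provider startup path, and must not be masked with queues, gates, or hidden retries.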

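 The most common development pitfall is mistaking "dependency layer online" for "data plane usable". `host.ssh` only proves that the provider can run maintenance SSH; only `host.ssh.tcp-pool`, `providerGatewaySshDataPoolReady`, `providerGatewaySshDataPoolClaimed`, and `providerGatewaySshDataPoolLastError` prove the state of the TCP data pool. A second pitfall is a lost output tail: after receiving `ssh.data`, the backend-core broker must write and flush stdout/stderr before handling `ssh.exit`; otherwise a short command can return rc=0 while the last chunk of stdout never reaches the caller. A third pitfall is session release: the `ssh.exit`, error, and timeout paths must all release the claimed channel so that the next batch of concurrent requests does not see false pool exhaustion. A fourth pitfall is pool-state drift between core and provider: if the provider returns `host_ssh_error` over the control WebSocket with the message `requested ssh tcp data channel is not ready`, the channel claimed on the core side is no longer recognized by the provider, and backend-core must drop that `providerId + dataChannelId` rather than release it back to the idle pool and claim it again.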
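
The 10-way concurrency check described above (empty stderr, start and end output present in every lane) could be scripted roughly as follows. This is a sketch, not an existing UniDesk tool: `run_remote` and `check_concurrency` are hypothetical names, and `run_remote` is a local stand-in so the script runs anywhere. In real use its body would be the documented `trans <PROVIDER_ID> sh -- "$1"` call.

```bash
#!/usr/bin/env bash
# Stand-in for the remote call (assumption). In practice replace the body
# with: trans <PROVIDER_ID> sh -- "$1"
run_remote() { sh -c "$1"; }

# Run N lanes concurrently and count lanes that exit non-zero, write to
# stderr, or lose the start/end lines of their stdout.
check_concurrency() {
  local n="$1" cmd="${2:-echo start; sleep 2; echo end}"
  local dir bad=0 i
  local pids=()
  dir="$(mktemp -d)"
  for i in $(seq 1 "$n"); do
    run_remote "$cmd" >"$dir/$i.out" 2>"$dir/$i.err" &
    pids+=("$!")
  done
  for i in $(seq 1 "$n"); do
    local ok=1
    wait "${pids[$((i - 1))]}" || ok=0          # non-zero exit code
    [ -s "$dir/$i.err" ] && ok=0                # stderr must be empty
    grep -q '^start$' "$dir/$i.out" || ok=0     # first line arrived
    grep -q '^end$' "$dir/$i.out" || ok=0       # output tail arrived
    [ "$ok" -eq 1 ] || bad=$((bad + 1))
  done
  rm -rf "$dir"
  echo "lanes=$n bad=$bad"
  [ "$bad" -eq 0 ]
}

check_concurrency 10 'echo start; sleep 1; echo end'
```

The sketch covers only the caller-visible half of the check. Confirming that the labels return to `ready=desired` and `claimed=0` still requires reading the provider labels separately.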

@@ -162,7 +162,7 @@ backend-core 可以通过真实 WebSocket 调度向在线 provider 下发 `provi

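 `bun scripts/cli.ts provider triage <PROVIDER_ID>` is the read-only, multi-signal adjudication entry point for provider runtime state. Its output must include `decision`, `retryable`, `healthyScopes`, `failedScopes`, `degradedScopes`, `blockingDisposition`, `rationale`, `signals`, and `recommendedCrossChecks`. The long-term semantics of `decision` are: `global-offline` means several independent critical planes, such as provider heartbeat, Host SSH, k3s, or scheduler, failed at the same time with no healthy cross-evidence; `service-degraded` means the registry, service proxy, or a single user service is locally degraded while provider-level health signals still exist; `retryable-transient` means a single runner-local, SSH, proxy, or API timeout is insufficient evidence and calls for a retry or additional cross-checks; `healthy` means no failure or degradation signal was observed.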

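-`recommendedCrossChecks` must keep the argv-form Host SSH self-check: `trans <PROVIDER_ID> argv true`. This command proves that the non-interactive maintenance bridge still works. If the free ssh-like form hits a timeout, `kex_exchange_identification`, or `Connection closed by remote host`, first follow the `UNIDESK_SSH_HINT` printed by the CLI and retest with `trans D601 argv bash -lc '<command>'`, then combine the result with `provider triage` to judge whether this is really a provider-level failure.
+`recommendedCrossChecks` must keep the argv-form Host SSH self-check: `trans <PROVIDER_ID> argv true`. This command proves that the non-interactive maintenance bridge still works. If the free ssh-like form hits a timeout, `kex_exchange_identification`, or `Connection closed by remote host`, first follow the `UNIDESK_SSH_HINT` printed by the CLI and retest with `trans D601 sh -- '<command>'` or `trans D601 bash -- '<bash command>'`, then combine the result with `provider triage` to judge whether this is really a provider-level failure.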

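 A long-lived WSL provider such as D601 must not be recorded as globally offline because a single path failed. A typical local degradation is the artifact registry's `unidesk-artifact-registry.service` being inactive while the registry container is still running, the listener is still bound to loopback, and `http://127.0.0.1:5000/v2/` returns 200. This state should show as degraded within the registry scope and land on `decision=service-degraded` in provider triage, prompting only a fix for the systemd drift, without blocking all Code Queue, k3sctl-adapter, or business API judgments on D601.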
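
The `decision` semantics above can be illustrated with a toy classifier. This is only a sketch of the ordering between the four outcomes: `triage_decision`, its count-based inputs, and the exact thresholds are assumptions made for illustration, not the logic of `provider triage`.

```bash
#!/usr/bin/env bash
# Hypothetical inputs (all counts are assumptions for this sketch):
#   $1 = failed independent critical planes (heartbeat, Host SSH, k3s, scheduler)
#   $2 = healthy provider-level signals (cross-evidence)
#   $3 = locally degraded scopes (registry, service proxy, one user service)
#   $4 = single unconfirmed timeouts (runner-local, SSH, proxy, API)
triage_decision() {
  local failed="$1" healthy="$2" degraded="$3" timeouts="$4"
  if [ "$failed" -ge 2 ] && [ "$healthy" -eq 0 ]; then
    # Several critical planes down and nothing healthy to contradict it.
    echo "global-offline"
  elif [ "$degraded" -ge 1 ] && [ "$healthy" -ge 1 ]; then
    # Local degradation while provider-level health signals still exist.
    echo "service-degraded"
  elif [ "$timeouts" -ge 1 ] || [ "$failed" -ge 1 ] || [ "$degraded" -ge 1 ]; then
    # Evidence is insufficient: retry or add cross-checks.
    echo "retryable-transient"
  else
    echo "healthy"
  fi
}

# D601 registry drift: systemd unit inactive, but container, listener,
# and /v2/ probe are healthy.
triage_decision 0 3 1 0
```

In this sketch a single failed plane with healthy cross-evidence falls into `retryable-transient`; that mapping is an assumption, chosen so that one failing path never escalates to `global-offline`.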

@@ -186,6 +186,6 @@ WSL provider 需要调用 Windows-only 工具链时，优先在 WSL 用户的 `~

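 The maintenance bridge is exposed as the `host.ssh` command through real WebSocket dispatch. The default payload uses `mode: "probe"`, where the remote side runs a single short command and returns `UNIDESK_SSH_TEST user=... host=... bridge=host.ssh cwd=...`; for manual diagnosis, `mode: "exec"` with a `command` field can be used explicitly to run a bounded command. Every `host.ssh` execution must have a timeout, and stdout/stderr are shown truncated in the task result. Automatic upgrades and ordinary tasks must still use the Docker socket and `provider.upgrade`; the WSL SSH maintenance bridge must not be used as a scheduling channel.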

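-The human-facing terminal entry point is `trans <PROVIDER_ID> [ssh-like args...]`. With no further arguments it opens a remote login shell; with further arguments it runs the remote command and returns the remote exit code. The client side of this entry point still connects to the backend-core internal `/ws/ssh` broker; core uses the provider WebSocket only to send open/dispatch control messages, and the terminal stdin/stdout/stderr data plane must use the `host.ssh.tcp-pool` TCP warm pool, which the provider opens by connecting out to the main server. This adds no inbound requirement on compute nodes and keeps no legacy WebSocket data fallback. Traditional ssh transport parameters are controlled centrally by provider-gateway environment variables; the CLI only passes through the remote command after the Provider ID and the terminal stdin/stdout/stderr. Non-interactive remote commands should prefer the argv entry: `trans D601 argv true`, or `trans D601 argv bash -lc '<command>'` when shell features are needed. When a WSL node needs a clear view of both the Linux/WSL and the Windows skill sets, use `trans <PROVIDER_ID> skills`; this command only reads `SKILL.md` metadata over the already established maintenance bridge and does not require provider-gateway to add a business API.
+The human-facing terminal entry point is `trans <PROVIDER_ID> [ssh-like args...]`. With no further arguments it opens a remote login shell; with further arguments it runs the remote command and returns the remote exit code. The client side of this entry point still connects to the backend-core internal `/ws/ssh` broker; core uses the provider WebSocket only to send open/dispatch control messages, and the terminal stdin/stdout/stderr data plane must use the `host.ssh.tcp-pool` TCP warm pool, which the provider opens by connecting out to the main server. This adds no inbound requirement on compute nodes and keeps no legacy WebSocket data fallback. Traditional ssh transport parameters are controlled centrally by provider-gateway environment variables; the CLI only passes through the remote command after the Provider ID and the terminal stdin/stdout/stderr. Non-interactive single-process remote commands should prefer the argv entry: `trans D601 argv true`; when shell features are needed, write `sh` or `bash` explicitly in the operation position, for example `trans D601 sh -- '<command>'` or `trans D601 bash -- '<bash command>'`. When a WSL node needs a clear view of both the Linux/WSL and the Windows skill sets, use `trans <PROVIDER_ID> skills`; this command only reads `SKILL.md` metadata over the already established maintenance bridge and does not require provider-gateway to add a business API.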

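 To verify the WSL SSH bridge, first start sshd in the target WSL and make sure the maintenance public key is written to the target user's `authorized_keys`, then confirm that `unideskCapabilities` in the target provider's registration labels includes `host.ssh`. After running `bun scripts/cli.ts debug dispatch <PROVIDER_ID> host.ssh --wait-ms 15000`, the result in `debug task latest` or the frontend task history should show `status: succeeded`, a `probeLine` containing `UNIDESK_SSH_TEST`, and `exitCode: 0`, and `hostSshKeyPresent` should be true in the target node's labels. Next run `trans <PROVIDER_ID> argv true` to verify the non-interactive argv maintenance command, then `trans <PROVIDER_ID> hostname` to verify the near-native ssh remote-command experience. When self-testing on the compute node itself, pass the same commands through the remote CLI: `bun scripts/cli.ts --main-server-ip 74.48.78.17 debug health`, `bun scripts/cli.ts --main-server-ip 74.48.78.17 debug dispatch <PROVIDER_ID> host.ssh --wait-ms 15000`, `bun scripts/cli.ts --main-server-ip 74.48.78.17 ssh <PROVIDER_ID> argv true`, and `bun scripts/cli.ts --main-server-ip 74.48.78.17 ssh <PROVIDER_ID> hostname`. By default the remote CLI uses the public frontend login session and does not need the main server SSH key. The health check must show the Provider online, `hostSshConfigured=true`, `hostSshKeyPresent=true`, a correct `hostSshTarget`, and `unideskCapabilities` including `host.ssh`; the probe must return `UNIDESK_SSH_TEST`; and `ssh <PROVIDER_ID> argv true` and `ssh <PROVIDER_ID> hostname` must exit with code 0. If a WSL node such as D518 has no public SSH entry, verification must still go through this provider-gateway self-connecting maintenance bridge rather than requiring the main server to connect directly to the node's public port 22. An older provider that does not declare `host.ssh` must first upgrade provider-gateway; otherwise core rejects SSH passthrough.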
@@ -63,7 +63,7 @@
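 **Before each start of formal work (entering scheduling, rereading the Todo Note, writing back decisions, arranging time blocks), the secretary must fetch the latest Beijing time itself as the reference time.** Do not infer it from the date at the top of the conversation, a "now" in the context, historical timestamps, or the "today" of a previous session. A time stated by the user (such as "it is 14:57 now") serves as cross-validation, not as the reference. How to fetch the time (verified 2026-06-01):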

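 - **Local call on the main server** (the secretary's own machine): `TZ='Asia/Shanghai' date '+%Y-%m-%d %H:%M:%S %Z'`; 15:39 CST = real Beijing time, with no need to go through an ssh route. Advantages: low friction, a single command, no dependencies.
-- **Cross-node/remote time check** (when the remote host clock needs confirming): `trans <route> script -- 'date "+%Y-%m-%d %H:%M:%S %Z"'`, with the route naming the target explicitly (such as `G14` / `D601`).
+- **Cross-node/remote time check** (when the remote host clock needs confirming): `trans <route> sh -- 'date "+%Y-%m-%d %H:%M:%S %Z"'`, with the route naming the target explicitly (such as `G14` / `D601`).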

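 State the result explicitly to the user in the first sentence of the schedule ("The time I see now is X; I am rescheduling from this"), so the user can spot a time mismatch immediately. The reason: the user may rest or leave between two conversations, so a schedule built on cached time drifts from the user's actual position, the "done/not done + blocker" feedback format no longer lines up, and the schedule sequence accumulates errors.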

@@ -32,13 +32,13 @@ trans <PROVIDER_ID>:win skills --limit 20

 ## Windows Long-Lived Process Detach

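-`trans <PROVIDER_ID>:win cmd ...` and `trans <PROVIDER_ID> script -- powershell.exe ...` suit short commands, read-only probes, and bounded skill calls; they are not suitable for directly starting long-lived Windows processes. Windows `cmd start`, `cmd /c ... &`, PowerShell `Start-Process -PassThru`, or child processes with stdout/stderr redirection may still be waited on by the provider-gateway/SSH broker through the child process tree or inherited handles. The result is that the remote process has in fact started, but `trans` keeps holding the provider session, and subsequent high-frequency D601/G14 calls are serialized behind the provider session lock.
+`trans <PROVIDER_ID>:win cmd ...` and `trans <PROVIDER_ID>:win ps ...` suit short commands, read-only probes, and bounded skill calls; they are not suitable for directly starting long-lived Windows processes. Windows `cmd start`, `cmd /c ... &`, PowerShell `Start-Process -PassThru`, or child processes with stdout/stderr redirection may still be waited on by the provider-gateway/SSH broker through the child process tree or inherited handles. The result is that the remote process has in fact started, but `trans` keeps holding the provider session, and subsequent high-frequency D601/G14 calls are serialized behind the provider session lock.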

 Long-lived Windows processes must use a startup model that explicitly detaches from the current `trans` session:

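 - Prefer writing startup parameters into a node-private profile or a `.cmd`/`.ps1` file, then starting through Windows Task Scheduler, a Windows Service, NSSM, a PM2 Windows service, or an equivalent supervisor.
 - For a temporary experiment only, use `schtasks /Create ... /TR "<cmd file>" /SC ONCE ... /F` followed by `schtasks /Run ...`, and write stdout/stderr to a fixed log file; verify with separate short commands that read `tasklist`, `Get-CimInstance Win32_Process`, the log tail, and service health.
-- Do not use `start` or `Start-Process` inside `D601:win cmd` to directly start long-lived processes such as `hwlab-gateway`, a serial monitor, Codex app-server, or a Keil job watcher. If this was done by mistake and `trans` hangs, first stop the corresponding Windows PID or the stuck local trans/tran broker process, then switch to a detached supervisor.
+- Do not use `start` or `Start-Process` inside `D601:win cmd` / `D601:win ps` to directly start long-lived processes such as `hwlab-gateway`, a serial monitor, Codex app-server, or a Keil job watcher. If this was done by mistake and `trans` hangs, first stop the corresponding Windows PID or the stuck local trans/tran broker process, then switch to a detached supervisor.
 - The startup command must avoid entering Windows cmd from a WSL UNC cwd; the long-lived process's `cwd` should be a Windows drive-letter path such as `F:\Work\HWLAB`, with logs written to the same node-private `.state` or skill state directory.
 - The acceptance criterion for a long-lived process is not "the startup command returned" but that a separate short command can see the PID, the first log line, the health endpoint, or the registered cloud-side session/resource/capability.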