docs: record AgentRun lane policy closeout

2026-06-25 03:49:46 +00:00
parent dc7bdd2c3c
commit 681b9e3292
2 changed files with 5 additions and 4 deletions
@@ -106,9 +106,9 @@ For a YAML-only lane, `trigger-current` first ensures the target source workspace/branch

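 The `auth.json` and `config.toml` in the provider credential Secret must also be delivered according to the lane's YAML `sourceRef`; the commander host's global Codex configuration must not be treated as the runtime truth for every lane. When HWLAB uses a Codex profile through D601 `agentrun-v02`, `config.toml` should carry only the Codex CLI runtime options that lane needs, such as the keys for model, reasoning, context window, auto compact, storage, and network. Unless the corresponding `auth.json` / API key source is also explicitly owned by the same lane and has been verified, do not override the provider endpoint, `base_url`, `model_provider`, or any other endpoint binding in the lane config. Two regressions are common: the config synced to the runner lacks `model_context_window` / `model_auto_compact_token_limit`, which causes a context-window failure after several rounds of tool/webSearch calls; or a mismatched provider endpoint is added by mistake while filling in parameters, which causes a provider auth failure. The fix must go through the dry-run/confirm flow of `agentrun control-plane secret-sync --node <node> --lane <lane>`, take effect via `restart`, and be verified through the original HWLAB entry point `hwlab-cli client agent send|trace|result`; do not decode configuration content back out of the Kubernetes Secret or print the payload in an issue or trace.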

-AgentRun resource/session client policy is also declared in `config/agentrun.yaml`. `client.sessionPolicy` is the source of the default `tenantId`, `projectId`, `providerId`, `backendProfile`, `workspaceRef`, and execution policy used when generating `agentrun send session/...` and related session payloads; the lane's `secrets[].providerCredential.profile` declares which profile owns the provider credential Secret, and the UniDesk CLI aggregates Secret names/keys only from YAML and no longer assembles provider Secret names in code. The read-only entry point `bun scripts/cli.ts agentrun explain session-policy` shows the current default lane, the session policy, the actual executionPolicy payload, and the source of the provider credential binding; its output may contain only Secret metadata, key names, and `valuesPrinted=false`, and must not print Secret values.
+AgentRun resource/session client policy is also declared in `config/agentrun.yaml`. When no node/lane is explicitly selected, `client.sessionPolicy` is the source of the default `tenantId`, `projectId`, `providerId`, `backendProfile`, `workspaceRef`, and execution policy used when generating `agentrun send session/...` and related session payloads; once `--node <node> --lane <lane>` is given explicitly, `explain session-policy`, `send session`, the resource primitives, and AipodSpec render must all switch to the YAML facts of the target lane. The lane's `secrets[].providerCredential.profile` declares which profile owns the provider credential Secret, and the UniDesk CLI aggregates Secret names/keys only from YAML and no longer assembles provider Secret names in code. The read-only entry point `bun scripts/cli.ts agentrun explain session-policy` shows the selected target lane, the policy source, the actual executionPolicy payload, and the source of the provider credential binding; its output may contain only Secret metadata, key names, and `valuesPrinted=false`, and must not print Secret values.

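-A session follow-up on a non-default lane must prove that `send session` uses the run policy of the selected node/lane. Before using the short command form, first run `agentrun explain session-policy --node <node> --lane <lane>` or an equivalent dry-run/describe path to confirm that `backendProfile`, `providerId`, `workspaceRef`, the execution policy, and the provider credential SecretRef all come from the target lane; if the rendered result falls back to the global default lane, switch to a JSON send body rendered explicitly from the lane YAML and record the lane policy problem in the corresponding issue. Do not mask a wrongly selected policy by manually creating a default-lane Secret, copying credentials, or modifying the runtime namespace.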
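+A session follow-up on a non-default lane must prove that `send session` uses the run policy of the selected node/lane. Before using the short command form, first run `agentrun explain session-policy --node <node> --lane <lane> [--backend-profile <profile>]` or an equivalent dry-run/describe path to confirm that `backendProfile`, `providerId`, `workspaceRef`, the execution policy, and the provider credential SecretRef all come from the target lane; the `--prompt-stdin` short command form and the `--json-stdin -o json` explicit JSON form should disclose the same `sessionPolicy` summary. A rendered result that falls back to the global default lane, shows the wrong default lane, or uses different policies for the short command and the JSON body is a lane policy defect, and the fix belongs in YAML target resolution or CLI rendering; do not mask a wrongly selected policy by manually creating a default-lane Secret, copying credentials, rewriting the JSON body, or modifying the runtime namespace.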

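 `cleanup-runners` is the retention entry point for AgentRun runtime runners. It cleans up only the runner Jobs/Pods that match `deployment.runner.retention.selectors` in the runtime namespace of the YAML-selected lane. The runner cap, the last-active ordering strategy, the active heartbeat window, the Job name prefix, and whether age-based cleanup is enabled all take `config/agentrun.yaml` as the single source of truth; the command line must not override these values. A dry-run must disclose the runner Job count before cleanup, the count of non-terminal runner Pods, the inactive candidates ordered by last-active time, the selected runner Jobs, the availability of manager facts, and the active-run risk; by default, confirm deletes only the selected runner Jobs and then recounts the runner Jobs/Pods after cleanup. When manager facts are unavailable, only safe candidates that are terminal or have no active Pod may be cleaned up, the risk fields must be retained, and the Kubernetes creation time must not be passed off as a complete last-active fact.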

@@ -147,7 +147,7 @@ Per AgentRun issue #105, the AgentRun `v0.1` commander task surface has already completed real

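 New commander tasks in UniDesk always enter through the resource primitives `bun scripts/cli.ts agentrun get|describe|events|logs|result|ack|cancel|dispatch|create|apply|steer|send`. This entry point is a render-only client: the UniDesk client keeps the k8s-style command parsing, human tables, lifecycle summaries, next-step commands, pagination, the stable `-o json|yaml` client schema, and error display, while the AgentRun server provides only a stable RESTful API, authentication, and business facts and does not carry UniDesk CLI rendering. For everyday dispatch, prefer `agentrun create task --aipod Artificer --prompt-stdin` or `agentrun apply -f -` with a quoted YAML/JSON heredoc/stdin; dispatch tasks that have been created but not yet run with `agentrun dispatch task/<taskId>`; `--json-file`, `--prompt-file`, and `--runner-json-file` are only client-side input sources for reviewed, reusable, controlled files. UniDesk does not implement the AgentRun queue protocol and does not double-write tasks back to the old Code Queue.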

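-Before dispatching with a lane-scoped AipodSpec, confirm through `get/describe aipodspec`, the render output, or the first runner job summary that `backendProfile`, the provider credential SecretRef, the tool credential SecretRef, and the bundle/workspaceRef all exist in the YAML facts of the selected lane. If a runner Pod reports `FailedMount` and the missing object is a rendered SecretRef, classify it as an AipodSpec/YAML binding problem and fix the controlled configuration; do not manually create a legacy Secret in the runtime namespace or copy another lane's Secret over.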
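+Before dispatching with a lane-scoped AipodSpec, confirm through `get/describe aipodspec`, the render output, or the first runner job summary that `backendProfile`, the provider credential SecretRef, the tool credential SecretRef, and the bundle/workspaceRef all exist in the YAML facts of the selected lane. The default Artificer assembly for a non-default lane such as D601/v02 should bind, from the lane YAML, provider and tool credentials that actually exist: the GitHub PR token uses `tool=github`, `purpose=github-pr`, and the Secret key/projection env `GH_TOKEN`; the UniDesk passthrough token uses `tool=unidesk-ssh`, `purpose=ssh-passthrough`, and the Secret key/projection env `UNIDESK_SSH_CLIENT_TOKEN`. `tool=github-ssh`, `sub2api`, or any other legacy tool credential may be rendered only when the YAML explicitly declares the complete SecretRef, keys, and projection. If a runner Pod reports `FailedMount` and the missing object is a rendered SecretRef, classify it as an AipodSpec/YAML binding problem and fix the controlled configuration; do not manually create a legacy Secret in the runtime namespace or copy another lane's Secret over. The default output of an AipodSpec render should also be a bounded summary/table/drill-down; the full render JSON is expanded only with an explicit `--full`, `--raw`, or `-o json`, or on a machine-consumption path, and the remaining dump problems continue to be tracked in [#862](https://github.com/pikasTech/unidesk/issues/862).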

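 The default transport for the resource primitives and the old compatibility groups is a direct connection to the AgentRun REST API, configured from UniDesk's own YAML `config/agentrun.yaml`. Without `--node`/`--lane`, requests go to the YAML's default manager `baseUrl`; with an explicit `--node <node> --lane <lane>`, the same YAML selects the runtime lane, the request enters the manager `internalBaseUrl` through `lane-k8s-service-proxy`, and it is issued with the API key metadata declared in the manager pod env. The output discloses only node/lane/namespace/baseUrl/auth env metadata and `valuesPrinted=false` and must not print the key value. This mode is for resource primitive operations and evidence collection on non-default lanes such as D601 `agentrun-v02`, especially `get/describe/events/logs/result`, and does not replace `agentrun control-plane ...` for releases or operational control. Authentication may reuse the environment variable/fixed file discovery style of `HWLAB_API_KEY`, but must not depend on the HWLAB runtime, HWLAB backend-core, the HWLAB frontend proxy, or the SSH official CLI; an extra forwarding layer enlarges the failure surface and cannot serve as the official path. `--raw` discloses only the direct AgentRun REST envelope plus the necessary `transport=direct-http`, `clientRole=render-only`, `configPath`, `baseUrl`, and auth source/redacted metadata, and does not print the token value. `agentrun control-plane ...` and `git-mirror ...` still belong to the G14 source/runtime operational control path and may keep using the UniDesk SSH capture bridge; these control-plane paths must not in turn become the default transport for queue/session resource primitives.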
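The metadata-only disclosure rule in the transport paragraph above can be sketched roughly as follows. This is a minimal illustration, not the UniDesk implementation: the `TransportContext` and `TransportDisclosure` types, their field names, and the `discloseTransport` helper are all hypothetical; only the disclosed categories (node/lane/namespace/baseUrl/auth env metadata) and `valuesPrinted=false` come from the text.

```typescript
// Hypothetical in-memory context; the real CLI resolves this from config/agentrun.yaml.
type TransportContext = {
  node: string;
  lane: string;
  namespace: string;
  baseUrl: string;
  authEnvName: string; // name of the env var that holds the API key
  apiKey: string;      // the secret value itself, never to be printed
};

// Hypothetical output shape: metadata only, plus the valuesPrinted=false marker.
type TransportDisclosure = {
  node: string;
  lane: string;
  namespace: string;
  baseUrl: string;
  authEnv: { name: string; present: boolean };
  valuesPrinted: false;
};

function discloseTransport(ctx: TransportContext): TransportDisclosure {
  // Copy only metadata fields; the key value is reduced to a presence flag.
  return {
    node: ctx.node,
    lane: ctx.lane,
    namespace: ctx.namespace,
    baseUrl: ctx.baseUrl,
    authEnv: { name: ctx.authEnvName, present: ctx.apiKey.length > 0 },
    valuesPrinted: false,
  };
}
```

Building the output from an explicit allow-list of fields, rather than deleting the key from a copy of the context, means a newly added sensitive field stays out of the output unless someone deliberately discloses it.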

@@ -117,6 +117,7 @@ When a PipelineRun fails or stays unfinished for a long time, first follow the targeted `control-plane status
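 - `schedule list|get|runs|run|retry-run|delete|upsert-pgdata-backup` manages backend-core scheduled tasks and run history. `schedule list`, `schedule get`, `schedule runs --limit N`, and `schedule runs <scheduleId> --limit N` are read-only observation entry points; `schedule run`, `schedule retry-run`, `schedule delete`, and `schedule upsert-pgdata-backup` trigger runs or write configuration and require explicit authorization during production recovery. `schedule runs --limit N` is the global history view and returns `scope=global` and `scheduleId=null`; `schedule runs <scheduleId> --limit N` is the history view for one schedule and returns `scope=schedule` and the corresponding `scheduleId`. The CLI must reject a purely numeric positional argument such as `schedule runs 50` and suggest `schedule runs --limit 50`, so that an empty array is not misread as "no historical runs". `schedule run <id> --wait-ms N` triggers the same schedule and must return `newRunId` and `observeCommand` even if the wait times out; `schedule retry-run <failedRunId>` accepts only a failed run, looks up the `scheduleId` from the original run, re-triggers the same schedule, and outputs `originalRunId`, `scheduleId`, `newRunId`, and `observeCommand`. When the backend-core target container is missing or only a verify-only container is observed, schedule/microservice commands must exit non-zero and return `failureKind=target-stack-not-running`, `runnerDisposition=infra-blocked`, `readOnlyCommands`, and `authorizationRequiredForRecovery`; Docker's `No such container` must not be treated as a successful empty history.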
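 - `codex deploy <commitId>` is the old Code Queue compatibility deploy entry point and has been disabled to prevent the maintenance channel from connecting directly to D601 to deploy Code Queue; current dev automation runs only the `ci run-dev-e2e` smoke and provides no Code Queue CD. See `docs/reference/codex-deploy.md` for the detailed rules.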
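 - `agentrun get|describe|events|logs|result|ack|cancel|dispatch|create|apply|send` is the current entry point for new commander tasks and AgentRun session control. The UniDesk CLI is a render-only client: the client keeps the k8s-style command parsing, human tables, lifecycle summaries, next-step commands, pagination, the stable `-o json|yaml` client schema, and error display, while the AgentRun server provides only a stable RESTful API, authentication, and business facts and does not carry UniDesk CLI rendering. For everyday reads, use `get tasks --queue commander`, `describe task/<taskId>`, `events run/<runId>`, `logs session/<sessionId>`, and `result run/<runId> --command <commandId>`; for everyday writes, use `create task --aipod Artificer --prompt-stdin`, `apply -f -`, `dispatch task/<taskId>`, `send session/<sessionId>`, and `ack/cancel task|session/<id>`. The user-level CLI drops the `turn` and `steer` paths; `send session/<sessionId>` is the only write entry point for session follow-ups, the AgentRun server decides automatically between an internal `steer` and a new `turn` based on durable session/run/command state, and a dry-run must return that actual decision without writing state. The compatibility groups `queue|runs|commands|runner|sessions|aipod-specs` use the same direct HTTP transport, and `--raw` discloses only the direct AgentRun REST envelope.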
+- `agentrun explain session-policy --node <node> --lane <lane>`, `agentrun send session/<id> --node <node> --lane <lane> --dry-run`, and `agentrun aipod-specs render <name> --node <node> --lane <lane>` must use the same target-lane YAML resolution. On a non-default lane, both the short command form and the `--json-stdin -o json` form should show a summary of the target `backendProfile`, `providerId`, `workspaceRef`, executionPolicy, provider credential SecretRef, and tool credential SecretRef; Secret output shows only the object name, key names, presence/fingerprint, projection, and `valuesPrinted=false`. If the default human output triggers a `/tmp/unidesk-cli-output` dump, first narrow the command down to summary/table/drill-down; full machine output is reserved for an explicit `--full`, `--raw`, `-o json`, or an equivalent machine-consumption flag. The remaining default dump of `aipod-specs render` is tracked in [#862](https://github.com/pikasTech/unidesk/issues/862).
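 - The default transport for `agentrun` resource primitives is a direct connection to the AgentRun REST API, configured from UniDesk's own YAML `config/agentrun.yaml`. Without `--node`/`--lane`, requests go to the YAML's default manager `baseUrl`; with an explicit `--node <node> --lane <lane>`, the same YAML selects the runtime lane, the request enters the manager `internalBaseUrl` through `lane-k8s-service-proxy`, and it is issued with the API key metadata declared in the manager pod env. The output discloses only node/lane/namespace/baseUrl/auth env metadata and `valuesPrinted=false` and must not print the key value. This mode is for resource primitive operations and evidence collection on non-default lanes such as D601 `agentrun-v02`, especially `get/describe/events/logs/result`, and does not replace `agentrun control-plane ...` for releases or operational control. Authentication may reuse the environment variable/fixed file discovery style of `HWLAB_API_KEY`, but must not depend on the HWLAB runtime, HWLAB backend-core, the HWLAB frontend proxy, or the SSH official CLI; an extra forwarding layer enlarges the failure surface and cannot serve as the official path. `agentrun control-plane ...` and `git-mirror ...` still belong to the G14 source/runtime operational control path and may keep using the UniDesk SSH capture bridge; these control-plane paths must not in turn become the default transport for queue/session resource primitives.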
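 - `agentrun cancel ... --dry-run` must show a `CancelLifecycle` summary: transport/authority, YAML lane, cascade scope, runner abort window, cancel epoch, and late-write fencing. The cancel policy comes from `controlPlane.lanes.<lane>.deployment.runner.cancelLifecycle` in `config/agentrun.yaml`; a missing field or a wrong lane selection should surface as a configuration error, and no implicit default may be filled in by the CLI, the manifest, or the service. When operating on a non-default lane, first add `--node <node> --lane <lane> --dry-run` to check the policy, then remove `--dry-run` to issue the real cancel.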
 - `agentrun control-plane expose --dry-run|--confirm` maintains the AgentRun public HTTPS entry according to `config/agentrun.yaml`, in the same pattern as the Sub2API exposure: the G14 AgentRun runtime goes out through frpc to master `127.0.0.1:<remotePort>`, and master Caddy serves `https://agentrun.74-48-78-17.nip.io/`. The command only adds the master `frps` allow port and the Caddy vhost; the G14 frpc Deployment/ConfigMap must be managed by AgentRun `deploy/deploy.json` + GitOps render, and Kubernetes manifests must not be hand-written on the UniDesk side.
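The `schedule runs` argument rule from the list above (reject a purely numeric positional argument and point the user to `--limit`) might look roughly like the sketch below. It is an illustration under assumptions: the `parseScheduleRuns` function, the `RunsQuery` type, and the default limit of 20 are hypothetical; only the `scope`/`scheduleId` values and the rejection rule come from the text.

```typescript
// Hypothetical parsed form of `schedule runs [<scheduleId>] [--limit N]`.
type RunsQuery =
  | { scope: "global"; scheduleId: null; limit: number }
  | { scope: "schedule"; scheduleId: string; limit: number };

// defaultLimit = 20 is an assumed value, not taken from the documentation.
function parseScheduleRuns(argv: string[], defaultLimit = 20): RunsQuery {
  let scheduleId: string | null = null;
  let limit = defaultLimit;
  for (let i = 0; i < argv.length; i++) {
    const arg = argv[i];
    if (arg === "--limit") {
      // Consume the next token as the limit value.
      const value = Number(argv[++i]);
      if (!Number.isInteger(value) || value <= 0) {
        throw new Error("--limit expects a positive integer");
      }
      limit = value;
    } else if (/^\d+$/.test(arg)) {
      // A bare number is almost certainly a misplaced limit, not a scheduleId.
      throw new Error(`"${arg}" looks like a limit; use: schedule runs --limit ${arg}`);
    } else if (scheduleId === null) {
      scheduleId = arg;
    } else {
      throw new Error(`unexpected argument: ${arg}`);
    }
  }
  return scheduleId === null
    ? { scope: "global", scheduleId: null, limit }
    : { scope: "schedule", scheduleId, limit };
}
```

Failing fast here keeps `schedule runs 50` from being sent as a lookup for a schedule named `50`, whose empty result could be mistaken for a schedule with no history.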
@@ -160,7 +161,7 @@ UniDesk/HWLAB Web development, Playwright wrapper, `trans <route> playwright`, HWL

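 The outermost JSON of every command contains `ok`, `command`, and either `data` or `error`. On failure the CLI sets a non-zero exit code but still outputs a JSON error object; the error object should contain `name`, `message`, and, when available, `stack`.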

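-Diagnostic commands use progressive disclosure by default: `server logs`, `job list/status`, `codex task/trace/output`, `microservice health code-queue`, and `microservice proxy` must all have a default item-count, byte-count, or text-preview cap; a single output is enlarged only when the user explicitly passes `--limit`, `--tail-bytes`, `--full-text`, `--raw`, or `--full`. When CLI stdout hits `EPIPE` because the downstream pipe closed, it must exit quietly and must not print a Bun stack trace.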
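+Diagnostic commands use progressive disclosure by default: `server logs`, `job list/status`, `codex task/trace/output`, `microservice health code-queue`, `microservice proxy`, and the AgentRun control-plane/resource primitives must all have a default item-count, byte-count, table, or text-preview cap; a single output is enlarged only when the user explicitly passes `--limit`, `--tail-bytes`, `--full-text`, `--raw`, `--full`, `-o json`, or an equivalent machine-consumption flag. AgentRun `control-plane plan|refresh|cleanup-runners|trigger-current` outputs a short summary, key fields, and next-step commands by default; `describe task -o json` still defaults to the compact client schema, and the full resource requires `--full -o json`; `result --raw` is an explicit raw path and may trigger the dump fallback. When CLI stdout hits `EPIPE` because the downstream pipe closed, it must exit quietly and must not print a Bun stack trace.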

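 `microservice proxy` is a private backend entry point for manual verification and controlled debugging. The default method is GET; with `--body-json JSON`, `--body-file path`, or `--body-stdin` the default method switches to POST, and `--method POST|PUT|PATCH|DELETE` may also be given explicitly, but GET/HEAD may not carry a request body. All requests remain subject to `allowedMethods` and `allowedPathPrefixes` in the config. To keep very large business JSON such as a Pipeline snapshot from blowing up CLI output, a response body above the default threshold is returned as `bodyOmitted=true`, `bodyPreview`, `bodyBytes`, and `rawHint`; `--raw` is still protected by the default hard limit, so add `--raw --full` explicitly when the full body is needed, or adjust the preview threshold with `--max-body-bytes <N>`. The official frontend display should still prefer business controls and display-level trimming parameters such as `__unideskArrayLimit` over dumping the full JSON by default.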
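The method defaulting and body truncation described for `microservice proxy` can be sketched as below. This is a rough illustration under assumptions, not the actual CLI code: `resolveMethod`, `presentBody`, and the `ProxyArgs`/`BodyView` types are hypothetical names, and the `allowedMethods`/`allowedPathPrefixes` checks are left out; the `bodyOmitted`, `bodyPreview`, `bodyBytes`, and `rawHint` field names come from the paragraph above.

```typescript
// Hypothetical parsed flags: an optional explicit --method and whether any --body-* flag was given.
type ProxyArgs = { method?: string; hasBody: boolean };

function resolveMethod(args: ProxyArgs): string {
  // Default is GET, switching to POST once a request body is supplied.
  const method = (args.method ?? (args.hasBody ? "POST" : "GET")).toUpperCase();
  if (args.hasBody && (method === "GET" || method === "HEAD")) {
    throw new Error(`${method} must not carry a request body`);
  }
  return method;
}

type BodyView =
  | { bodyOmitted: false; body: string }
  | { bodyOmitted: true; bodyPreview: string; bodyBytes: number; rawHint: string };

function presentBody(body: string, maxBodyBytes: number): BodyView {
  const bytes = new TextEncoder().encode(body);
  if (bytes.length <= maxBodyBytes) {
    return { bodyOmitted: false, body };
  }
  // Cutting on a byte boundary may split a multi-byte character;
  // TextDecoder then emits U+FFFD for the partial sequence.
  const bodyPreview = new TextDecoder().decode(bytes.slice(0, maxBodyBytes));
  return { bodyOmitted: true, bodyPreview, bodyBytes: bytes.length, rawHint: "--raw --full" };
}
```

Measuring the threshold in encoded bytes rather than string length keeps the cap meaningful for non-ASCII payloads, where one character can occupy several bytes.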