
# AgentRun Development and Operations Reference

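This document records only the UniDesk-side development and operations constraints for the standalone repository `pikasTech/agentrun`.

Where AgentRun serves as the HWLAB Agent orchestration and execution infrastructure, UniDesk OA manages the body of the requirements specification. The entry point is [PJ2026-0102 Agent Orchestration](../../project-management/PJ2026-01/specs/PJ2026-0102-agent-orchestration.md).

In the AgentRun repository, `docs/reference/spec-v01-*.md` and `docs/reference/architecture.md` keep only cross-reference stubs that point to the OA specs. Implementation details, source layout and repo-local run instructions are still maintained in the AgentRun repository itself.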

## Repository and Worktree

The single long-lived AgentRun repository is:

```text
git@github.com:pikasTech/agentrun.git
```

The fixed source worktree for the current AgentRun `v0.1` is:

```text
G14:/root/agentrun-v01
```

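This directory must meet three conditions:

- it stays on the `v0.1` branch;
- its `origin` is `git@github.com:pikasTech/agentrun.git`;
- it is clean.

Some work explicitly targets `v0.1` of the UniDesk/HWLAB base Code Agent invocation service: development, documentation changes, deployment observation, or recovery after an interruption. Before any such work, first run this check through UniDesk SSH passthrough: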

```bash
trans G14:/root/agentrun-v01 sh -- 'pwd; git status --short --branch; git remote -v'
```

Expected state:

- the current path is `/root/agentrun-v01`;
- the branch is `v0.1...origin/v0.1`;
- `origin` is `git@github.com:pikasTech/agentrun.git`;
- the fixed source worktree is clean.
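The expected state can be checked mechanically. This is a minimal sketch that assumes the combined output of the preflight command above is captured as one string. The function name `checkPreflight` and its messages are hypothetical. Only the expected path, branch line and remote come from the rules above.

```typescript
// Hypothetical checker for the fixed source worktree preflight output
// (`pwd; git status --short --branch; git remote -v`).
const EXPECTED_PATH = "/root/agentrun-v01";
const EXPECTED_BRANCH = "## v0.1...origin/v0.1";
const EXPECTED_ORIGIN = "git@github.com:pikasTech/agentrun.git";
const REMOTE_LINE = /^\S+\s+\S+\s+\((fetch|push)\)$/;

function checkPreflight(output: string): string[] {
  const lines = output.split("\n").map((l) => l.trimEnd()).filter((l) => l.length > 0);
  const problems: string[] = [];

  // Line 1 comes from `pwd`.
  if (lines[0] !== EXPECTED_PATH) problems.push(`unexpected path: ${lines[0] ?? "<none>"}`);

  // `git status --short --branch` prints exactly one "## ..." header line.
  const branchLine = lines.find((l) => l.startsWith("## "));
  if (!branchLine || !branchLine.startsWith(EXPECTED_BRANCH)) {
    problems.push(`unexpected branch: ${branchLine ?? "<none>"}`);
  }

  // `git remote -v` prints "<name>\t<url> (fetch|push)"; every origin URL must be canonical.
  const originUrls = lines.filter((l) => /^origin\s/.test(l)).map((l) => l.split(/\s+/)[1]);
  if (originUrls.length === 0 || originUrls.some((u) => u !== EXPECTED_ORIGIN)) {
    problems.push("origin remote missing or not the canonical repository");
  }

  // Any remaining line (" M file", "?? file", ...) is a status entry, i.e. the tree is dirty.
  const statusLines = lines.slice(1).filter((l) => !l.startsWith("## ") && !REMOTE_LINE.test(l));
  if (statusLines.length > 0) problems.push(`worktree is dirty (${statusLines.length} entries)`);

  return problems;
}
```

An empty result means the worktree matches every item in the expected-state list. Any entry names what must be fixed before continuing.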

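If the fixed source worktree is missing, dirty, on the wrong branch, or has the wrong remote, fix it before continuing.

None of the following is the AgentRun `v0.1` source of truth:

- the historical mainline directory `/root/agentrun`;
- `/root/unidesk` or `/root/hwlab`;
- a D601 workspace;
- a temporary clone or a runner checkout;
- an in-pod copy or a master-server copy.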

## Worktree Rules

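The fixed source worktree is used only for preflight checks, fetch, worktree management and final sync. Routine AgentRun `v0.1` feature, documentation and deployment changes must use a dedicated worktree: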

```text
G14:/root/agentrun-v01/.worktree/{pr_branch}
```

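A `v0.1` worktree must be created from the latest `origin/v0.1`. A task branch covers only the current change, and each commit includes only files related to the current task. Do not use the root of `/root/agentrun-v01` as a scratch area for parallel tasks.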
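Here is a minimal sketch of the git commands for a new task worktree. It assumes standard `git fetch` and `git worktree add -b` usage. The helper `worktreeCommands` and its branch-name check are illustrative, not an existing UniDesk tool.

```typescript
// Hypothetical helper: returns git argv lists that create a task worktree
// under /root/agentrun-v01/.worktree/{pr_branch} from the latest origin/v0.1.
const SOURCE_ROOT = "/root/agentrun-v01";

function worktreeCommands(prBranch: string): string[][] {
  // Reject names that could escape .worktree/ or are not plain branch names.
  if (!/^[A-Za-z0-9._\/-]+$/.test(prBranch) || prBranch.includes("..")) {
    throw new Error(`invalid branch name: ${prBranch}`);
  }
  const path = `${SOURCE_ROOT}/.worktree/${prBranch}`;
  return [
    // Refresh origin/v0.1 first so the task branch starts from the latest tip.
    ["git", "-C", SOURCE_ROOT, "fetch", "origin", "v0.1"],
    // Create the task branch and its dedicated worktree in one step.
    ["git", "-C", SOURCE_ROOT, "worktree", "add", "-b", prBranch, path, "origin/v0.1"],
  ];
}
```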

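## Documentation Landing Rules

Some changes in the AgentRun repository do not go through a PR: long-lived references, `spec-v01-*`, and the `architecture.md` cross-reference stubs. After local review, commit and push them directly to the corresponding target branch, for example `origin/v0.1`.

Changes to the body of the requirements specification land in UniDesk OA under `project-management/PJ2026-01/specs/`. Do not maintain a second copy of that body in the AgentRun repo.

Process plans, stage evidence, acceptance results and blockers go into the comments of the corresponding GitHub issue. A documentation PR cannot substitute for landing a change directly.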

## Deployment Target

AgentRun has retired the old `dev/prod` runtime convention. The fixed `v0.1` deployment target is the native k3s namespace on G14:

```text
G14:k3s namespace agentrun-v01
```

All k3s operations must use UniDesk route syntax:

```bash
trans G14:k3s kubectl get pods -n agentrun-v01
```

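Do not make any of the following a permanent AgentRun deployment path:

- a temporary NodePort or host port;
- a pod IP;
- a provider-gateway business HTTP proxy;
- a one-off port-forward.

Any public entry point, UniDesk/HWLAB integration entry point, or cross-service access path must first be introduced through a reviewed change in the AgentRun repository. UniDesk records the corresponding operations entry point only afterward.

## Controlled CI/CD Entry Points

AgentRun control-plane write operations must go through the UniDesk high-level CLI.

A control-plane command without `--node/--lane` no longer refers to a G14/v0.1 constant hard-coded in the code. It resolves `config/agentrun.yaml.controlPlane.default` instead. An explicit `--node <node> --lane <lane>` can still select another lane.

New or migrated lanes must resolve their target from `config/agentrun.yaml`. They must not read deployment truth from the `deploy.json` of the AgentRun service repo.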

```bash
bun scripts/cli.ts agentrun control-plane status
bun scripts/cli.ts agentrun control-plane trigger-current --dry-run
bun scripts/cli.ts agentrun control-plane trigger-current --confirm
bun scripts/cli.ts agentrun control-plane refresh --dry-run
bun scripts/cli.ts agentrun control-plane refresh --confirm
bun scripts/cli.ts agentrun control-plane cleanup-runners --node D601 --lane v02 --dry-run
bun scripts/cli.ts agentrun control-plane cleanup-runners --node D601 --lane v02 --confirm
bun scripts/cli.ts agentrun control-plane cleanup-runs --min-age-minutes 30 --limit 200 --dry-run
bun scripts/cli.ts agentrun control-plane cleanup-runs --min-age-minutes 30 --limit 200 --confirm
bun scripts/cli.ts agentrun control-plane cleanup-released-pvs --limit 200 --dry-run
bun scripts/cli.ts agentrun control-plane cleanup-released-pvs --limit 200 --confirm
```
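The target-resolution rule can be sketched as follows: no flags selects `controlPlane.default`, and explicit flags select that lane from YAML. The parsed config shape, with a `lanes` list carrying `node`/`lane`/`namespace`, is an assumption for illustration. The real `config/agentrun.yaml` schema may differ. Requiring `--node` and `--lane` together is also an assumption of this sketch.

```typescript
// Assumed, simplified shape of config/agentrun.yaml after parsing.
interface LaneRef { node: string; lane: string }
interface LaneEntry extends LaneRef { namespace: string }
interface AgentRunConfig {
  controlPlane: { default: LaneRef };
  lanes: LaneEntry[];
}

function resolveTarget(cfg: AgentRunConfig, flags: Partial<LaneRef>): LaneEntry {
  // Assumption: --node and --lane are only meaningful as a pair.
  if ((flags.node === undefined) !== (flags.lane === undefined)) {
    throw new Error("--node and --lane must be specified together");
  }
  // No flags: use the YAML default, never a hard-coded G14/v0.1 constant.
  const wanted: LaneRef =
    flags.node !== undefined && flags.lane !== undefined
      ? { node: flags.node, lane: flags.lane }
      : cfg.controlPlane.default;
  // Targets come only from YAML; deploy.json in the service repo is never consulted.
  const hit = cfg.lanes.find((l) => l.node === wanted.node && l.lane === wanted.lane);
  if (!hit) throw new Error(`lane ${wanted.node}/${wanted.lane} is not declared in config/agentrun.yaml`);
  return hit;
}
```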

The standard entry points for a YAML-only lane are:

```bash
bun scripts/cli.ts agentrun control-plane plan --node D601 --lane v02
bun scripts/cli.ts agentrun control-plane apply --node D601 --lane v02 --dry-run
bun scripts/cli.ts agentrun control-plane apply --node D601 --lane v02 --confirm
bun scripts/cli.ts agentrun control-plane secret-sync --node D601 --lane v02 --dry-run
bun scripts/cli.ts agentrun control-plane secret-sync --node D601 --lane v02 --confirm
bun scripts/cli.ts agentrun control-plane restart --node D601 --lane v02 --dry-run
bun scripts/cli.ts agentrun control-plane restart --node D601 --lane v02 --confirm
bun scripts/cli.ts agentrun control-plane trigger-current --node D601 --lane v02 --dry-run
bun scripts/cli.ts agentrun control-plane trigger-current --node D601 --lane v02 --confirm
bun scripts/cli.ts agentrun control-plane status --node D601 --lane v02 --full
```

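`status` is read-only. For the lane selected in YAML, it observes:

- the current commit of the source workspace and the corresponding PipelineRun;
- the GitOps latest and the Argo Application;
- the runtime workload and the manager source commit;
- the git mirror summary.

It also reports whether the Argo revision is aligned with that lane's GitOps latest.

The default output is a compact commander view:

- `target` keeps only the node/lane/source/runtime/CI/GitOps/git-mirror/database summary;
- the key conclusions live in `summary` and `alignment`;
- the stdout/stderr tail of successful probes, the full YAML target, and the raw `source`, `runtime` and `gitMirror` payloads are omitted.

To expand everything, use the returned `disclosure.fullCommand` or pass `--full` explicitly. For the raw debugging view, add `--raw`.

`status` also supports pinpoint queries with `--pipeline-run <name>` and `--source-commit <sha>`. `--pipeline-run` reads the PipelineRun `revision` parameter as the pinned source commit. It then discloses the following in `alignment.branchDrift` / `summary.branchDrift`:

- the current branch tip;
- the target source commit;
- the PipelineRun source commit;
- whether the current branch has superseded it;
- the `triggerLatest` next step.

`status` emits `agentrun.control-plane.status.progress` stage events to stderr, covering `source`, `runtime` and `git-mirror`. This keeps progress visible during a long aggregation.

`trigger-current` first fast-forwards the source worktree declared in YAML to the lane source branch. It then creates a commit-pinned PipelineRun at the current commit. If a same-name PipelineRun is running or has already succeeded, a duplicate trigger must be rejected. Creation is allowed only when that PipelineRun has failed or does not exist.

The command only submits CI/CD work; it does not wait for the full PipelineRun or rollout to finish. Poll afterward with `job status` and `status --pipeline-run <name>`.

`refresh` only performs a hard refresh on the Argo Application declared in YAML. It is the controlled sync entry point for the case where GitOps promotion has finished but Argo is still on an old revision. It never patches the runtime workload directly.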
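The duplicate-trigger rule for `trigger-current` reduces to a small decision table. This sketch assumes the same-name PipelineRun's state has already been read from Tekton. The state names and the `guardTrigger` helper are simplified and hypothetical.

```typescript
// Hypothetical, simplified states of the same-name PipelineRun.
type PipelineRunState = "absent" | "running" | "succeeded" | "failed";

// true = trigger-current may create the commit-pinned PipelineRun.
const MAY_CREATE: Record<PipelineRunState, boolean> = {
  absent: true,      // nothing exists for this commit yet
  failed: true,      // a failed run may be re-created
  running: false,    // never start a duplicate while one is running
  succeeded: false,  // this commit is already built; do not rebuild
};

function guardTrigger(name: string, state: PipelineRunState): void {
  if (!MAY_CREATE[state]) {
    throw new Error(`refusing to trigger ${name}: same-name PipelineRun is ${state}`);
  }
}
```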

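`trigger-current --confirm` on a YAML-only lane is a controlled long-running entry point. Source bootstrap, image build, GitOps publish, git-mirror sync and PipelineRun creation must each be split into short submissions plus status polling. Do not put clone, build, push or long polling into a single top-level `trans` long-lived connection.

When `trigger-current` returns an asynchronous job, poll in two steps:

1. Watch the `agentrun-yaml-lane-trigger` progress with `bun scripts/cli.ts job status <jobId> --tail-bytes 12000`.
2. Watch Tekton, GitOps and Argo alignment with `agentrun control-plane status --node <node> --lane <lane> --pipeline-run <name>`.

The `status` and `ok` of a background step must be judged together. `status=succeeded` with `ok=false` is a terminal failure; do not keep polling it until timeout.

GitOps publish must use an isolated temporary clone/worktree. It must not switch or pollute the fixed source workspace declared in YAML. An earlier failed publish may have left the fixed workspace dirty, detached, or parked on the GitOps branch. In that case, clean up only the known generated artifacts and failed-publish leftovers, restore the workspace to the lane source branch, and then retry.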
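The joint `status`/`ok` rule can be expressed as a three-way verdict. The `StepReport` shape below is a hypothetical simplification of what `job status` returns. Treating a missing `ok` on a succeeded step as a failure is a conservative assumption of this sketch.

```typescript
// Hypothetical shape of one background step reported by `job status`.
interface StepReport {
  status: "queued" | "running" | "succeeded" | "failed";
  ok?: boolean;
}

type Verdict = "keep-polling" | "success" | "terminal-failure";

function judgeStep(step: StepReport): Verdict {
  if (step.status === "queued" || step.status === "running") return "keep-polling";
  // status and ok are judged together: succeeded + ok=false (or ok missing,
  // by assumption) is a terminal failure, so polling must stop here.
  if (step.status === "succeeded" && step.ok === true) return "success";
  return "terminal-failure";
}
```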

Publish closeout for an AgentRun YAML-only lane must be based on the current source branch truth.

A parallel PR may advance the lane source branch during `trigger-current`. In that case, `status --pipeline-run <name>` flags that the PipelineRun is no longer the current branch tip, via `branchDrift.sourceBranchAdvanced=true` / `targetSupersededByCurrentBranch=true`.

Closeout must then:

1. confirm that the latest tip contains this fix;
2. re-run `trigger-current` against the latest tip;
3. use `status` on the latest PipelineRun to prove `aligned=true`, `blockers=[]`, `argoSyncedToGitops=true` and `managerSourceMatchesExpected=true`.

Do not use an intermediate PipelineRun that newer source has already superseded as the final closeout.
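The closeout conditions can be collected into one gate. This sketch flattens the relevant `status --pipeline-run` fields into a single hypothetical `CloseoutStatus` object. The real output nests them under `summary` / `alignment`, and the exact paths are not reproduced here.

```typescript
// Assumed, flattened subset of `status --pipeline-run <name>` output.
interface CloseoutStatus {
  aligned: boolean;
  blockers: string[];
  argoSyncedToGitops: boolean;
  managerSourceMatchesExpected: boolean;
  branchDrift: { sourceBranchAdvanced: boolean; targetSupersededByCurrentBranch: boolean };
}

function closeoutProblems(s: CloseoutStatus): string[] {
  const problems: string[] = [];
  // A superseded PipelineRun is never a valid closeout: re-trigger on the latest tip.
  if (s.branchDrift.sourceBranchAdvanced || s.branchDrift.targetSupersededByCurrentBranch) {
    problems.push("PipelineRun superseded by the current branch tip; re-run trigger-current");
  }
  if (!s.aligned) problems.push("aligned=false");
  if (s.blockers.length > 0) problems.push(`blockers: ${s.blockers.join(", ")}`);
  if (!s.argoSyncedToGitops) problems.push("argoSyncedToGitops=false");
  if (!s.managerSourceMatchesExpected) problems.push("managerSourceMatchesExpected=false");
  return problems;
}
```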

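The source bootstrap of `trigger-current` may briefly check out an exact commit for a commit-pinned build. As its last step, a confirmed `trigger-current` tries to restore the fixed source workspace declared in YAML to the lane source branch. It discloses the outcome as `sourceWorkspaceRestore` in the returned JSON.

Repair the fixed workspace first if any of these occurs:

- the command returns a `source-worktree-restore-failed` warning;
- the workspace is dirty;
- a later `status` still shows `summary.source.workspaceDetached=true`.

Repair according to `sourceWorkspaceRestore.failureKind`. Only then continue with creating worktrees, triggering a publish, or writing a closeout. Do not treat a detached fixed workspace as the source of truth for the next round of development or deployment.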

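On a YAML-only lane, `trigger-current` first ensures the target source workspace/branch exists. It then renders the artifact catalog and the GitOps desired state from UniDesk YAML, which declares:

- the image build;
- the GitOps branch/path;
- the runtime namespace and Secrets;
- the database;
- the manager env.

This path deletes `deploy/deploy.json` from the new lane source branch, because deployment truth has moved into UniDesk YAML. The historical file in the old `v0.1` branch is only a pre-migration leftover. It cannot serve as the source of truth for a new lane.

When the Secret export format or the external database connection parameters change, run these steps in order:

1. Materialize the local Secret source with `platform-db postgres export-secrets --confirm`.
2. Push it with `agentrun control-plane secret-sync --node <node> --lane <lane> --confirm`.
3. Run `agentrun control-plane restart --node <node> --lane <lane> --confirm` so the manager Deployment reads the new Secret through a rollout.

Do not delete Pods by hand or patch Secrets directly.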

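The `auth.json` and `config.toml` of the provider credential Secret must also be pushed according to the lane's YAML `sourceRef`. Do not treat the commander host's global Codex configuration as the runtime truth for every lane.

When HWLAB uses a Codex profile through D601 `agentrun-v02`, `config.toml` should carry only the Codex CLI runtime options that lane needs. Examples are model, reasoning, context window, auto compact, storage and network-related keys.

Do not override endpoint bindings in the lane config, such as the provider endpoint, `base_url` or `model_provider`. The only exception is when the matching `auth.json` / API key source is also explicitly owned by the same lane and has been verified.

Two regressions are common:

- The config synced to the runner lacks `model_context_window` / `model_auto_compact_token_limit`. The runner then hits a context-window failure after several rounds of tool/webSearch.
- A mismatched provider endpoint is added by mistake while filling in parameters, which causes a provider auth failure.

The fix must go through the dry-run/confirm of `agentrun control-plane secret-sync --node <node> --lane <lane>` and take effect with `restart`. Verify it through HWLAB's original `hwlab-cli client agent send|trace|result` entry point. Do not reverse-engineer the configuration from a Kubernetes Secret, and do not print the payload in an issue or trace.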
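Both regressions can be caught by a key-name-only lint that never touches values. This is a minimal sketch that assumes `config.toml` has already been flattened into dotted key paths, for example `model_providers.<id>.base_url`. What counts as an "endpoint binding" here is an approximation of the rule above, not an exhaustive list.

```typescript
// Hypothetical lint over the dotted key paths of a lane's config.toml.
// Only key names are inspected; values are never read or printed.
const REQUIRED_KEYS = ["model_context_window", "model_auto_compact_token_limit"];

function isEndpointBinding(key: string): boolean {
  // Approximation: provider selection, provider tables, and any base_url.
  return key === "model_provider" || key.startsWith("model_providers.") || key.split(".").pop() === "base_url";
}

function lintLaneConfig(keys: string[], laneOwnsAuthSource: boolean): string[] {
  const problems: string[] = [];
  for (const req of REQUIRED_KEYS) {
    // Missing context-window settings surface as context-window failures after many tool turns.
    if (!keys.includes(req)) problems.push(`missing ${req}`);
  }
  if (!laneOwnsAuthSource) {
    // Endpoint overrides without a lane-owned, verified auth.json cause provider auth failures.
    for (const key of keys.filter(isEndpointBinding)) problems.push(`endpoint binding not allowed: ${key}`);
  }
  return problems;
}
```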

`config/agentrun.yaml` also declares the AgentRun resource/session client policy.

When no node/lane is explicitly selected, `client.sessionPolicy` supplies the defaults for `agentrun send session/...` and related session payload generation: `tenantId`, `projectId`, `providerId`, `backendProfile`, `workspaceRef`, and the execution policy.

With an explicit `--node <node> --lane <lane>`, the following must all switch to the target lane's YAML facts:

- `explain session-policy`;
- `send session`;
- the resource primitives;
- AipodSpec render.

A lane's `secrets[].providerCredential.profile` declares provider credential Secret ownership. The UniDesk CLI only aggregates Secret names/keys from YAML and no longer assembles provider Secret names in code.

The read-only entry point `bun scripts/cli.ts agentrun explain session-policy` shows:

- the selected target lane;
- the policy source;
- the actual executionPolicy payload;
- the source of the provider credential binding.

Its output may contain only Secret metadata, key names and `valuesPrinted=false`, never Secret values.
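The policy-source rule can be sketched as a pure lookup. The field names follow the text above. The nesting, such as a `lanes` list with an embedded `sessionPolicy`, and the `explainSessionPolicy` function itself are hypothetical.

```typescript
// Assumed, simplified YAML facts; field names mirror the text, nesting is hypothetical.
interface SessionPolicy {
  tenantId: string;
  projectId: string;
  providerId: string;
  backendProfile: string;
  workspaceRef: string;
}
interface LaneFacts { node: string; lane: string; sessionPolicy: SessionPolicy }
interface ClientConfig { client: { sessionPolicy: SessionPolicy }; lanes: LaneFacts[] }

function explainSessionPolicy(cfg: ClientConfig, node?: string, lane?: string) {
  if (node === undefined || lane === undefined) {
    // No explicit lane: the client-level default policy applies.
    return { source: "client.sessionPolicy", policy: cfg.client.sessionPolicy, valuesPrinted: false as const };
  }
  const target = cfg.lanes.find((l) => l.node === node && l.lane === lane);
  // An explicit but unknown lane is an error, never a silent fallback to the default.
  if (!target) throw new Error(`lane ${node}/${lane} is not declared`);
  return { source: `lane ${node}/${lane}`, policy: target.sessionPolicy, valuesPrinted: false as const };
}
```

Throwing on an unknown explicit lane, instead of falling back to `client.sessionPolicy`, matches the rule that a render falling back to the global default lane is a policy defect.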

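Session follow-ups on a non-default lane must prove that `send session` uses the run policy of the selected node/lane.

Before using the short command form, confirm that `backendProfile`, `providerId`, `workspaceRef`, the execution policy and the provider credential SecretRef all come from the target lane. Use `agentrun explain session-policy --node <node> --lane <lane> [--backend-profile <profile>]` or an equivalent dry-run/describe path. The `--prompt-stdin` short form and the `--json-stdin -o json` explicit JSON form should disclose the same `sessionPolicy` summary.

Any of the following is a lane policy defect, to be fixed in YAML target resolution or CLI rendering:

- a render that falls back to the global default lane;
- a render that shows the wrong default lane;
- the short form and the JSON body using different policies.

Do not mask a wrongly selected policy by manually creating a default-lane Secret, copying credentials, rewriting the JSON body, or modifying the runtime namespace.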

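`cleanup-runners` is the AgentRun runtime runner retention entry point. It cleans up only the runner Jobs/Pods that match `deployment.runner.retention.selectors` in the runtime namespace of the YAML-selected lane.

`config/agentrun.yaml` is the sole source for the following values, and the command line must not override them:

- the runner cap;
- the last-activity ordering strategy;
- the active heartbeat window;
- the Job name prefix;
- whether age-based cleanup is enabled.

The dry-run must disclose:

- the runner Job count before cleanup;
- the number of non-terminal runner Pods;
- the inactive candidates, sorted by last-activity time;
- the selected runner Jobs;
- whether manager facts are available;
- the active-run risk.

By default, confirm deletes only the selected runner Jobs and then recounts runner Jobs/Pods.

When manager facts are unavailable, only safe candidates may be cleaned up: those that are terminal or have no active Pods. The risk fields must be kept, and Kubernetes creation time must not be passed off as a complete last-activity fact.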
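The selection logic can be sketched as follows. This sketch assumes the runner cap (`maxRunners`) has already been read from YAML. It also assumes `activeRun` already reflects the heartbeat window. The `RunnerJob` shape is hypothetical. Without manager facts, candidates are not re-sorted, because creation time is not a last-activity fact.

```typescript
// Hypothetical runner facts; lastActiveAt is only known when manager facts are available.
interface RunnerJob {
  name: string;
  lastActiveAt?: number;  // epoch ms, from manager facts
  podTerminal: boolean;   // true if no non-terminal Pod remains
  activeRun: boolean;     // manager reports an active run within the heartbeat window
}

function selectRunnerJobs(jobs: RunnerJob[], maxRunners: number, managerFactsAvailable: boolean): RunnerJob[] {
  const excess = jobs.length - maxRunners;
  if (excess <= 0) return [];
  if (!managerFactsAvailable) {
    // Without manager facts only terminal / Pod-less runners are safe candidates.
    return jobs.filter((j) => j.podTerminal).slice(0, excess);
  }
  // Oldest last activity first; active runners are protected.
  return jobs
    .filter((j) => !j.activeRun && j.lastActiveAt !== undefined)
    .sort((a, b) => (a.lastActiveAt as number) - (b.lastActiveAt as number))
    .slice(0, excess);
}
```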

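`cleanup-runners --force-active` is only for resource recovery, where the operator has explicitly decided to "force-kill runner pods". One example: runner Jobs fill a single node's pod quota and block git-mirror, CI/CD, or other control-plane scheduling.

Before using it, still run `--dry-run` with the same arguments. Confirm that `criteria.forceActive=true`, the matched namespace/selector, and `selectedRunnerJobs` all match expectations.

`--confirm` deletes every matching runner Job, including protected active runners. This interrupts the corresponding run, command or session.

Never use this switch as everyday retention, silent self-healing, or the default over-limit policy. When a force kill is needed, it must go through this controlled entry point. Falling back to a bare `kubectl delete pod/job` is forbidden.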

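`cleanup-runs` is the retention entry point for completed AgentRun `v0.1` CI workspaces. It cleans up only the `agentrun-v01-ci-*` PipelineRuns in the `agentrun-ci` namespace that are older than `--min-age-minutes`. Removing them releases the temporary workspace PVCs through Tekton ownerRefs.

The dry-run must disclose:

- the candidate PipelineRuns and their owned PVCs;
- active-mount protection;
- the estimated local-path bytes actually in use;
- the confirm command.

By default the most recently completed PipelineRun is protected, so the current CI/CD status evidence is retained.

`cleanup-released-pvs` is the secondary reclamation entry point. It handles only `Released` PVs in `agentrun-ci` that use `local-path` storage and the `Delete` reclaim policy. It does not touch the AgentRun runtime namespace, business PVCs, Secrets, registry storage, or GitOps desired state.

See `docs/reference/gc.md` for disk governance and the G14 safe-stop rules.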
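Candidate selection for `cleanup-runs` can be sketched as a filter. Two assumptions apply: age is measured from completion time, and the input is already the namespace's PipelineRun list. Active-mount protection needs PVC and Pod facts and is left out of this sketch.

```typescript
// Hypothetical view of a PipelineRun; completionTime is unset while it is still running.
interface PipelineRunInfo { name: string; namespace: string; completionTime?: number } // epoch ms

function cleanupRunCandidates(runs: PipelineRunInfo[], nowMs: number, minAgeMinutes: number): string[] {
  const completed = runs.filter(
    (r) => r.namespace === "agentrun-ci" && r.name.startsWith("agentrun-v01-ci-") && r.completionTime !== undefined,
  );
  // Protect the most recently completed run as current CI/CD status evidence.
  const newest = completed.reduce<PipelineRunInfo | undefined>(
    (best, r) => (best === undefined || (r.completionTime as number) > (best.completionTime as number) ? r : best),
    undefined,
  );
  const cutoff = nowMs - minAgeMinutes * 60_000;
  return completed
    .filter((r) => r !== newest && (r.completionTime as number) <= cutoff)
    .map((r) => r.name);
}
```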

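The target lane's `deployment.runner.*` configuration in UniDesk `config/agentrun.yaml` is the sole owner of these operational parameters:

- runner persistence;
- the idle exit window;
- session-PVC settings.

Do not add operations YAML to the HWLAB repository. Do not let the AgentRun service repo's `deploy.json` or the Kubernetes runtime state become the configuration truth in reverse.

The manager Deployment renders the YAML into manager env such as `AGENTRUN_RUNNER_IDLE_TIMEOUT_MS`. That only proves the control-plane configuration has reached the manager. Before closing an HWLAB/AgentRun long-session or runner-persistence issue, also create a new turn through the original entry point. Then check the newly created runner Job's env, session PVC and `AGENTRUN_SOURCE_COMMIT`.

Every path in the AgentRun manager that creates a runner Job must reuse the same runner defaults helper. This includes `/api/v1/runs/:runId/runner-jobs`, session send and queue dispatch. When adding a `deployment.runner.*` field, never hand-write a separate set of defaults in any single route.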
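The single-helper rule can be illustrated with one defaults function that every creation route calls. The helper names, the route labels and the `RunnerDefaults` shape are hypothetical. Only the env name `AGENTRUN_RUNNER_IDLE_TIMEOUT_MS` comes from the text. This is not the AgentRun manager's actual code.

```typescript
// Hypothetical runner defaults, derived from manager env that the manager
// Deployment renders from the lane's deployment.runner.* YAML.
interface RunnerDefaults { idleTimeoutMs: number }

function runnerDefaultsFromEnv(env: Record<string, string | undefined>): RunnerDefaults {
  const raw = env["AGENTRUN_RUNNER_IDLE_TIMEOUT_MS"];
  const idleTimeoutMs = Number(raw);
  if (raw === undefined || !Number.isFinite(idleTimeoutMs) || idleTimeoutMs <= 0) {
    throw new Error("AGENTRUN_RUNNER_IDLE_TIMEOUT_MS missing or invalid");
  }
  return { idleTimeoutMs };
}

type CreationRoute = "runner-jobs-api" | "session-send" | "queue-dispatch";
interface RunnerJobSpec { route: CreationRoute; runId: string; defaults: RunnerDefaults }

// Every creation path goes through this one function, so a new deployment.runner.*
// field is added in runnerDefaultsFromEnv only, never per route.
function createRunnerJobSpec(route: CreationRoute, runId: string, env: Record<string, string | undefined>): RunnerJobSpec {
  return { route, runId, defaults: runnerDefaultsFromEnv(env) };
}
```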

Closeout for AgentRun runner egress, `transientEnv`, or Secret non-leakage needs real runtime evidence.

First, trigger a runner Job on the target lane through a real `create/apply/send` resource primitive. Then inspect the runner job response, events/trace, and Kubernetes Pod spec. Use `describe runnerjob/...`, `events run/...`, `logs session/...`, or a compatibility bridge if necessary.

The deployment truth for the runner egress proxy is the corresponding lane's `deployment.runner.egressProxyUrl` and `deployment.runner.noProxyExtra` in `config/agentrun.yaml`. The manager Deployment must expose them as `AGENTRUN_RUNNER_EGRESS_PROXY_URL` and `AGENTRUN_RUNNER_NO_PROXY_EXTRA`.

Acceptance must also confirm that the newly created runner Job Pod inherits the corresponding `HTTP_PROXY`, `HTTPS_PROXY`, `ALL_PROXY` and `NO_PROXY`. Looking only at the manager env or the plan output is not enough.

Passing evidence should show:

- whether the proxy env is present;
- whether `NO_PROXY` contains `hyueapi.com`/`.hyueapi.com`;
- whether `transientEnv` such as a short-lived `HWLAB_API_KEY` is injected through `valueFrom.secretKeyRef` of a per-job Secret;
- that responses/events output only env names, Secret metadata and `valuesPrinted=false`.

Secret values must never appear in issues, traces or Pod spec summaries.

The authoritative requirements for the HWLAB-facing SecretRef and RuntimeAssembly are [Runtime Assembly](../../project-management/PJ2026-01/specs/PJ2026-010202-runtime-assembly.md) and [YAML-first Operations](../../project-management/PJ2026-01/specs/PJ2026-010603-yaml-first-ops.md). The AgentRun repo stubs only cross-reference these OA specs.
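A redacted evidence report can be built directly from the runner Pod's container env. The `EnvVar` interface is a minimal subset of the Kubernetes core/v1 `EnvVar` / `EnvVarSource.secretKeyRef` shape. `egressReport` is hypothetical. Requiring both `hyueapi.com` and `.hyueapi.com` in `NO_PROXY` is this sketch's reading of the rule.

```typescript
// Minimal subset of the Kubernetes core/v1 EnvVar shape.
interface EnvVar {
  name: string;
  value?: string;
  valueFrom?: { secretKeyRef?: { name: string; key: string } };
}

const PROXY_VARS = ["HTTP_PROXY", "HTTPS_PROXY", "ALL_PROXY", "NO_PROXY"];

// Builds a redacted report: env names and Secret metadata only, never values.
function egressReport(env: EnvVar[], transientNames: string[]) {
  const byName = new Map(env.map((e) => [e.name, e] as const));
  const noProxy = (byName.get("NO_PROXY")?.value ?? "").split(",").map((s) => s.trim());
  return {
    proxyEnvPresent: Object.fromEntries(PROXY_VARS.map((n) => [n, byName.has(n)])),
    noProxyCoversHyueapi: noProxy.includes("hyueapi.com") && noProxy.includes(".hyueapi.com"),
    transientEnv: transientNames.map((n) => {
      const entry = byName.get(n);
      const ref = entry?.valueFrom?.secretKeyRef;
      // A literal `value` for a transient secret would be a leak, so it does not count.
      return { name: n, viaSecretKeyRef: ref !== undefined && entry?.value === undefined, secret: ref ? { name: ref.name, key: ref.key } : null };
    }),
    valuesPrinted: false as const,
  };
}
```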

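When verifying `codeload.github.com` through `g14-provider-egress-proxy.unidesk.svc.cluster.local:18789`, also confirm that the G14 runtime egress Service has a ready endpoint.

The Service/DNS may exist while the proxy itself is broken:

- the Deployment is `0/1`;
- the Endpoint has only notReady addresses;
- the Pod is in `ImagePullBackOff` or `ContainerStatusUnknown`.

In these cases the problem belongs to the UniDesk/G14 runtime egress infrastructure. A `connect refused` after the runner has received the proxy env must not be counted as a failed AgentRun business fix. An issue that requires "successfully accessing codeload through the controlled proxy" must not be closed either.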

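## UniDesk Boundary

UniDesk is the integrated distributed development and operations center for AgentRun. UniDesk may record:

- AgentRun's fixed repository, source worktree, and worktree rules;
- G14 preflight, route syntax, and remote operation entry points;
- the fixed `v0.1` namespace and the lane rules for later versions;
- deployment observation, controlled rollout, and operations entry points;
- how UniDesk and HWLAB integrate, once UniDesk OA has defined the public contract.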

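UniDesk references must not serve as the source of truth for these AgentRun repo-internal implementation details:

- source directories, module organization, and local debugging commands;
- parameter details of repo-owned helpers, runner startup scripts, and runbooks;
- one-off troubleshooting steps that serve only AgentRun internal development;
- database migration file contents, source implementation, and internal test fixtures.

UniDesk OA maintains the requirements specifications aimed at HWLAB/UniDesk. They cover AgentRun's product boundary, REST resource/API semantics, run/command/event/session/queue, backend adapter/profile, runtime assembly, the release pipeline, source sync, and HWLAB integration. The AgentRun repo's references may explain how to run, debug and implement these capabilities. They must not write the body of the requirements specification back into the repo.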

## AgentRun Queue vs. Legacy Code Queue Boundary

The commander task surface of AgentRun `v0.1` has passed real runtime acceptance under AgentRun issue #105. It is AgentRun's standard path for:

- new task dispatch;
- commander queue observation;
- events/logs/result;
- steer/send, ack and cancel.

The long-term capability specs are UniDesk OA's [Queue Session](../../project-management/PJ2026-01/specs/PJ2026-010203-queue-session.md) and [AgentRun Core](../../project-management/PJ2026-01/specs/PJ2026-010201-agentrun-core.md). UniDesk only records that this path has been validated on the G14 `agentrun-v01` runtime with the `hy` profile + `gpt-5.5`.

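The UniDesk commander always creates new tasks through the `bun scripts/cli.ts agentrun get|describe|events|logs|result|ack|cancel|dispatch|create|apply|steer|send` resource primitives.

This entry point is a render-only client. The UniDesk client owns:

- k8s-style command parsing;
- human-readable tables and lifecycle summaries;
- next-step commands and pagination;
- the stable client schema for `-o json|yaml`;
- error display.

The AgentRun server provides only a stable RESTful API, authentication and business facts. It does not host UniDesk CLI rendering.

For routine dispatch, prefer `agentrun create task --aipod Artificer --prompt-stdin`, or `agentrun apply -f -` with a quoted YAML/JSON heredoc/stdin. Dispatch a task that was created but is not yet running with `agentrun dispatch task/<taskId>`.

`--json-file`, `--prompt-file` and `--runner-json-file` are only client input sources, meant for reviewed, reusable controlled files.

UniDesk does not implement the AgentRun queue protocol and does not double-write tasks back to the legacy Code Queue.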

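Before dispatching with a lane-scoped AipodSpec, confirm that these all exist in the selected lane's YAML facts:

- `backendProfile`;
- the provider credential SecretRef;
- the tool credential SecretRef;
- the bundle/workspaceRef.

Confirm through `get/describe aipodspec`, the render output, or the first runner job summary.

On a non-default lane such as D601/v02, the default Artificer assembly should bind provider and tool credentials that really exist in the lane YAML:

- the GitHub PR token uses `tool=github`, `purpose=github-pr`, and Secret key/projection env `GH_TOKEN`;
- the UniDesk passthrough token uses `tool=unidesk-ssh`, `purpose=ssh-passthrough`, and Secret key/projection env `UNIDESK_SSH_CLIENT_TOKEN`.

`tool=github-ssh`, `sub2api` and other legacy tool credentials may be rendered only when the YAML explicitly declares the complete SecretRef, keys and projection.

Suppose a runner Pod shows `FailedMount` and the missing object is a rendered SecretRef. Classify that as an AipodSpec/YAML binding problem and fix the controlled configuration. Do not manually create a legacy Secret in the runtime namespace or copy another lane's Secret over.

The default output of AipodSpec render should also be a bounded summary/table/drill-down. The full render JSON is expanded only with an explicit `--full`, `--raw` or `-o json`, or on machine-consumption paths. Remaining dump issues are still tracked in [#862](https://github.com/pikasTech/unidesk/issues/862).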
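The default bindings and the legacy gate can be written as data plus one predicate. The bindings are copied from the text above. The `DeclaredCredential` shape and `renderableLegacy` are hypothetical, not the real AipodSpec schema.

```typescript
// Default Artificer tool credential bindings, as listed above.
interface ToolCredential { tool: string; purpose: string; secretKey: string; projectionEnv: string }

const DEFAULT_TOOL_CREDENTIALS: ToolCredential[] = [
  { tool: "github", purpose: "github-pr", secretKey: "GH_TOKEN", projectionEnv: "GH_TOKEN" },
  { tool: "unidesk-ssh", purpose: "ssh-passthrough", secretKey: "UNIDESK_SSH_CLIENT_TOKEN", projectionEnv: "UNIDESK_SSH_CLIENT_TOKEN" },
];

// Hypothetical YAML declaration for any other (legacy) tool credential.
interface DeclaredCredential { tool: string; secretRef?: string; keys?: string[]; projection?: string[] }

function renderableLegacy(decl: DeclaredCredential): boolean {
  // Legacy tools (github-ssh, sub2api, ...) render only with a complete declaration.
  return Boolean(decl.secretRef) && (decl.keys?.length ?? 0) > 0 && (decl.projection?.length ?? 0) > 0;
}
```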

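The resource primitives and the legacy compatibility groups talk directly to the AgentRun REST API by default. The transport is configured by UniDesk's own YAML, `config/agentrun.yaml`:

- Without `--node`/`--lane`, requests go to the YAML's default manager `baseUrl`.
- With an explicit `--node <node> --lane <lane>`, the same YAML selects the runtime lane. The request enters the manager `internalBaseUrl` through `lane-k8s-service-proxy`. It is authenticated with the API key metadata declared in the manager pod env.

The output discloses only node/lane/namespace/baseUrl/auth env metadata and `valuesPrinted=false`, never the key value.

The lane mode is for resource primitive operations and evidence collection on non-default lanes such as D601 `agentrun-v02`, especially `get/describe/events/logs/result`. It does not replace `agentrun control-plane ...` for release or operations control.

Authentication may reuse the env-var/fixed-file discovery style of `HWLAB_API_KEY`. It must not depend on any of the following, because every extra forwarding hop adds a failure surface:

- the HWLAB runtime;
- HWLAB backend-core;
- the HWLAB frontend proxy;
- the SSH official CLI.

`--raw` discloses only the direct AgentRun REST envelope plus the necessary `transport=direct-http`, `clientRole=render-only`, `configPath`, `baseUrl`, and auth source/redacted metadata. It does not print token values.

`agentrun control-plane ...` and `git-mirror ...` remain G14 source/runtime operations control paths. They may keep using the UniDesk SSH capture bridge, but they must not become the default transport for queue/session resource primitives.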
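The two transport modes can be sketched as one selection function. The YAML subset (`manager.baseUrl`, per-lane `internalBaseUrl` and `apiKeyEnv`) and `chooseTransport` are assumptions for illustration. Only the mode names and the `valuesPrinted=false` rule come from the text.

```typescript
// Assumed YAML subset for transport selection.
interface LaneEndpoint { node: string; lane: string; namespace: string; internalBaseUrl: string; apiKeyEnv: string }
interface TransportConfig { manager: { baseUrl: string }; lanes: LaneEndpoint[] }

function chooseTransport(cfg: TransportConfig, node?: string, lane?: string) {
  if (node === undefined || lane === undefined) {
    // Default: direct HTTP to the YAML-declared manager baseUrl.
    return { transport: "direct-http", baseUrl: cfg.manager.baseUrl, valuesPrinted: false as const };
  }
  const hit = cfg.lanes.find((l) => l.node === node && l.lane === lane);
  if (!hit) throw new Error(`lane ${node}/${lane} is not declared`);
  // Explicit lane: reach the manager internalBaseUrl via lane-k8s-service-proxy.
  return {
    transport: "lane-k8s-service-proxy",
    node: hit.node,
    lane: hit.lane,
    namespace: hit.namespace,
    baseUrl: hit.internalBaseUrl,
    authEnv: hit.apiKeyEnv, // env var name only; the key value is never printed
    valuesPrinted: false as const,
  };
}
```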

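UniDesk `config/agentrun.yaml` declares AgentRun's public HTTPS entry point, the FRP/Caddy edge, the direct REST base URL, and the authentication sources.

A YAML-only lane must not write these deployment choices back into `deploy/deploy.json` on the AgentRun source branch. The AgentRun source repo keeps only application code, build inputs and repo-internal implementation docs.

`bun scripts/cli.ts agentrun control-plane expose --confirm` only adds the edge-side allow port and Caddy site according to UniDesk YAML. It does not create an Ingress, NodePort, LoadBalancer, hostPort or HWLAB forwarding layer in AgentRun's k3s.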

An AgentRun Queue task may need to call a UniDesk maintenance bridge such as `trans` / `unidesk-ssh`. The long-term contract for this is UniDesk OA's [Runtime Assembly](../../project-management/PJ2026-01/specs/PJ2026-010202-runtime-assembly.md) and [YAML-first Operations](../../project-management/PJ2026-01/specs/PJ2026-010603-yaml-first-ops.md):

- The caller requests the `UNIDESK_SSH_CLIENT_TOKEN` SecretRef through `executionPolicy.secretScope.toolCredentials[].tool=unidesk-ssh`.
- Non-sensitive endpoints are either provided explicitly through the runner-job `transientEnv` or filled in by manager-controlled defaults.

When the UniDesk bridge submits a Queue payload, it must not carry the token in the prompt, the payload or `transientEnv`. It must not use the HWLAB runtime Web entry point to impersonate the UniDesk frontend.

Attribute failures as follows:

- If the dispatcher correctly requested `unidesk-ssh`, but `runner-job-created.transientEnv.names` in the trace lacks `UNIDESK_MAIN_SERVER_IP`, `UNIDESK_MAIN_SERVER_HOST` or `UNIDESK_FRONTEND_URL`, it is an AgentRun assembly problem.
- If the endpoint env is present but the route is denied or times out, investigate the UniDesk frontend/token scope or the provider session next.

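The legacy UniDesk Code Queue is kept only for historical archives, read-only troubleshooting, and stopping leftover legacy tasks.

These entry points must all return a frozen status or be disabled:

- `codex submit/enqueue`, `codex steer` and `codex resume`;
- `codex queue create/merge` and `codex move`;
- the legacy Web submission form;
- legacy queue management and legacy workdir management.

`codex task/tasks/output/read/unread/queues` may continue to read history. `codex interrupt|cancel` is only for stopping leftover legacy tasks.

Legacy Code Queue history is not migrated to AgentRun. No adapter, legacy mode, fallback or double-write path is provided.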

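## AgentRun / HWLAB Collaboration Responsibility Boundary

When HWLAB integrates with AgentRun, first decide who owns a problem, based on the public contract and runtime evidence. Only then modify the corresponding repository.

Whoever owns the missing capability, the wrong semantics or the unfixed behavior makes the fix. Do not do any of the following just to keep the current integration moving:

- accommodate the problem on the other side;
- fake the semantics;
- add observability as a substitute for an implementation;
- present a missing capability as done.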

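AgentRun owns the shared Agent execution infrastructure itself. This includes:

- run/command/runner-job lifecycle and cancel;
- bundle materialization;
- trace/result primitives;
- backend adapter event semantics;
- runner environment propagation;
- CLI result queries;
- every capability already promised in the OA specs.

If any of these is missing or behaves incorrectly, go back to the UniDesk OA spec to confirm the requirement. Then fill the gap in the source, self-tests, CI/CD and `agentrun-v01` runtime of `pikasTech/agentrun`.

HWLAB must not infer or fabricate facts that AgentRun did not emit, whether in its rendering layer, its adapter layer or its prompts.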

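HWLAB owns its own product and integration layer. This includes:

- user authentication;
- the Cloud Web/CLI external APIs;
- conversation/session ownership;
- frontend display;
- device-pod business authorization;
- the HWLAB-to-AgentRun adapter mapping;
- internal call switches that do not change the external API.

Sometimes AgentRun already emits correct semantics per the contract, but HWLAB's consumption, mapping, rendering or business path is still wrong. That fix must be made in `pikasTech/HWLAB`. Do not ask AgentRun to add temporary compatibility for HWLAB's private UI or business model.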

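Cross-repository issues and PRs must state the ownership, the contract basis and the verification entry point explicitly.

When both sides need to cooperate, first complete the capability on the side that owns the public contract. Then make the minimal adaptation on the consuming side.

Do not use any of the following to bypass a real gap in the long term: dual paths, legacy modes, feature flags, fallbacks, or extra noisy observability.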

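Some canaries bypass HWLAB's product entry point: those launched directly through the AgentRun manager, `dispatchHwlabAgentRun()`, or a hand-written runner job. Such a canary proves only that the AgentRun infrastructure and credential projection work. It does not prove that HWLAB's Cloud Web/Cloud API product entry point requests these capabilities correctly.

Some issues must be verified through HWLAB's original Web dispatcher entry point, or through a CLI that calls the same dispatcher. These are issues involving:

- the Cloud Web Workbench;
- user sessions;
- conversation/session/thread;
- AgentRun runtime assembly;
- business authorization.

The current authoritative requirements for resource assembly from HWLAB v0.2 to AgentRun are UniDesk OA's [Runtime Assembly](../../project-management/PJ2026-01/specs/PJ2026-010202-runtime-assembly.md) and [HWLAB Integration](../../project-management/PJ2026-01/specs/PJ2026-010205-hwlab-dispatch.md). Under these specs, `ResourceBundleRef.kind="gitbundle"` assembles `tools/` and `.agents/skills` through `bundles[]`. The old `toolAliases` / `skillRefs` / `workspaceFiles` are no longer valid integration points.

Attribute failures as follows:

- If the consuming Web dispatcher does not pass `gitbundle`, tool credentials or transient env per this contract, it is an HWLAB integration-layer problem.
- If the dispatcher requested them correctly but the AgentRun runner did not assemble them, it is an AgentRun execution-infrastructure problem.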

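For HWLAB and UniDesk/Artificer, the checkout authority of a `gitbundle` is repo URL + workspace ref. It is not the cloud-api artifact revision, an AipodSpec mirror switch, or the runtime prompt.

`ResourceBundleRef` / AipodSpec must keep declaring GitHub repo URLs without plaintext credentials. The Git mirror is a G14/AgentRun infrastructure capability: during materialization, the runner automatically rewrites the GitHub URL to the controlled mirror read URL. Do not declare `gitMirror`, a mirror base URL, or a direct/mirror branch switch in an AipodSpec, Queue task, prompt or business adapter.

After materialization, the AgentRun runner must record:

- the original `repoUrl` and the actual `fetchRepoUrl`;
- `mirrorUsed` and `mirrorBaseUrl`;
- the requested ref/commit;
- the actual `commitId`.

The devops-infra mirror cache must cover the bundle repos that Artificer and HWLAB commonly use. A missing cache is an infrastructure gap; do not work around it by having the AipodSpec connect directly to GitHub.

A `commitId` injected by cloud-api, CI/CD or rollout may serve only as a requested hint or as input to an explicit pin. It must not be the default materialization source.

When closing a related issue, the evidence must show `repoUrl`, `requestedRef` and the actual `commitId`, together with a summary of `bundles/tools/promptRefs/skillDirs`. If the actual `commitId` still equals the old cloud-api rollout commit and is not an explicit pin, it remains an AgentRun bundle materialization problem.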
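The closeout check for a materialized bundle can be sketched over the recorded fields. The field names come from the text. The `MaterializationRecord` nesting is hypothetical. Modelling an "explicit pin" as `requestedCommit === commitId` is an assumption of this sketch. The credential check covers HTTPS URLs only.

```typescript
// Fields the runner records after materializing a gitbundle.
interface MaterializationRecord {
  repoUrl: string;
  fetchRepoUrl: string;
  mirrorUsed: boolean;
  mirrorBaseUrl?: string;
  requestedRef: string;
  requestedCommit?: string;  // assumption: set only for an explicit pin
  commitId: string;          // actual commit checked out
}

function bundleEvidenceProblems(rec: MaterializationRecord, oldRolloutCommit: string): string[] {
  const problems: string[] = [];
  // Declared HTTPS repo URLs must not embed credentials (user:token@host).
  if (/^https:\/\/[^/]*@/.test(rec.repoUrl)) problems.push("repoUrl embeds credentials");
  if (rec.mirrorUsed && !rec.mirrorBaseUrl) problems.push("mirrorUsed=true but mirrorBaseUrl missing");
  // Landing on the old cloud-api rollout commit without an explicit pin is a materialization bug.
  const explicitlyPinned = rec.requestedCommit !== undefined && rec.requestedCommit === rec.commitId;
  if (rec.commitId === oldRolloutCommit && !explicitlyPinned) {
    problems.push("commitId equals old cloud-api rollout commit without explicit pin");
  }
  return problems;
}
```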

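When an HWLAB CaseRun needs a dedicated skill, the skill must reach the Code Agent through an AgentRun `gitbundle` resource bundle. The subject repo serves only as the source code to be modified and must not carry a copy of `.agents/skills`.

Closeout evidence must cover both sides:

- Positive assembly: the AgentRun trace or CaseRun archive shows `resource-bundle-materialized`, `resourceBundlePolicy`, and a read of `.agents/skills/<skill>/SKILL.md`.
- Negative isolation: the subject repo diff or artifacts contain no new `.agents/skills`.

Attribute failures as follows:

- If the runner assembled the skill via `gitbundle` but the HWLAB case still copies the skill into the subject repo, it is an HWLAB CaseRun integration-layer problem.
- If HWLAB requested the skill per the contract but the runner did not materialize it, it is an AgentRun bundle materialization problem.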

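The rules for HWLAB Code Agent provider profiles are centralized in `docs/reference/hwlab.md#code-agent-provider-profile-配置与验收`. They cover `config.toml`, full Codex `auth.json` submission, Secret evidence, and real profile trial runs.

This AgentRun reference covers only the AgentRun repository, runtime, CI/CD and cross-repository responsibility boundary. It does not duplicate the HWLAB profile credential semantics.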

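## AgentRun / HWLAB OTel Tracing Conventions

Some symptoms require verifying the HWLAB, AgentRun manager and AgentRun runner layers within the same OTel trace:

- an HWLAB Workbench, Code Agent or CaseRun turn shows no new activity for a long time;
- `waitingFor=code-agent` or `idle-after-tool`;
- a provider stream disconnect;
- a missing terminal after a tool call;
- a user questions whether AgentRun/codex-stdio is still running although the trace does not reach it.

Seeing only the HWLAB business trace or the manager dispatch span does not prove that the trace reaches codex app-server/codex-stdio.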

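For AgentRun runner-side instrumentation, the manager must pass the OTEL endpoint/service env and a stable `runnerJobId` when it creates the runner Job.

After startup, the runner writes these attributes into its key spans:

- `runId`, `commandId`, `runnerJobId`, `runnerId`;
- `sessionId`, `threadId`, `turnId`;
- `backendProfile`, `sourceCommit`;
- `traceId` and `otel.trace_id`.

All spans and JSONL events must keep `valuesPrinted=false`. Sensitive configuration, prompts, credentials, tool output and provider payloads may be disclosed only as length, hash, fingerprint, status, exit code or an enumerated reason.

If `AGENTRUN_LOG_PATH` is declared, the runner must create that JSONL file and write bounded, redacted lifecycle labels to it. Declaring the path in the environment alone is not enough.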

The codex app-server backend must be traceable at least across app-server lifecycle, thread, turn, tool, idle and provider stream disconnects. The minimum event set is:

- `codex_app_server.starting`, `codex_app_server.started`, and `codex_app_server.exit` on process exit;
- `codex_stdio.thread_start.*` / `codex_stdio.thread_resume.*`;
- `codex_stdio.turn_start.*` and `codex_stdio.turn_completed`;
- `codex_stdio.tool_call.started|completed|failed`;
- `codex_stdio.idle_warning` and `codex_stdio.idle_timeout`;
- `codex_stdio.provider_stream_disconnected`;
- `codex_stdio.missing_terminal_after_tool`.

A Codex `item/started` notification whose status is `inProgress`, `running` or an equivalent status must be normalized to `tool_call.started`. It must never be misreported as completed.
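The normalization rule can be sketched as a small mapping. The notification shape is a simplified assumption; the real Codex app-server payload carries more fields. Handling of `item/completed` is included only for contrast and is likewise an assumption. Other "equivalent" in-progress statuses would be added to `IN_PROGRESS`.

```typescript
// Simplified, assumed shape of a Codex app-server item notification.
interface CodexNotification { method: string; params: { item?: { status?: string } } }

// Statuses treated as "still running"; extend with other equivalent values as needed.
const IN_PROGRESS = new Set(["inProgress", "running"]);

function normalizeToolEvent(n: CodexNotification): string | undefined {
  const status = n.params.item?.status;
  if (n.method === "item/started" && status !== undefined && IN_PROGRESS.has(status)) {
    // A started, still-running item must never be reported as completed.
    return "codex_stdio.tool_call.started";
  }
  if (n.method === "item/completed") {
    return status === "failed" ? "codex_stdio.tool_call.failed" : "codex_stdio.tool_call.completed";
  }
  return undefined; // not a tool lifecycle event this sketch handles
}
```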

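Closeout verification runs in three steps:

1. Go through the HWLAB Web dispatcher's equivalent original entry point, or a CLI that calls the same dispatcher.
2. Trigger a new turn, ideally including one read-only tool call.
3. Query Tempo with `bun scripts/cli.ts platform-infra observability trace --target <node> --trace-id <otelTraceId> --grep codex_stdio --full`.

Passing evidence should show:

- `services` covering at least `hwlab-cloud-api`, `agentrun-manager` and `agentrun-runner`;
- no missing `runId`, `commandId` or `runnerJobId` in the codex spans;
- `valuesPrinted=false`;
- error spans consistent with the business terminal state.

The runner span of a known trace may not show `runnerJobId` in the default summary. In that case, first run `platform-infra observability trace --grep runnerJobId --full` with the current UniDesk CLI. This distinguishes a summarizer omission from an instrumentation gap. Classify the problem as AgentRun runner-side instrumentation only if either holds:

- the raw or updated summary still lacks the field;
- the same trace has no `agentrun-runner` service.

Later instrumentation fixes do not backfill old traces. When an old trace lacks codex runner spans, describe it only as "the event was not captured at the time". Do not use it to infer that AgentRun or codex-stdio was not running then.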

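## AgentRun / HWLAB Failure Attribution Standard

When HWLAB runs a Code Agent turn through AgentRun, failure attribution must be based on the structured failure kind from the AgentRun backend adapter.

AgentRun classifies failures in provider, thread, runner, bundle and command lifecycle into stable semantics. HWLAB consumes them as-is and maps them to user-readable categories.

Do not rewrite AgentRun/provider errors as device-pod, gateway, Cloud API endpoint or frontend rendering problems just to make the UI or an issue closeout look smoother.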

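Codex thread continuity has exactly one standard path. When a `SessionRef.threadId` already exists, AgentRun must continue through Codex stdio's native `thread/resume`, then run `turn/start` on the same app-server session.

`thread/resume` can fail: the old app-server's rollout may be missing, the call may return `no rollout found for thread id`, or another resume protocol error may occur. In any of these cases, AgentRun must emit `thread-resume-failed` and terminate the current turn. It must not:

- start a replacement `thread/start`;
- write back a new `threadId`;
- stitch the history into the prompt;
- ask HWLAB to accommodate by clearing the session, hiding the error or reopening the path.

When HWLAB receives this failure kind, it should display it as an AgentRun/Codex thread resume layer error. It must not interpret it as an unreachable hardware execution channel or an unreachable Cloud API.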
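The single-path rule can be sketched against a hypothetical client interface. The method names `thread/start`, `thread/resume` and `turn/start` come from the text. The `CodexClient` interface, the parameter shapes and `TurnFailure` are assumptions for illustration. The key property: a resume failure ends the turn with `thread-resume-failed` and never reaches `thread/start`.

```typescript
// Hypothetical minimal client for the Codex stdio methods named above.
interface CodexClient {
  request(method: "thread/start" | "thread/resume" | "turn/start", params: Record<string, unknown>): Promise<{ threadId?: string }>;
}

class TurnFailure extends Error {
  constructor(public readonly failureKind: string, message: string) {
    super(message);
  }
}

async function startTurn(client: CodexClient, threadId: string | undefined, input: string): Promise<string> {
  let activeThread = threadId;
  if (activeThread !== undefined) {
    try {
      await client.request("thread/resume", { threadId: activeThread });
    } catch (err) {
      // No thread/start fallback, no new threadId, no history stitched into the prompt.
      throw new TurnFailure("thread-resume-failed", String(err));
    }
  } else {
    const started = await client.request("thread/start", {});
    if (!started.threadId) throw new Error("thread/start returned no threadId");
    activeThread = started.threadId;
  }
  await client.request("turn/start", { threadId: activeThread, input });
  return activeThread;
}
```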

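When the Codex app-server/provider returns a tool-call argument JSON error, AgentRun should emit `provider-invalid-tool-call`.

The HWLAB adapter/Web should map it to a provider/tool-call layer error. It should keep `providerTrace.failureKind` and a concise failure message, and make clear that this is not a device-pod, gateway or Cloud API endpoint failure.

Follow-up fixes belong in the AgentRun provider/backend adapter or in the upstream provider request construction. Do not add a compatibility path on the HWLAB device side.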

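Diagnostic entry points may only add visibility along the same path; they must not form a second execution path.

Self-tests, a fake app-server mode, or debug commands used to reproduce provider failures must call the real backend adapter classification logic. Once the fix is done, keep them as self-tests or SPEC contracts.

Do not keep a parallel diagnostic image, a separate execution image, or a substitute runtime that serves only one issue.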

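The `finalResponse` of the AgentRun `command-result` / result API must come from the latest terminal assistant output of the current command. It must not fall back to a stale response after a long trace, a steer, or a multi-command query.

The result API may be inconsistent with raw events, trace rows or the terminal command sequence. If so, closing an HWLAB/CaseRun issue cannot rely on `command-result.finalResponse` alone. Judge by these together:

- the AgentRun terminal status;
- the current command id;
- the last assistant output in raw events/trace;
- the hardware evidence.

Track the stale result as an AgentRun visibility/result contract problem.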
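Cross-checking `finalResponse` against raw events can be sketched as follows. The `RawEvent` shape and the event type names are hypothetical simplifications. The point is to filter by the current command id, order by sequence, and take the last assistant output, only after that command has a terminal event.

```typescript
// Hypothetical, simplified raw event shape.
interface RawEvent {
  seq: number;
  commandId: string;
  type: "assistant_message" | "tool_call" | "command_terminal";
  text?: string;
}

function finalResponseFor(events: RawEvent[], commandId: string): string | undefined {
  const own = events.filter((e) => e.commandId === commandId).sort((a, b) => a.seq - b.seq);
  // Only a command that has reached its terminal event has a final response.
  if (!own.some((e) => e.type === "command_terminal")) return undefined;
  const assistant = own.filter((e) => e.type === "assistant_message" && e.text !== undefined);
  // Latest assistant output of this command; other commands' messages are never used.
  return assistant.length > 0 ? assistant[assistant.length - 1].text : undefined;
}
```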

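AgentRun result/session visibility must judge the running target command separately from any later steer command.

When investigating a stalled active turn, a recovery or a closeout, first read `liveness` from the target command's result/session status. Use `liveness.phase` to tell apart:

- `waiting-runner`, `waiting-model`, `waiting-tool`;
- `idle-after-tool`;
- `transport-disconnected`, `runner-heartbeat-stale`;
- `terminal`.

Do not infer that a turn has recovered or failed merely because no new event arrived for a long time, an outer timeout fired, or the runner reconnected.

`steerDelivery` only describes the steer RPC along the runner/app-server chain: its ack, forward and backend-accept state. `steer completed` cannot substitute for the target command's terminal state. It is also not evidence that the target turn has continued producing output.

When closing an HWLAB/CaseRun issue, cite all of these together:

- the target command id;
- the `liveness` of the target result/session;
- the raw trace/terminal command sequence;
- evidence from the original entry point.

Field requirements follow UniDesk OA's [AgentRun Core](../../project-management/PJ2026-01/specs/PJ2026-010201-agentrun-core.md) and [Queue Session](../../project-management/PJ2026-01/specs/PJ2026-010203-queue-session.md). UniDesk `docs/reference` only records the cross-repository attribution and acceptance conventions.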
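The separation between target-command liveness and steer delivery can be made explicit in types. The phase names come from the text. The `SteerDelivery` field names and the `describeTarget` function are hypothetical. Steer state is reported alongside the target's phase but never changes it.

```typescript
type LivenessPhase =
  | "waiting-runner" | "waiting-model" | "waiting-tool" | "idle-after-tool"
  | "transport-disconnected" | "runner-heartbeat-stale" | "terminal";

interface TargetCommandView { commandId: string; liveness: { phase: LivenessPhase } }
// Hypothetical field names for the steer RPC delivery state.
interface SteerDelivery { acked: boolean; forwarded: boolean; backendAccepted: boolean }

function describeTarget(target: TargetCommandView, steer?: SteerDelivery): { terminal: boolean; summary: string } {
  // Only the target command's own liveness decides whether the turn is terminal.
  const terminal = target.liveness.phase === "terminal";
  const steerNote = steer ? ` (steer backendAccepted=${steer.backendAccepted})` : "";
  const state = terminal ? "terminal" : `still ${target.liveness.phase}`;
  return { terminal, summary: `${target.commandId}: ${state}${steerNote}` };
}
```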

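In persistent runner mode, `run.status=claimed` may only mean that the runner/session is still occupied. It does not mean the current command is unfinished.

To decide whether a single HWLAB Workbench or CaseRun turn has completed, the authority is:

- the result/session terminal status of the target `commandId`;
- the terminal command sequence in raw events;
- the last assistant/tool output.

None of the following may override the command-level terminal state: the run-level `claimed`, a still-alive runner, a still-running Pod, or the outer persistent session state.

If the command result is already terminal but the HWLAB turn/read model still shows running, investigate the HWLAB projection/read-model consumption chain first. Do not ask AgentRun to rewrite the persistent runner semantics.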

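## Chinese-Language Rule

AgentRun repository content is in Chinese by default. The following must all be written in Chinese:

- long-lived documents and process documents;
- issue titles and bodies;
- PR titles, bodies and comments;
- review notes and delivery summaries.

Some items may remain in English: code identifiers, API paths, command names, configuration keys, log fields, protocol fields, and unavoidable external proper nouns. Explanatory text must be in Chinese.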