Codex 7ba28c8f40 fix: reload AgentRun DB secrets from YAML

2026-06-13 07:10:32 +00:00

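AgentRun Development and Operations Reference

This document records only the UniDesk-side development and operations constraints for the standalone repository pikasTech/agentrun. AgentRun's architecture design, MVP scope, API contracts, runner/backend protocol, and runtime-internal rules must be maintained in the AgentRun repository itself; they must not live in the UniDesk long-term reference as a source of truth.

Repository and Worktree

The only long-lived AgentRun repository is: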

git@github.com:pikasTech/agentrun.git

The fixed source worktree for the current AgentRun v0.1 is:

G14:/root/agentrun-v01

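This directory must stay on the v0.1 branch, its origin must be git@github.com:pikasTech/agentrun.git, and it must remain clean. Before any development, documentation change, deployment observation, or resumption after an interruption that explicitly targets v0.1 of the base Code Agent invocation service for UniDesk/HWLAB, first run the following through UniDesk SSH passthrough: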

trans G14:/root/agentrun-v01 script -- 'pwd; git status --short --branch; git remote -v'

Expected state:

the current path is /root/agentrun-v01;
the branch is v0.1...origin/v0.1;
origin is git@github.com:pikasTech/agentrun.git;
the fixed source worktree is clean.

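If the fixed source worktree is missing, dirty, on the wrong branch, or pointed at the wrong remote, fix it before continuing. Do not treat the historical mainline directory /root/agentrun, /root/unidesk, /root/hwlab, the D601 workspace, temporary clones, runner checkouts, in-pod copies, or master-server copies as the AgentRun v0.1 source of truth.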
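
The expected state above can be checked mechanically. The following is a minimal sketch, assuming the combined stdout of `pwd; git status --short --branch; git remote -v` is available as one string; the function name `checkPreflight` and the result shape are hypothetical and are not part of the UniDesk CLI.

```typescript
// Hypothetical preflight checker for the fixed source worktree (illustration only).
interface Preflight {
  ok: boolean;
  problems: string[];
}

const EXPECTED_PATH = "/root/agentrun-v01";
const EXPECTED_BRANCH_LINE = "## v0.1...origin/v0.1";
const EXPECTED_ORIGIN = "git@github.com:pikasTech/agentrun.git";

// `output` is the combined stdout of: pwd; git status --short --branch; git remote -v
function checkPreflight(output: string): Preflight {
  const lines = output.split("\n").map((l) => l.trimEnd()).filter((l) => l.length > 0);
  const problems: string[] = [];

  const path = lines[0] ?? "(none)";
  if (path !== EXPECTED_PATH) problems.push(`unexpected path: ${path}`);

  // An exact match also rejects "[ahead N]" / "[behind N]" suffixes.
  const branchLine = lines.find((l) => l.startsWith("## ")) ?? "(none)";
  if (branchLine !== EXPECTED_BRANCH_LINE) problems.push(`unexpected branch line: ${branchLine}`);

  // Short-format status entries start with a two-character XY code and a space.
  const dirty = lines.filter((l) => /^[ MADRCTU?!]{2} /.test(l));
  if (dirty.length > 0) problems.push(`worktree is dirty: ${dirty.length} entries`);

  const originUrls = lines
    .map((l) => l.split(/\s+/))
    .filter((parts) => parts[0] === "origin" && parts.length >= 2)
    .map((parts) => parts[1]);
  if (originUrls.length === 0 || originUrls.some((u) => u !== EXPECTED_ORIGIN)) {
    problems.push("origin is missing or points at the wrong remote");
  }

  return { ok: problems.length === 0, problems };
}
```

Any non-empty `problems` list means the worktree should be fixed before further work, as the rule above requires.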

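Worktree Rules

The fixed source worktree is used only for preflight checks, fetch, worktree management, and final sync. Routine AgentRun v0.1 feature, documentation, and deployment changes must use a separate worktree: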

G14:/root/agentrun-v01/.worktree/{pr_branch}

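A v0.1 worktree must be created from the latest origin/v0.1. A task branch covers only the current change, and a commit includes only files related to the current task. Do not use the /root/agentrun-v01 root directory as a scratch area for parallel tasks.

Documentation Landing Rules

Changes to AgentRun SPECs and long-term reference documents do not go through a PR. After local review, they must be committed and pushed directly to the corresponding target branch, for example origin/v0.1. Process plans, per-stage evidence, acceptance results, and blockers go into the comments of the corresponding GitHub issue; a documentation PR must not substitute for landing the change directly.

Deployment Target

AgentRun has retired the old dev/prod runtime terminology. The fixed v0.1 deployment target is a namespace on G14's native k3s: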

G14:k3s namespace agentrun-v01

All k3s operations must use UniDesk route syntax:

trans G14:k3s kubectl get pods -n agentrun-v01

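Do not harden a temporary NodePort, host port, pod IP, provider-gateway business HTTP proxy, or one-off port-forward into an AgentRun deployment path. Any public entry point, UniDesk/HWLAB integration entry point, or cross-service access path must first be introduced through a reviewed change in the AgentRun repository; UniDesk only records the corresponding operations entry point afterwards.

Controlled CI/CD Entry Points

AgentRun control-plane write operations must be executed through the UniDesk high-level CLI. The historical v0.1 G14 lane keeps its compatibility entry point without --node/--lane; new or migrated lanes must use --node <node> --lane <lane> to resolve the target from config/agentrun.yaml, and must not read deployment truth from deploy.json in the AgentRun service repo.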

bun scripts/cli.ts agentrun control-plane status
bun scripts/cli.ts agentrun control-plane trigger-current --dry-run
bun scripts/cli.ts agentrun control-plane trigger-current --confirm
bun scripts/cli.ts agentrun control-plane refresh --dry-run
bun scripts/cli.ts agentrun control-plane refresh --confirm
bun scripts/cli.ts agentrun control-plane cleanup-runs --min-age-minutes 30 --limit 200 --dry-run
bun scripts/cli.ts agentrun control-plane cleanup-runs --min-age-minutes 30 --limit 200 --confirm
bun scripts/cli.ts agentrun control-plane cleanup-released-pvs --limit 200 --dry-run
bun scripts/cli.ts agentrun control-plane cleanup-released-pvs --limit 200 --confirm

The standard entry points for a YAML-only lane are:

bun scripts/cli.ts agentrun control-plane plan --node D601 --lane v02
bun scripts/cli.ts agentrun control-plane apply --node D601 --lane v02 --dry-run
bun scripts/cli.ts agentrun control-plane apply --node D601 --lane v02 --confirm
bun scripts/cli.ts agentrun control-plane secret-sync --node D601 --lane v02 --dry-run
bun scripts/cli.ts agentrun control-plane secret-sync --node D601 --lane v02 --confirm
bun scripts/cli.ts agentrun control-plane restart --node D601 --lane v02 --dry-run
bun scripts/cli.ts agentrun control-plane restart --node D601 --lane v02 --confirm
bun scripts/cli.ts agentrun control-plane trigger-current --node D601 --lane v02 --dry-run
bun scripts/cli.ts agentrun control-plane trigger-current --node D601 --lane v02 --confirm
bun scripts/cli.ts agentrun control-plane status --node D601 --lane v02 --full

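status is a read-only observation of the current commit in G14:/root/agentrun-v01, the corresponding PipelineRun, GitOps latest, the Argo Application, the agentrun-v01 workload, the manager source commit, and a git mirror summary; it also reports whether the Argo revision is aligned with v0.1-gitops latest. The default output is a compact commander view that keeps only the summary, per-stage durations, alignment state, and drill-down commands; add --full explicitly for remote stdout/stderr tails, and --raw explicitly for the raw git mirror cache output. status also supports targeted queries with --pipeline-run <name> and --source-commit <sha>, returning target, targetValidation, and next.* drill-downs, so that you can tell directly whether a given run succeeded, succeeded historically, is running, is missing, or has a source mismatch. status emits agentrun.control-plane.status.progress stage events on stderr covering source, runtime, and git-mirror, so that long aggregations still show visible progress.

trigger-current first fast-forwards the fixed source worktree to origin/v0.1, then creates a commit-pinned PipelineRun for the current commit. If a PipelineRun with the same name is running or has already succeeded, the duplicate trigger must be rejected; creation is allowed only when that PipelineRun has failed or does not exist. The command only submits CI/CD work and does not wait for the full PipelineRun or rollout to finish; poll with status afterwards.

refresh performs a hard refresh only on argocd/agentrun-g14-v01. It is the controlled sync entry point for cases where GitOps promotion has completed but Argo is still on an old revision; it does not patch the runtime workload directly.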
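
The duplicate-trigger rule for trigger-current can be sketched as a small decision function. This is an illustration only: the state names and the function `decideTrigger` are hypothetical, and the real CLI may model PipelineRun state differently.

```typescript
// Hypothetical states of the commit-pinned PipelineRun with the same name.
type PipelineRunState = "Running" | "Succeeded" | "Failed" | "Missing";

interface TriggerDecision {
  create: boolean;
  reason: string;
}

// Creation is allowed only when the same-name PipelineRun failed or does not exist.
function decideTrigger(state: PipelineRunState): TriggerDecision {
  switch (state) {
    case "Running":
      return { create: false, reason: "a PipelineRun for this commit is already running" };
    case "Succeeded":
      return { create: false, reason: "a PipelineRun for this commit already succeeded" };
    case "Failed":
      return { create: true, reason: "previous PipelineRun failed; retrigger allowed" };
    case "Missing":
      return { create: true, reason: "no PipelineRun exists for this commit" };
  }
}
```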

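For a YAML-only lane, trigger-current first ensures that the target source workspace/branch exists, then renders the artifact catalog and GitOps desired state from the image build, GitOps branch/path, runtime namespace, Secret, database, and manager env declared in UniDesk YAML. This path deletes deploy/deploy.json from the new lane's source branch, because deployment truth has moved into UniDesk YAML; the historical file in the old v0.1 branch exists only as a pre-migration leftover and must not serve as a source of truth for the new lane. When the Secret export format or the external database connection parameters change, first materialize the local Secret source with platform-db postgres export-secrets --confirm, then distribute it with agentrun control-plane secret-sync --node <node> --lane <lane> --confirm, and finally run agentrun control-plane restart --node <node> --lane <lane> --confirm so that the manager Deployment picks up the new Secret through a rollout. Do not delete Pods by hand or patch the Secret directly.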

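cleanup-runs is the retention entry point for completed AgentRun v0.1 CI workspaces. It cleans up only agentrun-v01-ci-* PipelineRuns in the agentrun-ci namespace that are older than --min-age-minutes, releasing the temporary workspace PVCs through Tekton ownerRefs. A dry run must disclose the candidate PipelineRuns, owned PVCs, active-mount protection, the estimated actual local-path bytes, and the confirm command. By default the most recently completed PipelineRun is protected, preserving evidence of the current CI/CD state. cleanup-released-pvs is the second-pass reclamation entry point; it handles only Released PVs in agentrun-ci that use local-path and the Delete reclaim policy. It does not touch the agentrun-v01 runtime namespace, business PVCs, Secrets, registry storage, or GitOps desired state. For disk governance and the G14 safe-stop rules, see docs/reference/gc.md.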
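
The candidate selection described for cleanup-runs can be sketched as follows. The `PipelineRunInfo` shape and the function name are hypothetical, and the sketch assumes age is measured from completion time, which the text above does not specify.

```typescript
// Hypothetical summary of a PipelineRun; completedAt is null while it is still running.
interface PipelineRunInfo {
  name: string;
  namespace: string;
  completedAt: Date | null;
}

interface CompletedRun extends PipelineRunInfo {
  completedAt: Date;
}

function selectCleanupCandidates(
  runs: PipelineRunInfo[],
  now: Date,
  minAgeMinutes: number,
  limit: number,
): string[] {
  const completed = runs
    .filter((r) => r.namespace === "agentrun-ci" && r.name.startsWith("agentrun-v01-ci-"))
    .filter((r): r is CompletedRun => r.completedAt !== null)
    .sort((a, b) => b.completedAt.getTime() - a.completedAt.getTime());

  // Protect the most recently completed run as evidence of the current CI/CD state.
  const [, ...older] = completed;

  // Assumption: age is measured from completion time.
  const cutoff = now.getTime() - minAgeMinutes * 60 * 1000;
  return older
    .filter((r) => r.completedAt.getTime() <= cutoff)
    .slice(0, limit)
    .map((r) => r.name);
}
```

A dry run would print this candidate list together with the owned PVCs before anything is deleted.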

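Any closeout that involves AgentRun runner egress, transientEnv, or Secret non-disclosure must trigger an agentrun-v01 runner Job with the real create/apply/send resource primitives, then inspect the runner job response, event/trace, and Kubernetes Pod spec through describe runnerjob/..., events run/..., logs session/..., or a compatibility bridge where necessary. Passing evidence should show whether the proxy env is present, whether NO_PROXY includes hyueapi.com/.hyueapi.com, whether transientEnv entries such as the short-lived HWLAB_API_KEY are injected through valueFrom.secretKeyRef on a per-job Secret, and that the response/event outputs only env names, Secret metadata, and valuesPrinted=false. Secret values must not appear in issues, traces, or Pod spec summaries. The authoritative internal SecretRef contract is docs/reference/spec-v01-secret-distribution.md and docs/reference/spec-v01-runtime-assembly.md in the AgentRun repository; UniDesk records only the verification entry points and cross-repository attribution.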
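
A reviewer can summarize a runner Pod's env list without ever touching a Secret value. The sketch below uses the standard Kubernetes EnvVar fields (name, value, valueFrom.secretKeyRef); the function `auditRunnerEnv`, the audit shape, and the choice of HTTPS_PROXY/HTTP_PROXY as the proxy variables are assumptions made for illustration. It also assumes both hyueapi.com and .hyueapi.com should be listed in NO_PROXY.

```typescript
// Subset of the Kubernetes core/v1 EnvVar shape.
interface EnvVar {
  name: string;
  value?: string;
  valueFrom?: { secretKeyRef?: { name: string; key: string } };
}

// Hypothetical audit summary: only names and Secret metadata, never values.
interface EnvAudit {
  proxyEnvPresent: boolean;
  noProxyCoversHyueapi: boolean;
  secretRefs: { env: string; secretName: string; key: string }[];
  violations: string[];
  valuesPrinted: false;
}

function auditRunnerEnv(env: EnvVar[], transientNames: string[]): EnvAudit {
  const byName = new Map(env.map((e) => [e.name, e] as const));
  const noProxy = (byName.get("NO_PROXY")?.value ?? "").split(",").map((s) => s.trim());

  const secretRefs: EnvAudit["secretRefs"] = [];
  const violations: string[] = [];
  for (const name of transientNames) {
    const entry = byName.get(name);
    const ref = entry?.valueFrom?.secretKeyRef;
    // A transient env must come from a Secret reference and must not carry a literal value.
    if (ref === undefined || entry?.value !== undefined) {
      violations.push(name);
      continue;
    }
    secretRefs.push({ env: name, secretName: ref.name, key: ref.key });
  }

  return {
    // Assumption: the proxy is injected through the conventional variable names.
    proxyEnvPresent: byName.has("HTTPS_PROXY") || byName.has("HTTP_PROXY"),
    noProxyCoversHyueapi: noProxy.includes("hyueapi.com") && noProxy.includes(".hyueapi.com"),
    secretRefs,
    violations,
    valuesPrinted: false,
  };
}
```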

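When verifying codeload.github.com through g14-provider-egress-proxy.unidesk.svc.cluster.local:18789, also confirm that the G14 runtime egress Service has a ready endpoint. If the Service/DNS exists but the Deployment is 0/1, the Endpoint has only notReady addresses, or the Pod is in ImagePullBackOff or ContainerStatusUnknown, attribute the problem to UniDesk/G14 runtime egress infrastructure. A connect refused that occurs after the runner has already injected the proxy env must not be attributed to a failed AgentRun business fix, nor may an issue that requires "successful access to codeload through the controlled proxy" be closed.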
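
The attribution order in this rule can be written down as a small function. The observation fields and attribution labels below are hypothetical names chosen for the sketch; attributing a missing proxy env to the AgentRun runner is an inference from the responsibility split in this document, not a stated rule.

```typescript
// Hypothetical snapshot of what the reviewer observed.
interface EgressObservation {
  deploymentReadyReplicas: number;
  readyEndpointAddresses: number;
  runnerHasProxyEnv: boolean;
  codeloadReachable: boolean;
}

type EgressAttribution =
  | "verified"
  | "unidesk-g14-egress-infra"
  | "agentrun-runner-env"
  | "needs-investigation";

function attributeEgress(o: EgressObservation): EgressAttribution {
  // Infrastructure readiness is checked first: without a ready endpoint,
  // a connect refused says nothing about the AgentRun business fix.
  if (o.deploymentReadyReplicas === 0 || o.readyEndpointAddresses === 0) {
    return "unidesk-g14-egress-infra";
  }
  if (!o.runnerHasProxyEnv) return "agentrun-runner-env";
  return o.codeloadReachable ? "verified" : "needs-investigation";
}
```

Only the "verified" outcome supports closing an issue that requires successful access through the controlled proxy.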

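UniDesk Boundary

UniDesk is the integrated distributed development and operations center for AgentRun. UniDesk may record:

AgentRun's fixed repository, source worktree, and worktree rules;
G14 preflight checks, route syntax, and remote operation entry points;
the fixed v0.1 namespace and the lane rules for later versions;
deployment observation, controlled rollout, and operations entry points;
how UniDesk and HWLAB integrate once the AgentRun repository has defined the public contract.

UniDesk must not serve as the source of truth for:

AgentRun service architecture;
MVP stage planning;
RESTful API contracts;
runner/backend protocol;
database schema;
tenant policy model;
backend adapter design.

These must be maintained in the AgentRun repository's own AGENTS.md and docs/reference/.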

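Boundary Between AgentRun Queue and the Old Code Queue

The AgentRun v0.1 commander task surface has passed real runtime acceptance under AgentRun issue #105 and can serve as the standard AgentRun-side path for dispatching new tasks, observing the commander queue, events/logs/result, steer/send, ack, and cancel. For long-term use, the AgentRun repository's own SPEC remains the source of truth for capabilities; UniDesk only records that this path has been verified on the G14 agentrun-v01 runtime with the hy profile + gpt-5.5.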

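The UniDesk commander's entry point for new tasks is fixed to the resource primitives bun scripts/cli.ts agentrun get|describe|events|logs|result|ack|cancel|dispatch|create|apply|steer|send. This entry point is a render-only client: the UniDesk client keeps k8s-style command parsing, human-readable tables, lifecycle summaries, next-step commands, pagination, the stable -o json|yaml client schema, and error presentation; the AgentRun server provides only the stable RESTful API, authentication, and business facts, and does not host UniDesk CLI rendering. For day-to-day dispatch, prefer agentrun create task --aipod Artificer --prompt-stdin, or agentrun apply -f - with a quoted YAML/JSON heredoc on stdin; dispatch a task that has been created but not yet run with agentrun dispatch task/<taskId>; --json-file, --prompt-file, and --runner-json-file are only client-side input sources, meant for reviewed and reusable controlled files. UniDesk does not implement the AgentRun queue protocol, nor does it double-write tasks back to the old Code Queue.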

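The default transport for the resource primitives and the old compatibility groups is a direct connection to the AgentRun REST API, configured from UniDesk's own YAML, config/agentrun.yaml. Authentication may reuse the HWLAB_API_KEY style of discovery through an environment variable or fixed file, but must not depend on the HWLAB runtime, HWLAB backend-core, the HWLAB frontend proxy, or the SSH official CLI; an extra forwarding hop enlarges the failure surface and cannot be the official path. --raw discloses only the direct AgentRun REST envelope and the necessary transport=direct-http, clientRole=render-only, configPath, baseUrl, and auth source/redacted metadata; it does not print the token value. agentrun control-plane ... and git-mirror ... still belong to the G14 source/runtime operations control path and may keep using the UniDesk SSH capture bridge; these control-plane paths must not in turn become the default transport for queue/session resource primitives.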

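The AgentRun public HTTPS entry point, the FRP/Caddy edge, the direct REST base URL, and the authentication source are all declared in UniDesk's config/agentrun.yaml. A YAML-only lane must not write these deployment choices back to deploy/deploy.json in the AgentRun source branch; the AgentRun source repo keeps only application code, build inputs, and AgentRun's own contracts. bun scripts/cli.ts agentrun control-plane expose --confirm only adds the edge-side allow port and Caddy site according to UniDesk YAML; it does not create an Ingress, NodePort, LoadBalancer, hostPort, or HWLAB forwarding layer in the AgentRun k3s.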

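If an AgentRun Queue task needs to call a UniDesk maintenance bridge such as trans / unidesk-ssh, the long-term contract is defined by docs/reference/spec-v01-runtime-assembly.md and docs/reference/spec-v01-secret-distribution.md in the AgentRun repository: the caller requests the UNIDESK_SSH_CLIENT_TOKEN SecretRef through executionPolicy.secretScope.toolCredentials[].tool=unidesk-ssh; non-sensitive endpoints are either supplied explicitly through the runner-job transientEnv or filled in automatically from manager-controlled defaults. When the UniDesk bridge submits a Queue payload, it must not carry the token in the prompt, payload, or transientEnv, nor use the HWLAB runtime Web entry point to impersonate the UniDesk frontend. If the dispatcher has correctly requested unidesk-ssh but runner-job-created.transientEnv.names in the trace lacks UNIDESK_MAIN_SERVER_IP, UNIDESK_MAIN_SERVER_HOST, or UNIDESK_FRONTEND_URL, attribute the problem to AgentRun assembly; if the endpoint env is present but the route is denied or times out, investigate the UniDesk frontend/token scope or the provider session next.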

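The old UniDesk Code Queue keeps only historical archives, read-only troubleshooting, and an entry point for stopping leftover old tasks. codex submit/enqueue, codex steer, codex resume, codex queue create/merge, codex move, the old Web submission form, old queue management, and old workdir management must all return a frozen status or be disabled; codex task/tasks/output/read/unread/queues may continue to read history, and codex interrupt|cancel is used only to stop leftover old tasks. Old Code Queue history is not migrated to AgentRun, and no adapter, legacy mode, fallback, or double-write path is provided.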
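
The freeze policy for the old codex subcommands amounts to a lookup table. The sketch below is illustrative: `classifyCodexCommand` and the mode labels are hypothetical, and only the subcommand names come from the paragraph above.

```typescript
type LegacyMode = "frozen" | "read-only" | "stop-only" | "unknown";

// Subcommand names as listed above; "queue create" and "queue merge" are two-word forms.
const FROZEN = ["submit", "enqueue", "steer", "resume", "queue create", "queue merge", "move"];
const READ_ONLY = ["task", "tasks", "output", "read", "unread", "queues"];
const STOP_ONLY = ["interrupt", "cancel"];

// `argv` is the argument list after the leading "codex" word.
function classifyCodexCommand(argv: string[]): LegacyMode {
  const one = argv[0] ?? "";
  const two = argv.slice(0, 2).join(" ");
  if (FROZEN.includes(two) || FROZEN.includes(one)) return "frozen";
  if (READ_ONLY.includes(one)) return "read-only";
  if (STOP_ONLY.includes(one)) return "stop-only";
  return "unknown";
}
```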

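AgentRun / HWLAB Collaboration Responsibility Boundary

When HWLAB integrates with AgentRun, first determine ownership of a problem from the public contract and runtime evidence, then make the change in the corresponding repository. Whoever owns the missing capability, wrong semantics, or unfixed behavior makes the fix; do not accommodate it on the other side, fake semantics, add observability in place of an implementation, or dress up a missing capability as done just to keep the current integration moving.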

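AgentRun owns the shared Agent execution infrastructure itself, including the run/command/runner-job lifecycle, bundle materialization, cancel, trace/result primitives, backend adapter event semantics, runner environment propagation, CLI result queries, and the capabilities already promised in the SPEC. If any of these is missing or misbehaves, it must be completed in pikasTech/agentrun's SPEC, source code, unit/self tests, CI/CD, and the agentrun-v01 runtime, after which HWLAB consumes the explicit contract through its adapter; HWLAB should not infer or fabricate, in the rendering layer, adapter layer, or prompt, facts that AgentRun did not emit.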

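HWLAB owns its own product and integration layer, including user authentication, the external API of Cloud Web/CLI, conversation/session ownership, frontend presentation, device-pod business authorization, the HWLAB-to-AgentRun adapter mapping, and internal call switchovers that do not change the external API. If AgentRun already emits the correct semantics per the contract and a problem remains in HWLAB's consumption, mapping, rendering, or business path, it must be fixed in pikasTech/HWLAB; AgentRun must not be asked to add temporary compatibility for HWLAB's private UI or business model.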

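Cross-repository issues and PRs must state the responsibility owner, the contract basis, and the verification entry point explicitly. When both sides need to cooperate, first complete the capability on the side that owns the public contract, then make the minimal adaptation on the consuming side; dual paths, legacy modes, feature flags, fallbacks, or extra noisy observability must not be used to work around a real gap over the long term.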

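A canary launched directly through the AgentRun manager, dispatchHwlabAgentRun(), or a hand-written runner job proves only that AgentRun infrastructure and credential projection work; it does not prove that the HWLAB Cloud Web/Cloud API product entry point requests these capabilities correctly. Issues involving the Cloud Web Workbench, user sessions, conversation/session/thread, AgentRun runtime assembly, or business authorization must be verified through HWLAB's original Web dispatcher entry point, or through a CLI that calls the same dispatcher. The current authority for resource assembly from HWLAB v0.2 to AgentRun is HWLAB docs/reference/agentrun-code-agent-dispatch.md and AgentRun docs/reference/spec-v01-runtime-assembly.md: ResourceBundleRef.kind="gitbundle" assembles tools/ and .agents/skills through bundles[], and the old toolAliases / skillRefs / workspaceFiles are no longer valid integration entry points. If the consuming Web dispatcher does not pass the gitbundle, tool credential, or transient env per this contract, attribute the problem to the HWLAB integration layer; if the dispatcher requested them correctly but the AgentRun runner did not assemble them, attribute it to AgentRun execution infrastructure.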

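The gitbundle checkout authority for HWLAB and UniDesk/Artificer is the repo URL + workspace ref, not the cloud-api artifact revision, an AipodSpec mirror switch, or the runtime prompt. ResourceBundleRef / AipodSpec must keep declaring a GitHub repo URL without plaintext credentials; Git mirror is a G14/AgentRun infrastructure capability, and the runner automatically rewrites the GitHub URL to the controlled mirror read URL during materialization. Do not declare gitMirror, a mirror base URL, or a direct/mirror branch switch in the AipodSpec, Queue task, prompt, or business adapter. After materialization, the AgentRun runner must record the original repoUrl, the actual fetchRepoUrl, mirrorUsed, mirrorBaseUrl, the requested ref/commit, and the actual commitId. The devops-infra mirror cache must cover the bundle repos commonly used by Artificer and HWLAB; a missing cache is an infrastructure gap and must not be bypassed by letting the AipodSpec connect to GitHub directly. A commitId injected by cloud-api, CI/CD, or rollout may serve only as a requested hint or as input to an explicit pin, never as the default materialization source. When closing a related issue, the evidence must show repoUrl, requestedRef, and the actual commitId together with a summary of bundles/tools/promptRefs/skillDirs; if the actual commitId still equals the old cloud-api rollout commit and is not an explicit pin, keep attributing the issue to AgentRun bundle materialization.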
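
The closeout check on a materialization record can be sketched as below. The record mirrors the fields named above (repoUrl, fetchRepoUrl, mirrorUsed, mirrorBaseUrl, requestedRef, commitId); the `explicitPin` flag, the function name, and the wording of the findings are hypothetical, and the authoritative field contract lives in the AgentRun repository.

```typescript
// Hypothetical evidence record assembled from the runner's materialization trace.
interface MaterializationRecord {
  repoUrl: string;
  fetchRepoUrl: string;
  mirrorUsed: boolean;
  mirrorBaseUrl: string | null;
  requestedRef: string;
  explicitPin: boolean; // assumption: whether the caller explicitly pinned a commit
  commitId: string; // actual commit that was checked out
}

function reviewMaterialization(rec: MaterializationRecord, rolloutCommit: string): string[] {
  const findings: string[] = [];
  // A userinfo part ("scheme://user@host/...") means credentials leaked into the declared URL.
  if (/^[a-z]+:\/\/[^/]*@/i.test(rec.repoUrl)) {
    findings.push("repoUrl carries userinfo; declare a credential-free URL");
  }
  if (rec.mirrorUsed && rec.fetchRepoUrl === rec.repoUrl) {
    findings.push("mirrorUsed is true but fetchRepoUrl was not rewritten");
  }
  if (rec.commitId === rolloutCommit && !rec.explicitPin) {
    findings.push("actual commitId equals the rollout commit without an explicit pin: AgentRun bundle materialization");
  }
  return findings;
}
```

An empty findings list is necessary but not sufficient; the bundles/tools/promptRefs/skillDirs summary still has to be shown alongside it.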

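When an HWLAB CaseRun needs a dedicated skill, the skill must be assembled for the Code Agent through an AgentRun gitbundle resource bundle; the subject repo serves only as the source of the code to be modified and must not carry a copy of .agents/skills. Closeout evidence should include both positive assembly and negative isolation: the AgentRun trace or CaseRun archive shows resource-bundle-materialized, resourceBundlePolicy, and a read of .agents/skills/<skill>/SKILL.md; the subject repo diff or artifact contains no newly added .agents/skills. If the runner assembled the skill via gitbundle but the HWLAB case still copies the skill into the subject repo, attribute the problem to the HWLAB CaseRun integration layer; if HWLAB requested per contract and the runner did not materialize the skill, attribute it to AgentRun bundle materialization.

For the HWLAB Code Agent provider profile, the rules on config.toml, submitting the complete Codex auth.json, Secret evidence, and real-profile trial runs are all in docs/reference/hwlab.md#code-agent-provider-profile-配置与验收. This AgentRun reference maintains only the AgentRun repository, runtime, CI/CD, and cross-repository responsibility boundaries, and does not duplicate the HWLAB profile credential semantics.

AgentRun / HWLAB Failure Attribution Standard

When HWLAB runs a Code Agent turn through AgentRun, failure attribution must follow the structured failure kind from the AgentRun backend adapter. AgentRun is responsible for classifying provider, thread, runner, bundle, and command lifecycle failures into stable semantics; HWLAB is responsible for consuming them as-is and mapping them to user-readable categories. Do not rewrite an AgentRun/provider error as a device-pod, gateway, Cloud API endpoint, or frontend rendering problem just to make the UI or an issue closeout look smoother.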

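Codex thread continuity has exactly one standard path: when a SessionRef.threadId already exists, AgentRun must resume through Codex stdio's native thread/resume and then run turn/start on the same app-server session. When thread/resume hits a missing rollout on an old app-server, returns no rollout found for thread id, or fails with any other resume protocol error, AgentRun must emit thread-resume-failed and terminate the current turn. It must not start a substitute thread/start, write back a new threadId, or splice in historical prompts, nor may it ask HWLAB to accommodate by clearing the session, hiding the error, or reopening a path. On receiving this failure kind, HWLAB should present it as an AgentRun/Codex thread resume layer error and not interpret it as an unreachable hardware execution channel or Cloud API.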
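
The no-fallback rule can be made concrete with a small planning function. Everything here is a hypothetical simplification: the real adapter talks to the Codex app-server asynchronously over stdio, while this sketch takes a synchronous `resume` callback and only returns what should happen next. Starting a fresh thread when no threadId exists is an assumption, since the paragraph above only covers the case where one already exists.

```typescript
type ResumeOutcome = { ok: true } | { ok: false; message: string };

type TurnPlan =
  | { action: "thread/start" }
  | { action: "turn/start"; threadId: string }
  | { action: "fail"; failureKind: "thread-resume-failed"; message: string };

function planTurn(threadId: string | null, resume: (id: string) => ResumeOutcome): TurnPlan {
  // Assumption: without an existing threadId there is nothing to resume.
  if (threadId === null) return { action: "thread/start" };

  const outcome = resume(threadId);
  if (!outcome.ok) {
    // No substitute thread/start, no new threadId, no replayed history: fail the turn.
    return { action: "fail", failureKind: "thread-resume-failed", message: outcome.message };
  }
  return { action: "turn/start", threadId };
}
```

The point of the shape is that a resume failure has exactly one exit, so a fallback branch cannot be added without changing the return type.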

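When the Codex app-server/provider returns a JSON error in tool-call arguments, AgentRun should emit provider-invalid-tool-call. The HWLAB adapter/Web should map it to a provider/tool-call layer error and keep providerTrace.failureKind together with a concise failure message, making clear that this is not a device-pod, gateway, or Cloud API endpoint fault. Follow-up fixes belong in the AgentRun provider/backend adapter or in upstream provider request construction; do not add a compatibility path on the HWLAB device side.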
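
On the consuming side, the mapping from failure kind to a user-facing layer can stay a pure pass-through. The sketch below is not HWLAB's adapter; the layer labels and the function name are hypothetical, and only the two failure kinds and the providerTrace.failureKind field come from the text above.

```typescript
interface UserFacingFailure {
  layer: "agentrun-codex-thread-resume" | "provider-tool-call" | "agentrun-other";
  providerTrace: { failureKind: string };
  message: string;
}

function mapFailureKind(failureKind: string, message: string): UserFacingFailure {
  const layer: UserFacingFailure["layer"] =
    failureKind === "thread-resume-failed"
      ? "agentrun-codex-thread-resume"
      : failureKind === "provider-invalid-tool-call"
        ? "provider-tool-call"
        : "agentrun-other";
  // The original failure kind is preserved as-is; it is never rewritten into a
  // device-pod, gateway, or Cloud API endpoint category.
  return { layer, providerTrace: { failureKind }, message };
}
```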

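A diagnostic entry point may only add visibility along the same path; it must not form a second execution path. Self tests, fake app-server modes, or debug commands used to reproduce a provider failure must call the real backend adapter classification logic and, once the fix is complete, be retained as self tests or SPEC contracts; do not keep parallel diagnostic images, separate execution images, or an alternative runtime that serves only one issue.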

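finalResponse in the AgentRun command-result / result API must come from the latest terminal assistant output of the current command; it must not fall back to a stale response after a long trace, a steer, or a multi-command query. When the result API is found to disagree with raw events, trace rows, or the terminal command sequence, closing an HWLAB/CaseRun issue must not cite command-result.finalResponse alone; judge jointly from the AgentRun terminal status, the current command id, the last assistant output in the raw event/trace, and hardware evidence, and track the stale result as an AgentRun visibility/result contract issue.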
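
A stale finalResponse can be detected by recomputing the expected value from raw events. The event shape below (commandId, seq, role, terminal, text) is hypothetical and much simpler than a real trace; it only illustrates the rule that the answer must be the latest terminal assistant output of the current command.

```typescript
// Hypothetical, simplified trace event.
interface TraceEvent {
  commandId: string;
  seq: number;
  role: "assistant" | "tool" | "user";
  terminal: boolean;
  text: string;
}

// Latest terminal assistant output for one command, or null if there is none yet.
function latestFinalResponse(events: TraceEvent[], commandId: string): string | null {
  let best: TraceEvent | null = null;
  for (const e of events) {
    if (e.commandId !== commandId || e.role !== "assistant" || !e.terminal) continue;
    if (best === null || e.seq > best.seq) best = e;
  }
  return best === null ? null : best.text;
}

// True when the result API disagrees with what the raw events say.
function isStaleResult(apiFinalResponse: string | null, events: TraceEvent[], commandId: string): boolean {
  return apiFinalResponse !== latestFinalResponse(events, commandId);
}
```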

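AgentRun result/session visibility must judge the running target command separately from subsequent steer commands. When investigating a stalled active turn, a recovery, or a closeout, read liveness from the target command's result/session status first, and use liveness.phase to distinguish waiting-runner, waiting-model, waiting-tool, idle-after-tool, transport-disconnected, runner-heartbeat-stale, and terminal; do not infer that a turn has recovered or failed merely from a long gap without new events, an outer timeout, or the runner having reconnected. steerDelivery describes only the ack, forward, and backend accept state of the steer RPC along the runner/app-server chain; a completed steer does not substitute for the target command's terminal state, nor is it evidence that the target turn has resumed output. When closing an HWLAB/CaseRun issue, cite the target command id, the liveness of the target result/session, the raw trace/terminal command sequence, and evidence from the original entry point together; the field contract is defined by the AgentRun repository's v0.1 spec, and UniDesk records only cross-repository attribution and acceptance criteria.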
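
The separation between target liveness and steer delivery can be sketched as a verdict function that reads only liveness.phase. The phase names come from the paragraph above; the `TargetStatus` shape, the `steerDelivery.completed` field, the verdict labels, and the grouping of phases into "active" and "degraded" are assumptions made for illustration.

```typescript
type LivenessPhase =
  | "waiting-runner"
  | "waiting-model"
  | "waiting-tool"
  | "idle-after-tool"
  | "transport-disconnected"
  | "runner-heartbeat-stale"
  | "terminal";

// Hypothetical status shape for the target command.
interface TargetStatus {
  commandId: string;
  liveness: { phase: LivenessPhase };
  steerDelivery?: { completed: boolean };
}

type Verdict = "terminal" | "active" | "degraded";

// steerDelivery is deliberately not consulted: a completed steer is not a terminal target.
function judgeTarget(status: TargetStatus): Verdict {
  switch (status.liveness.phase) {
    case "terminal":
      return "terminal";
    case "transport-disconnected":
    case "runner-heartbeat-stale":
      return "degraded";
    case "waiting-runner":
    case "waiting-model":
    case "waiting-tool":
    case "idle-after-tool":
      return "active";
  }
}
```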

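Chinese-Language Rule

AgentRun repository content is in Chinese by default. AgentRun long-term documents, process documents, issue titles and bodies, PR titles and bodies, PR comments, review notes, and delivery summaries must all be written in Chinese. Code identifiers, API paths, command names, configuration keys, log fields, protocol fields, and unavoidable external proper nouns may remain in English, but explanatory text must be in Chinese.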
