fix: prepare writable codex home for runner jobs

Codex
2026-05-29 13:27:11 +08:00
parent 25e017ba67
commit da50a34eef
9 changed files with 96 additions and 16 deletions
+3 -1
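@@ -27,6 +27,8 @@ Runner startup arguments must explicitly include:
Runner Secrets may only be obtained through Kubernetes Secret projection, ServiceAccount/RBAC, or reads through a controlled Secret API. For the Codex test credential projection rules, see [spec-v01-secret-distribution.md](spec-v01-secret-distribution.md) and [spec-v01-backend-codex.md](spec-v01-backend-codex.md).
Kubernetes Job runners must keep the credential source separate from the runtime home: the Secret volume is mounted read-only at `/var/run/agentrun/secrets/...`, `/home/agentrun` is an `emptyDir` that provides writable space, `CODEX_HOME` points to `/home/agentrun/.codex`, and `AGENTRUN_CODEX_SECRET_HOME` points to the read-only projection. Before starting the provider, the runner/backend copies only the authorized files and does not print their contents.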
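The separation described above can be sketched as a small TypeScript helper. This is only an illustration: the `VolumeMount` and `EnvVar` shapes and the `runnerHomeLayout` name are assumptions made for this example, not the project's actual types, and the Secret volume source is left out.

```typescript
// Illustrative shapes only; these are not the project's real types.
interface VolumeMount { name: string; mountPath: string; readOnly?: boolean }
interface EnvVar { name: string; value: string }

// Builds the mounts and env for a runner container so that the Secret
// projection stays read-only and CODEX_HOME lives on writable storage.
function runnerHomeLayout(
  secretVolumeName: string,
  projectionPath: string,
): { volumeMounts: VolumeMount[]; env: EnvVar[] } {
  const home = "/home/agentrun";
  return {
    volumeMounts: [
      // Writable runtime home (backed by an emptyDir volume in the Pod spec).
      { name: "runner-home", mountPath: home },
      // Credential source: read-only, mounted outside CODEX_HOME.
      { name: secretVolumeName, mountPath: projectionPath, readOnly: true },
    ],
    env: [
      { name: "HOME", value: home },
      { name: "CODEX_HOME", value: `${home}/.codex` },
      { name: "AGENTRUN_CODEX_SECRET_HOME", value: projectionPath },
    ],
  };
}

const layout = runnerHomeLayout("codex-0", "/var/run/agentrun/secrets/codex-0");
console.log(JSON.stringify(layout, null, 2));
```

The point of the sketch is that `CODEX_HOME` and `AGENTRUN_CODEX_SECRET_HOME` never resolve to the same directory, so a write into the runtime home cannot hit the read-only Secret volume.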
## Runner lifecycle
Standard state flow:
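@@ -99,7 +101,7 @@ Runner logs must be flushed in real time to a file or the pod log; when starting a runner, the CLI must
| Spec item | Status | Notes |
| --- | --- | --- |
| `agentrun-runner` service spec | Defined | This document is the authoritative v0.1 runner spec. |
| Kubernetes Job runner | Partially implemented | A `runner job --dry-run` Job manifest rendering skeleton is provided; the real `runner job` creates the Kubernetes Job through the manager REST API, always using the `agentrun-v01-runner` ServiceAccount, manager URL, runId/commandId/attemptId, and executionPolicySecretRef file projection; end-to-end integration on a real cluster still needs acceptance. |
| Kubernetes Job runner | Partially implemented | A `runner job --dry-run` Job manifest rendering skeleton is provided; the real `runner job` creates the Kubernetes Job through the manager REST API, always using the `agentrun-v01-runner` ServiceAccount, manager URL, runId/commandId/attemptId, executionPolicySecretRef file projection, and a writable Codex runtime home; end-to-end integration on a real cluster still needs acceptance. |
| host process runner | Partially implemented | `runner start` and `src/runner/main.ts` enter the same `runOnce` path and can run self-tests through manager register/claim/poll/report. |
| claim/lease/report client | Partially implemented | The runner manager API client has been split out, covering register, claim, lease heartbeat, poll command, ack, append event, and terminal status; the durable store still awaits the Postgres adapter. |
| runner redaction | Defined / not implemented | Must be implemented together with the backend adapter. |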
+2
@@ -131,6 +131,8 @@ Tekton promotion may read `deploy/deploy.json` to render runtime desired st
- The `argocd/agentrun-g14-v01` source must point to `v0.1-gitops:deploy/gitops/g14/runtime-v01`, and the destination must be `agentrun-v01`.
- `v0.1` Secrets, ServiceAccounts, RBAC, PVCs, ConfigMaps, and runtime config must be independently named or namespace-scoped; documents, issues, traces, and reports record only SecretRef names and keys, never values.
- `agentrun-mgr` and runner Jobs may receive the Postgres DSN and Codex auth/config files only through the SecretRefs defined in `spec-v01-secret-distribution.md`; test Codex credentials come from a Kubernetes Secret projection of `~/.codex/auth.json` and `~/.codex/config.toml`, and plaintext must not be read from `deploy/deploy.json`, the artifact catalog, or generated manifests.
- The Postgres `DATABASE_URL` Secret must use the name of the database that was actually created, which defaults to `agentrun_v01` for v0.1; passwords and other URL credentials must be URL-encoded before being written into the DSN. Secret values never enter source or GitOps; the runtime bootstrap or secret-management flow is responsible for creating and rotating them.
- In GitOps manifests, the Codex provider Secret may appear only as a SecretRef and a read-only volume projection; the runner Job manifest must also include a writable runtime home so that the Secret projection can be copied into `CODEX_HOME` before Codex runs.
- `agentrun_dev` and `agentrun_prod` must not be used as a `v0.1` namespace, Argo destination, Pipeline target, or acceptance target.
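The URL-encoding requirement for the `DATABASE_URL` DSN in the list above can be illustrated with a minimal sketch. The `PostgresDsnParts` shape, the `buildDatabaseUrl` name, the host name, and the sample password are all assumptions made for this example; the real value is created by the bootstrap or secret-management flow and never committed.

```typescript
// Hypothetical input shape for this example only.
interface PostgresDsnParts {
  user: string;
  password: string;
  host: string;
  port: number;
  database: string;
}

// Percent-encodes the credential and database segments so that characters
// such as "@", ":", "/" and "#" cannot be misread as URL delimiters.
function buildDatabaseUrl(parts: PostgresDsnParts): string {
  const user = encodeURIComponent(parts.user);
  const password = encodeURIComponent(parts.password);
  const database = encodeURIComponent(parts.database);
  return `postgres://${user}:${password}@${parts.host}:${parts.port}/${database}`;
}

// Placeholder values; a real password must never appear in source or logs.
const exampleDsn = buildDatabaseUrl({
  user: "agentrun",
  password: "p@ss:w/rd#1",
  host: "db.example.internal",
  port: 5432,
  database: "agentrun_v01",
});
console.log(exampleDsn.replace(/:[^:@/]+@/, ":<redacted>@"));
```

Without the encoding step, a password containing `@` or `/` would shift the host or path boundary and the client would connect to the wrong target or fail to parse the DSN.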
## Manual and hotfix boundaries
@@ -52,9 +52,10 @@
| Kubernetes Secret | `agentrun-v01/agentrun-v01-provider-codex` |
| Secret key | `auth.json`, from `~/.codex/auth.json` |
| Secret key | `config.toml`, from `~/.codex/config.toml` |
| Projection path | `~/.codex/auth.json` and `~/.codex/config.toml` of the runner/backend container user |
| Projection path | The read-only Secret projection is mounted at `/var/run/agentrun/secrets/<profile>-<index>/auth.json` and `config.toml`; this path serves only as the credential source. |
| Runtime config path | At startup, the runner copies the authorized Secret projection into the writable `CODEX_HOME`, by default `/home/agentrun/.codex/auth.json` and `config.toml`. |
| Projection mode | Read-only; `0400` or equivalent least privilege is recommended |
| Runtime env | If the backend requires it, `HOME` or an equivalent Codex config root may be set to point to the projected home; falling back to the node host's home is not allowed. |
| Runtime env | `HOME=/home/agentrun`, `CODEX_HOME=/home/agentrun/.codex`, `AGENTRUN_CODEX_SECRET_HOME=<projection path>`; falling back to the node host's home is not allowed. |
Secret creation and rotation must go through Kubernetes secret management. `deploy/deploy.json` records only the SecretRef name, key, and mount intent; the `v0.1-gitops` rendered manifests only reference the Secret and contain no Secret data.
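@@ -84,7 +85,8 @@ A Run's `executionPolicy.secretScope` may contain only references, not values. Example
- `allowCredentialEcho` must be fixed to `false`.
- `secretRef.namespace` may by default only be the lane namespace the run belongs to or an explicitly approved platform namespace.
- The manager may store the `secretRef`, but must not read the Secret value and persist it.
- The runner/backend adapter must obtain Secrets through Kubernetes env/file projection or a restricted Secret API read; Codex uses the file projection `~/.codex/auth.json` and `~/.codex/config.toml` by default, and Secrets must not be passed through the run payload, events, CLI arguments, or logs.
- The runner/backend adapter must obtain Secrets through Kubernetes env/file projection or a restricted Secret API read; by default Codex copies `auth.json` and `config.toml` from the read-only Secret projection into the writable `CODEX_HOME` and then starts the app-server, and Secrets must not be passed through the run payload, events, CLI arguments, or logs.
- The Secret projection must not be used directly as `CODEX_HOME`. The Codex app-server reads and may maintain default configuration, PATH, or runtime state files; mounting a read-only Secret volume directly at `CODEX_HOME` causes write failures during startup. The fixed boundary for v0.1 is: the Secret volume is read-only, `/home/agentrun` is an `emptyDir` that provides the writable runtime home, and the copy happens only inside the runner/backend container without printing file contents.
- When the SecretRef does not exist or RBAC denies access, the run must fail with a structured `failureKind=secret-unavailable` or an equivalent error, and must not degrade into a storm of credential-less retries.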
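The boundary in the list above (read-only Secret volume, writable runtime home, and `CODEX_HOME` never pointing at the projection) could be checked with a small validator along these lines. This is a sketch under stated assumptions: the `Mount` shape and the `checkCodexHomeLayout` and `isWithin` helpers are hypothetical and operate on plain POSIX path strings, not on a live Pod.

```typescript
// Hypothetical mount shape for this example only.
interface Mount { mountPath: string; readOnly?: boolean }

// Strips trailing slashes so "/a/b/" and "/a/b" compare equal.
const norm = (p: string): string => p.replace(/\/+$/, "");

// True when `child` equals `parent` or lies beneath it.
function isWithin(parent: string, child: string): boolean {
  const p = norm(parent);
  const c = norm(child);
  return c === p || c.startsWith(`${p}/`);
}

// Returns a list of human-readable problems; an empty list means the layout is acceptable.
function checkCodexHomeLayout(codexHome: string, secretHome: string, mounts: Mount[]): string[] {
  const problems: string[] = [];
  if (isWithin(secretHome, codexHome)) {
    problems.push("CODEX_HOME must not be inside the Secret projection");
  }
  // The deepest mount covering CODEX_HOME decides whether it is writable.
  const covering = mounts
    .filter((m) => isWithin(m.mountPath, codexHome))
    .sort((a, b) => norm(b.mountPath).length - norm(a.mountPath).length)[0];
  if (!covering) problems.push("CODEX_HOME is not covered by any declared mount");
  else if (covering.readOnly) problems.push("CODEX_HOME resolves to a read-only mount");
  const secretMount = mounts.find((m) => isWithin(m.mountPath, secretHome));
  if (!secretMount || secretMount.readOnly !== true) {
    problems.push("Secret projection must be mounted read-only");
  }
  return problems;
}

console.log(checkCodexHomeLayout("/home/agentrun/.codex", "/var/run/agentrun/secrets/codex-0", [
  { mountPath: "/home/agentrun" },
  { mountPath: "/var/run/agentrun/secrets/codex-0", readOnly: true },
]));
```

A check like this would catch the misconfiguration before the app-server starts, rather than surfacing it as a write failure at startup.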
## Distribution path
@@ -102,6 +104,7 @@ Kubernetes Secret
-> created from ~/.codex/auth.json and ~/.codex/config.toml by operator or approved secret-management flow
runner/backend Pod
-> receives Codex auth/config via read-only file projection
-> copies authorized files into writable CODEX_HOME before starting Codex app-server
```
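Secret creation and rotation are not generated automatically from the source branch; the source branch only declares which SecretRef it needs. If External Secrets, Vault, SealedSecrets, or SOPS is adopted later, this spec must be extended or updated to specify the controller, source of truth, rotation, and redaction rules.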
@@ -140,7 +143,7 @@ bun scripts/agentrun-cli.ts secrets codex render --dry-run
### T2 Runner credential projection
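Read this document, then start a minimal backend runner dry-run and confirm that the Pod file projection can see `~/.codex/auth.json` and `~/.codex/config.toml`, while events, logs, and CLI output show only the redacted credential source and never the file contents.
Read this document, then start a minimal backend runner dry-run and confirm that the Pod file projection is mounted read-only at `/var/run/agentrun/secrets/...`, that `/home/agentrun` is the writable runtime home, that `CODEX_HOME=/home/agentrun/.codex`, and that the runner/backend copies the authorized files into `CODEX_HOME` before starting Codex; events, logs, and CLI output show only the redacted credential source and never the file contents.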
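The "never the file contents" part of this check could be automated with a scan along the following lines. This is a minimal sketch: `findSecretLeaks` is a hypothetical helper written for this example (the repository's own `assertNoSecretLeak` is not shown here and may work differently), and the token value is a made-up placeholder.

```typescript
// Hypothetical helper: returns the names of secrets whose value appears
// anywhere in the output, either raw or in its JSON-escaped form.
function findSecretLeaks(output: unknown, secrets: Record<string, string>): string[] {
  const haystack = typeof output === "string" ? output : JSON.stringify(output) ?? "";
  return Object.entries(secrets)
    .filter(([, value]) => {
      if (value.length === 0) return false;
      // JSON.stringify escapes quotes and backslashes, so check both spellings.
      const escaped = JSON.stringify(value).slice(1, -1);
      return haystack.includes(value) || haystack.includes(escaped);
    })
    .map(([name]) => name);
}

// A redacted summary that only carries references, not values.
const summary = {
  secretRefs: [{ name: "agentrun-v01-provider-codex", keys: ["auth.json", "config.toml"], valuesPrinted: false }],
};
console.log(findSecretLeaks(summary, { testToken: "placeholder-token-123" }));
```

A substring scan like this only catches verbatim leaks; encoded or truncated copies of a credential would need additional rules.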
### T3 Missing secret failure
@@ -151,7 +154,7 @@ bun scripts/agentrun-cli.ts secrets codex render --dry-run
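| Spec item | Status | Notes |
| --- | --- | --- |
| Secret distribution spec | Defined | This document is the authoritative v0.1 spec for provider credential distribution. |
| Kubernetes SecretRef injection | Partially implemented | The runner Job dry-run and real Job creation paths generate the Secret volume projection and `CODEX_HOME` from the run's `executionPolicy.secretScope.providerCredentials`; real Secrets and Codex turns still need end-to-end integration acceptance. |
| Kubernetes SecretRef injection | Partially implemented | The runner Job dry-run and real Job creation paths generate the Secret volume projection, the writable runtime home, and `AGENTRUN_CODEX_SECRET_HOME` from the run's `executionPolicy.secretScope.providerCredentials`; real Secrets and Codex turns still need end-to-end integration acceptance. |
| Codex Secret dry-run tool | Implemented | `bun scripts/agentrun-cli.ts secrets codex render --dry-run` outputs only the Secret creation plan, hashes, and a redacted manifest summary; it does not run apply. |
| Codex auth/config file projection | Partially implemented | Backend readiness checks that `auth.json`/`config.toml` are readable and returns `secret-unavailable` when they are missing; the self-test simulates the projection with temporary files. |
| Minimal redaction rules | Partially implemented | The Secret dry-run tool, events, Job dry-run output, and self-tests have been verified not to print the test token; more complex auditing is still pending. |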
+1
@@ -42,6 +42,7 @@
- A real Argo CD Application `agentrun-g14-v01` synced to `agentrun-v01`.
- A real `agentrun-v01-postgres` StatefulSet, PVC, Service, and migration ledger.
- Real Kubernetes SecretRef injection of the Postgres DSN and Code Agent provider credential; Codex test credentials must come from a Kubernetes Secret projection of `~/.codex/auth.json` and `~/.codex/config.toml`.
- The Codex Secret projection must first remain read-only and then be copied into the writable `CODEX_HOME` before the app-server starts; end-to-end integration must not accept a read-only Secret volume used directly as `CODEX_HOME` as passing evidence.
- A real `agentrun-mgr`, a runner Job or controlled runner process, and a real backend adapter.
- At least one real Code Agent provider turn; the Codex backend must complete a JSON-RPC stdio turn through `codex app-server --listen stdio://`, and mock, fixture, source-only, dry-run, fake provider, direct Responses HTTP, or one-shot `codex exec` output cannot count as passing evidence. If the provider credential SecretRef is missing, end-to-end integration must be marked blocked and must not be downgraded to a mock pass.
+35
@@ -1,6 +1,7 @@
import { spawn, type ChildProcessWithoutNullStreams } from "node:child_process";
import { createHash } from "node:crypto";
import { accessSync, constants as fsConstants } from "node:fs";
import { chmod, copyFile, mkdir } from "node:fs/promises";
import path from "node:path";
import * as readline from "node:readline";
import type { BackendEvent, BackendTurnResult, FailureKind, JsonRecord, JsonValue, TerminalStatus } from "../common/types.js";
@@ -256,6 +257,8 @@ export class CodexStdioClient {
export async function runCodexStdioTurn(options: CodexStdioTurnOptions): Promise<BackendTurnResult> {
const codexHome = resolveCodexHome(options);
const projectionFailure = await prepareProjectedCodexHome(codexHome, options.env?.AGENTRUN_CODEX_SECRET_HOME ?? process.env.AGENTRUN_CODEX_SECRET_HOME);
if (projectionFailure) return projectionFailure;
const secretFailure = codexHomeReadiness(codexHome);
if (secretFailure) return secretFailure;
const env = childEnv(options, codexHome);
@@ -347,6 +350,38 @@ export async function runCodexStdioTurn(options: CodexStdioTurnOptions): Promise
return { terminalStatus: terminal.status, failureKind: terminal.failureKind, failureMessage: terminal.message, events: events.map((event) => ({ ...event, payload: redactJson(event.payload) })), ...(threadId ? { threadId } : {}), ...(turnId ? { turnId } : {}) };
}
async function prepareProjectedCodexHome(codexHome: string, projectedHome: string | undefined): Promise<BackendTurnResult | null> {
  // No projection configured: nothing to copy, use CODEX_HOME as-is.
  if (!projectedHome || projectedHome.trim().length === 0) return null;
  // Projection and runtime home are the same directory: there is nothing to copy.
  if (path.resolve(projectedHome) === path.resolve(codexHome)) return null;
  try {
    await mkdir(codexHome, { recursive: true, mode: 0o700 });
    // Copy only the two authorized files, then restrict them to the owner.
    for (const fileName of ["auth.json", "config.toml"]) {
      await copyFile(path.join(projectedHome, fileName), path.join(codexHome, fileName));
      await chmod(path.join(codexHome, fileName), 0o600);
    }
    return null;
  } catch (error) {
    // Report paths and a redacted message only; never file contents.
    const payload = {
      failureKind: "secret-unavailable",
      projection: {
        source: pathSummary(projectedHome),
        destination: pathSummary(codexHome),
        valuesPrinted: false,
      },
      message: error instanceof Error ? redactText(error.message) : "failed to prepare writable Codex home",
    } satisfies JsonRecord;
    return {
      terminalStatus: "blocked",
      failureKind: "secret-unavailable",
      failureMessage: "Codex Secret projection could not be copied to writable CODEX_HOME",
      events: [
        { type: "error", payload },
        { type: "terminal_status", payload: { terminalStatus: "blocked", failureKind: "secret-unavailable" } },
      ],
    };
  }
}
function codexHomeReadiness(codexHome: string): BackendTurnResult | null {
const auth = fileReadable(`${codexHome}/auth.json`);
const config = fileReadable(`${codexHome}/config.toml`);
+1 -1
@@ -73,7 +73,7 @@ export async function createKubernetesRunnerJob(options: { store: AgentRunStore;
placement: "kubernetes-job",
logPath: `kubectl -n ${render.namespace} logs job/${render.jobName}`,
},
secretRefs: render.secretRefs.map((item) => ({ profile: item.profile, name: item.secretRef.name, namespace: item.secretRef.namespace ?? render.namespace, keys: item.secretRef.keys ?? [], mountPath: item.mountPath, valuesPrinted: false })),
secretRefs: render.secretRefs.map((item) => ({ profile: item.profile, name: item.secretRef.name, namespace: item.secretRef.namespace ?? render.namespace, keys: item.secretRef.keys ?? [], mountPath: item.runtimeMountPath, projectionPath: item.projectionMountPath, writableCopy: true, valuesPrinted: false })),
pollCommands: {
run: `bun scripts/agentrun-cli.ts runs show ${run.id} --manager-url ${managerUrl}`,
command: `bun scripts/agentrun-cli.ts commands show ${commandId} --run-id ${run.id} --manager-url ${managerUrl}`,
+14 -7
@@ -21,7 +21,8 @@ interface CredentialProjection {
profile: BackendProfile | string;
secretRef: SecretRef;
volumeName: string;
mountPath: string;
runtimeMountPath: string;
projectionMountPath: string;
}
export function renderRunnerJobDryRun(options: RunnerJobRenderOptions): JsonRecord {
@@ -45,7 +46,7 @@ export function renderRunnerJobDryRun(options: RunnerJobRenderOptions): JsonReco
managerUrl: options.managerUrl,
sourceCommit: render.sourceCommit,
},
secretRefs: render.secretRefs.map((item) => ({ profile: item.profile, name: item.secretRef.name, namespace: item.secretRef.namespace ?? render.namespace, keys: item.secretRef.keys ?? [], mountPath: item.mountPath, valuesPrinted: false })),
secretRefs: render.secretRefs.map((item) => ({ profile: item.profile, name: item.secretRef.name, namespace: item.secretRef.namespace ?? render.namespace, keys: item.secretRef.keys ?? [], mountPath: item.runtimeMountPath, projectionPath: item.projectionMountPath, writableCopy: true, valuesPrinted: false })),
pollCommands: {
run: `bun scripts/agentrun-cli.ts runs show ${options.run.id} --manager-url ${options.managerUrl}`,
events: `bun scripts/agentrun-cli.ts runs events ${options.run.id} --manager-url ${options.managerUrl} --after-seq 0 --limit 100`,
@@ -101,7 +102,10 @@ export function renderRunnerJobManifest(options: RunnerJobRenderOptions): { mani
imagePullPolicy: options.imagePullPolicy ?? "IfNotPresent",
command: ["bun", "src/runner/main.ts"],
env,
volumeMounts: secretRefs.map((item) => ({ name: item.volumeName, mountPath: item.mountPath, readOnly: true })),
volumeMounts: [
{ name: "runner-home", mountPath: "/home/agentrun" },
...secretRefs.map((item) => ({ name: item.volumeName, mountPath: item.projectionMountPath, readOnly: true })),
],
resources: {
requests: { cpu: "250m", memory: "512Mi" },
limits: { cpu: "2", memory: "4Gi" },
@@ -113,7 +117,7 @@ export function renderRunnerJobManifest(options: RunnerJobRenderOptions): { mani
},
},
],
volumes: secretRefs.map(secretVolume),
volumes: [{ name: "runner-home", emptyDir: {} }, ...secretRefs.map(secretVolume)],
},
},
},
@@ -122,7 +126,8 @@ export function renderRunnerJobManifest(options: RunnerJobRenderOptions): { mani
}
function runnerEnv(options: RunnerJobRenderOptions, context: { namespace: string; jobName: string; runnerId: string; attemptId: string; sourceCommit: string; secretRefs: CredentialProjection[] }): JsonRecord[] {
const codexMount = context.secretRefs.find((item) => item.profile === "codex")?.mountPath ?? "/home/agentrun/.codex";
const codexSecret = context.secretRefs.find((item) => item.profile === "codex");
const codexHome = codexSecret?.runtimeMountPath ?? "/home/agentrun/.codex";
return [
{ name: "AGENTRUN_MGR_URL", value: options.managerUrl },
{ name: "AGENTRUN_RUN_ID", value: options.run.id },
@@ -136,7 +141,8 @@ function runnerEnv(options: RunnerJobRenderOptions, context: { namespace: string
{ name: "AGENTRUN_K8S_JOB_NAME", value: context.jobName },
{ name: "AGENTRUN_LOG_PATH", value: "/tmp/agentrun-runner.jsonl" },
{ name: "HOME", value: "/home/agentrun" },
{ name: "CODEX_HOME", value: codexMount },
{ name: "CODEX_HOME", value: codexHome },
...(codexSecret ? [{ name: "AGENTRUN_CODEX_SECRET_HOME", value: codexSecret.projectionMountPath }] : []),
];
}
@@ -147,7 +153,8 @@ function credentialProjections(run: RunRecord, namespace: string): CredentialPro
profile: item.profile,
secretRef: item.secretRef.namespace ? item.secretRef : { ...item.secretRef, namespace },
volumeName: sanitizeVolumeName(`${String(item.profile)}-${index}`),
mountPath: normalizeMountPath(item.secretRef.mountPath),
runtimeMountPath: normalizeMountPath(item.secretRef.mountPath),
projectionMountPath: `/var/run/agentrun/secrets/${sanitizeVolumeName(`${String(item.profile)}-${index}`)}`,
}));
}
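The projection path derivation in the hunk above can be sketched in isolation. The body of `sanitizeVolumeName` is not shown in this diff, so the `sanitizeName` function below is a hypothetical stand-in (lowercase, non-alphanumerics collapsed to `-`, capped at 63 characters) and may differ from the real helper.

```typescript
// Hypothetical stand-in for the project's sanitizeVolumeName, which is not shown in this diff.
function sanitizeName(raw: string): string {
  return raw
    .toLowerCase()
    .replace(/[^a-z0-9-]+/g, "-") // collapse anything outside [a-z0-9-]
    .replace(/^-+|-+$/g, "")      // trim leading/trailing dashes
    .slice(0, 63);                // stay within the 63-character label limit
}

// Derives the read-only projection mount path from the profile and its index.
function projectionPathFor(profile: string, index: number): string {
  return `/var/run/agentrun/secrets/${sanitizeName(`${profile}-${index}`)}`;
}

console.log(projectionPathFor("codex", 0));
```

Deriving the path from the profile and index, rather than from the user-supplied `secretRef.mountPath`, keeps every Secret projection under one fixed read-only prefix.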
+23 -1
@@ -5,7 +5,7 @@ import { startManagerServer } from "../../mgr/server.js";
import { MemoryAgentRunStore } from "../../mgr/store.js";
import { ManagerClient } from "../../mgr/client.js";
import { renderRunnerJobDryRun } from "../../runner/k8s-job.js";
import type { RunRecord } from "../../common/types.js";
import type { JsonRecord, RunRecord } from "../../common/types.js";
import { assertNoSecretLeak, createRunWithCommand, type SelfTestCase } from "../harness.js";
const selfTest: SelfTestCase = async (context) => {
@@ -24,6 +24,7 @@ const selfTest: SelfTestCase = async (context) => {
assert.equal(rendered.dryRun, true);
assert.equal(rendered.mutation, false);
assert.equal((rendered.jobIdentity as { serviceAccountName?: string }).serviceAccountName, "agentrun-v01-runner");
assertRunnerJobUsesWritableCodexHome(rendered.manifest as JsonRecord, context.codexHome);
assertNoSecretLeak(rendered);
const fakeKubectl = path.join(context.tmp, "fake-kubectl.js");
@@ -66,3 +67,24 @@ console.log(JSON.stringify({ apiVersion: manifest.apiVersion, kind: manifest.kin
};
export default selfTest;
function assertRunnerJobUsesWritableCodexHome(manifest: JsonRecord, expectedCodexHome: string): void {
  const spec = manifest.spec as JsonRecord;
  const template = spec.template as JsonRecord;
  const podSpec = template.spec as JsonRecord;
  const volumes = podSpec.volumes as JsonRecord[];
  assert.ok(volumes.some((volume) => volume.name === "runner-home" && typeof volume.emptyDir === "object"), "runner home must be writable emptyDir");
  const containers = podSpec.containers as JsonRecord[];
  const runner = containers[0] as JsonRecord;
  const mounts = runner.volumeMounts as JsonRecord[];
  assert.ok(mounts.some((mount) => mount.name === "runner-home" && mount.mountPath === "/home/agentrun"), "runner-home must mount at /home/agentrun");
  assert.ok(mounts.some((mount) => mount.name === "codex-0" && mount.mountPath === "/var/run/agentrun/secrets/codex-0" && mount.readOnly === true), "Codex Secret must mount read-only outside CODEX_HOME");
  const env = runner.env as JsonRecord[];
  const value = (name: string): unknown => env.find((item) => item.name === name)?.value;
  assert.equal(value("HOME"), "/home/agentrun");
  assert.equal(value("CODEX_HOME"), expectedCodexHome);
  assert.equal(value("AGENTRUN_CODEX_SECRET_HOME"), "/var/run/agentrun/secrets/codex-0");
  assert.notEqual(value("CODEX_HOME"), value("AGENTRUN_CODEX_SECRET_HOME"));
}
+9 -1
@@ -1,4 +1,5 @@
import assert from "node:assert/strict";
import { access } from "node:fs/promises";
import path from "node:path";
import os from "node:os";
import { startManagerServer } from "../../mgr/server.js";
@@ -25,12 +26,19 @@ const selfTest: SelfTestCase = async (context) => {
const finalCommand = await client.get(`/api/v1/runs/${happy.runId}/commands/${happy.commandId}`) as { state?: string };
assert.equal(finalCommand.state, "completed");
const projectedHome = path.join(context.tmp, "runtime-codex-home");
const projected = await createRunWithCommand(client, { workspace: context.workspace, codexHome: projectedHome }, "hello projected", "selftest-projected-codex-home", 15_000);
const projectedResult = await runOnce({ managerUrl: server.baseUrl, runId: projected.runId, codexCommand: context.fakeCodexCommand, codexArgs: context.fakeCodexArgs, codexHome: projectedHome, env: { CODEX_HOME: projectedHome, AGENTRUN_CODEX_SECRET_HOME: context.codexHome } });
assert.equal(projectedResult.terminalStatus, "completed");
await access(path.join(projectedHome, "auth.json"));
await access(path.join(projectedHome, "config.toml"));
await runFailureCase({ client, managerUrl: server.baseUrl, context, mode: "missing-turn-result", expectedStatus: "failed", expectedFailureKind: "backend-response-invalid" });
await runFailureCase({ client, managerUrl: server.baseUrl, context, mode: "invalid-json", expectedStatus: "failed", expectedFailureKind: "backend-json-parse-error" });
await runFailureCase({ client, managerUrl: server.baseUrl, context, mode: "missing-terminal", expectedStatus: "failed", expectedFailureKind: "backend-timeout", timeoutMs: 500 });
await runSpawnFailureCase({ client, managerUrl: server.baseUrl, context });
return { name: "codex-stdio", tests: ["runner-lease-heartbeat", "codex-stdio-fake-turn", "codex-stdio-missing-turn-result", "codex-stdio-invalid-json", "codex-stdio-timeout", "codex-stdio-spawn-failure"] };
return { name: "codex-stdio", tests: ["runner-lease-heartbeat", "codex-stdio-fake-turn", "codex-stdio-projected-writable-home", "codex-stdio-missing-turn-result", "codex-stdio-invalid-json", "codex-stdio-timeout", "codex-stdio-spawn-failure"] };
} finally {
await new Promise<void>((resolve) => server.server.close(() => resolve()));
}