Commit Graph

349 Commits

Author SHA1 Message Date
Lyon a3422b4c76 Merge pull request #87 from pikasTech/fix-v01-runner-claim-recovery
Fix replacement runner to wait for the old lease before taking over
2026-06-04 00:07:23 +08:00
Codex 0d040a33c2 fix: wait for stale runner lease before replacement claim 2026-06-03 23:49:40 +08:00
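Codex 0dfe709fd4 feat(v0.1): CLI runner job --dry-run also looks up the session and adds sessionPvc
Previously, CLI runner job --dry-run called renderRunnerJobDryRun directly without going through mgr,
so the sessionPvc lookup logic in kubernetes-runner-job.ts was bypassed and
the dry-run manifest did not include the agentrun-sessions volume.

Fix: the dry-run path first calls GET /api/v1/sessions to look up storageKind=pvc + storagePvcName,
builds sessionPvc itself and passes it to renderRunnerJobDryRun, so the dry-run output
matches the manifest mgr uses when it actually creates the runner Job.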
2026-06-03 21:16:33 +08:00
Codex 78513aa4c7 feat(v0.1): CLI sessions create / storage / storage-delete + session-turn auto-ensure PVC
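PR C wrap-up, with accompanying CLI capabilities:
- Add sessions create [sessionId], which calls POST /api/v1/sessions to create the session + PVC
- Add sessions storage <sessionId>, which calls GET /api/v1/sessions/:id/storage
- Add sessions storage <sessionId> --delete, which calls DELETE
- sessions turn <sessionId> now also probes storage with GET first and, if it does not exist, calls POST /api/v1/sessions to create it
  (previously sessions turn only created a session record implicitly in the store, with storageKind=none;
  the explicit session create entry point now guarantees storageKind=pvc is set up in advance)
- ManagerClient gains a delete() method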
2026-06-03 20:56:27 +08:00
Codex 7ccea67391 feat(v0.1): codex-stdio emit codex-rollout-storage-mounted + session-store-evicted upgrade
PR C wrap-up: codex-stdio.ts adds observability + an upgrade path to the new failureKind

- On startup, read env (not process.env) and emit the codex-rollout-storage-mounted event:
  pvcName / pvcNamespace / mountPath / codexRolloutSubdir / valuesPrinted=false
- thread/resume failure + 'no rollout found for thread id' message + AGENTRUN_SESSION_PVC_NAME
  set → upgraded to session-store-evicted, as distinct from thread-resume-failed
- isNoRolloutFoundMessage helper isolates the matching logic
- 4 new selftest cases:
  codex-stdio-session-storage-mounted (event present + fields aligned)
  codex-stdio-session-storage-evicted (failureKind upgraded)
  codex-stdio-session-storage-subdir (AGENTRUN_CODEX_ROLLOUT_SUBDIR configuration takes effect)
  codex-stdio-session-storage-no-secret-leak (the event leaks no secrets)

PR C fully complete: runner Job mounts the PVC directly + codex-stdio observability +
session-store-evicted upgrade + 5 new selftests (1 runner + 4 codex)
2026-06-03 20:38:11 +08:00
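The failureKind upgrade described in commit 7ccea67391 can be illustrated with a minimal sketch. It is not the code in codex-stdio.ts: the function names carry a Sketch suffix to mark them as hypothetical, the case-insensitive substring match is an assumption, and only the message text, the env variable name, and the two failureKind values come from the commit message.

```typescript
// Hypothetical sketch of the upgrade rule; not the repository's implementation.
type FailureKindSketch = "thread-resume-failed" | "session-store-evicted";

// Assumption: a case-insensitive substring match is enough to detect the message.
function isNoRolloutFoundMessageSketch(message: string): boolean {
  return message.toLowerCase().includes("no rollout found for thread id");
}

// Upgrade to session-store-evicted only when a session PVC is configured
// AND the resume error says the rollout is missing.
function classifyResumeFailureSketch(
  message: string,
  env: Record<string, string | undefined>,
): FailureKindSketch {
  const pvcConfigured = Boolean(env["AGENTRUN_SESSION_PVC_NAME"]);
  return pvcConfigured && isNoRolloutFoundMessageSketch(message)
    ? "session-store-evicted"
    : "thread-resume-failed";
}
```

Passing env as a parameter, rather than reading process.env inside the function, mirrors the commit's note about reading env and keeps the rule testable.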
Codex f08a4e75cd feat(v0.1): runner Job mounts the per-session PVC directly + env passthrough
PR C kickoff: k8s-job.ts adds the sessionPvc volume + env passthrough

- src/runner/k8s-job.ts: new RunnerSessionPvcOptions interface; the manifest additionally renders
  the agentrun-sessions volume + volumeMount; env additionally passes through AGENTRUN_SESSION_PVC_NAME /
  _NAMESPACE / _MOUNT_PATH / AGENTRUN_CODEX_ROLLOUT_SUBDIR
- src/mgr/kubernetes-runner-job.ts: when a run references a session, look up the session storage;
  kind=pvc automatically builds sessionPvc and passes it to manifest rendering; kind=evicted already
  short-circuits in PR B and returns session-store-evicted
- selftest: 1 new case runner-k8s-job-session-pvc-volume-and-env verifies the PVC volume
  + the full set of env passthrough

Remaining for PR C: src/backend/codex-stdio.ts emits the codex-rollout-storage-mounted
event + session-store-evicted upgrade; 3 codex-stdio end-to-end cases.
2026-06-03 20:21:03 +08:00
Codex 4793ca154a fix(v0.1): recover session PVC when prior create left session without storage
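A prior failure case could leave a session in the storageKind=none state with pvcName missing;
POST /api/v1/sessions now calls createSessionPvc again when storageKind=none || !storagePvcName
to recreate the PVC, with action=session-storage-recovered.

selftest coverage: after explicitly resetting storageKind=none, the second POST takes the recovery path.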
2026-06-03 19:59:49 +08:00
Codex da797c907c fix(v0.1): sanitize session id for PVC name (RFC 1123 subdomain compliance)
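The session id allows arbitrary characters (including underscores / uppercase / dots), but a PVC name must conform to
an RFC 1123 subdomain (lowercase alphanumerics + '-' + '.', and it must start and end with an alphanumeric).

sanitizeSessionIdForPvc replaces illegal characters with '-' and falls back to 'default' when the result is entirely empty.
selftest adds 3 cases covering underscores / uppercase / pure symbols.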
2026-06-03 19:47:29 +08:00
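The sanitization rule in commit da797c907c can be sketched as below. This is an illustration under stated assumptions, not the repository's sanitizeSessionIdForPvc: folding uppercase to lowercase and trimming leading and trailing non-alphanumerics are guesses about how the name is made compliant, and the sketch does not enforce per-label rules or the 253-character limit of an RFC 1123 subdomain.

```typescript
// Hypothetical sketch; the real sanitizeSessionIdForPvc may differ in detail.
function sanitizeSessionIdForPvcSketch(sessionId: string): string {
  const cleaned = sessionId
    .toLowerCase()                 // assumption: uppercase is folded rather than replaced
    .replace(/[^a-z0-9.-]/g, "-")  // every illegal character becomes '-'
    .replace(/^[^a-z0-9]+/, "")    // assumption: trim so the name starts with an alphanumeric
    .replace(/[^a-z0-9]+$/, "");   // assumption: trim so the name ends with an alphanumeric
  // Fall back to 'default' when nothing usable is left.
  return cleaned.length > 0 ? cleaned : "default";
}

console.log(sanitizeSessionIdForPvcSketch("My_Session.01")); // my-session.01
console.log(sanitizeSessionIdForPvcSketch("___"));           // default
```

The three inputs that the selftest is said to cover (underscores, uppercase, pure symbols) map onto the replace, lowercase, and fallback branches respectively.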
Codex e8cfa4c692 feat(v0.1): add mgr session PVC lifecycle for true session state persistence
PR B for #770: mgr/session-pvc.ts + server endpoints + selftest.

- New module src/mgr/session-pvc.ts: createSessionPvc / getSessionPvcSummary / deleteSessionPvc / refreshSessionPvcSummary / runSessionStorageGc / startSessionStorageGcLoop
- Server adds 4 endpoints:
  * POST /api/v1/sessions: creating a session also creates the PVC synchronously
  * GET /api/v1/sessions/:id/storage: query the PVC summary
  * DELETE /api/v1/sessions/:id/storage: delete the PVC + storage_kind=evicted
  * POST /api/v1/sessions/:id/storage/refresh: runner reports the PVC summary
  * POST /api/v1/sessions/storage/gc: manually trigger GC
- mgr SA RBAC was already added in PR A; the manager server does not connect to the Kubernetes API directly (kubectl is executed inside the mgr container)
- SessionRecord adds storageKind / storagePvcName / storageNamespace / storageSizeBytes / storageFilesCount / storageSha256 / storageUpdatedAt / storagePvcPhase / storageEvictedAt / codexRolloutSubdir
- kubernetes-runner-job short-circuit: when a run references an evicted session, return session-store-evicted directly without creating a runner Job
- KubectlHandler is injectable; selftest covers create / summary / refresh / eviction / gc / REST paths
- GC loop defaults to 5min (adjustable via AGENTRUN_SESSION_GC_INTERVAL_MS)

runner / backend / HWLAB adapter land in PR C / PR D.
2026-06-03 19:19:09 +08:00
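The GC interval setting in commit e8cfa4c692 (5min default, overridable through AGENTRUN_SESSION_GC_INTERVAL_MS) might be resolved roughly as follows. The function and constant names are hypothetical, and falling back to the default on an empty, non-numeric, or non-positive value is an assumption; the commit message states only the default and the variable name.

```typescript
// Hypothetical sketch; names and fallback behaviour are assumptions.
const DEFAULT_GC_INTERVAL_MS_SKETCH = 5 * 60 * 1000; // 5min, per the commit message

function resolveGcIntervalMsSketch(env: Record<string, string | undefined>): number {
  const raw = env["AGENTRUN_SESSION_GC_INTERVAL_MS"];
  const parsed = raw === undefined ? Number.NaN : Number(raw);
  // Assumption: anything that is not a positive finite number uses the default.
  return Number.isFinite(parsed) && parsed > 0 ? parsed : DEFAULT_GC_INTERVAL_MS_SKETCH;
}
```

A loop such as startSessionStorageGcLoop could then hand the resolved value to setInterval, although how the real loop is scheduled is not stated in the log.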
Codex 87beb00bdb feat(v0.1): add per-session RWO PVC foundation for true session state persistence
PR A for #770: docs + migration 007 + RBAC + types foundation.

- Add failureKind session-store-evicted, to distinguish a missing PVC from a genuine protocol error
- Add migration 007_v01_session_state_storage: the sessions table gains storage_* columns + an index
- mgr SA RBAC addition: persistentvolumeclaims: [create, get, list, watch, delete]
- 6 SPECs upgraded (runtime-assembly / hwlab-manual-dispatch / backend-codex T7b / agentrun-runner / agentrun-mgr / services)
- Explicitly prohibited: fake app-server mock, replacement threadId, copy/restore after runner startup, stretching idleTimeoutMs to keep the runner permanently resident
- selftest assertions updated to 007_v01_session_state_storage

Follow-up PRs B/C build on this to wire in the mgr-side PVC lifecycle + runner-side mount + backend-side observability.
2026-06-03 18:45:13 +08:00
Lyon cb93992b1c Merge pull request #85 from pikasTech/feat/issue84-session-subagent-cli
feat(v0.1): add async subagent Session CLI control plane
2026-06-03 11:29:38 +08:00
Codex b761ef6713 feat: add session subagent cli control 2026-06-03 11:27:55 +08:00
Lyon a40fdf6ab1 Merge pull request #83 from pikasTech/fix/issue82-runner-ripgrep
fix: add missing ripgrep to the AgentRun runner image
2026-06-03 08:03:57 +08:00
Codex 25bc93b371 fix: add ripgrep to runner image 2026-06-03 07:59:13 +08:00
Lyon 3223b297ca Merge pull request #81 from pikasTech/fix/issue723-no-total-turn-timeout
fix: keep Codex sessions reusable after turn failures
2026-06-03 00:30:47 +08:00
Codex f69ef55ddc fix: keep codex sessions reusable after turn failures 2026-06-03 00:28:41 +08:00
Lyon c51180eef6 Merge pull request #80 from pikasTech/fix/v01-assistant-delta-progress-712
Fix trace visibility of intermediate steps in long-running tasks
2026-06-02 21:19:01 +08:00
Codex ce031238f1 fix: preserve in-progress trace events for long-running tasks 2026-06-02 21:17:56 +08:00
Codex 3018b8a937 feat: assemble resource prompts and skills 2026-06-02 20:40:14 +08:00
Codex a53f5b8a0d docs: specify resource prompt and skill assembly 2026-06-02 19:59:49 +08:00
Lyon 5089c5e31a Merge pull request #78 from pikasTech/fix/issue701-native-codex-stdio-resume
fix: fail stale Codex thread resume
2026-06-02 16:23:40 +08:00
Codex e9843ab687 fix: fail stale codex thread resume 2026-06-02 16:22:27 +08:00
Lyon d49b958649 Merge pull request #77 from pikasTech/feat-unidesk-ssh-tool-193
feat: assemble UniDesk SSH tool credentials
2026-06-02 15:56:49 +08:00
Codex 458d814fa2 feat: assemble UniDesk SSH tool credentials 2026-06-02 15:40:48 +08:00
Lyon 16b32af9b5 Merge pull request #76 from pikasTech/fix/v01-thread-resume-replacement
fix: replace stale codex threads
2026-06-02 12:37:24 +08:00
Codex e40585fd66 fix: replace stale codex threads 2026-06-02 12:36:34 +08:00
Lyon 7475ea0116 Merge pull request #74 from pikasTech/fix/v01-issue-691
fix: tighten stale thread and tool-call error attribution
2026-06-02 12:10:50 +08:00
Codex 40a274d52b fix: tighten stale thread and tool-call error attribution 2026-06-02 12:08:38 +08:00
Lyon 0092f55249 fix: keep suppressed notification names readable (#73)
Co-authored-by: Codex <codex@pikas.tech>
2026-06-02 11:16:50 +08:00
Lyon aa0bd64714 fix: suppress agentMessage lifecycle noise (#72)
Co-authored-by: Codex <codex@pikas.tech>
2026-06-02 10:59:59 +08:00
Lyon e1c0fb5245 fix: recover stale codex thread (#71)
Co-authored-by: Codex <codex@pikas.tech>
2026-06-02 10:48:28 +08:00
Codex 98f6e420e6 docs: unify the v0.1 artifact catalog source of truth 2026-06-02 10:39:59 +08:00
Lyon ebc5bdb8b1 fix: tighten commandExecution toolcall summary (#70)
Co-authored-by: Codex <codex@pikas.tech>
2026-06-02 10:28:35 +08:00
Lyon 5efa0cfa31 Merge pull request #69 from pikasTech/fix/v01-requested-thread-resume-684
fix: unify AgentRun threadId continuity
2026-06-02 10:21:32 +08:00
Codex 2cce4c2777 fix: unify AgentRun threadId continuity 2026-06-02 10:16:03 +08:00
Lyon a6f7581b96 fix: further reduce residual codex trace noise (#68)
Co-authored-by: Codex <codex@pikas.tech>
2026-06-02 10:11:31 +08:00
Lyon 2c2524880e Merge pull request #67 from pikasTech/fix/v01-steer-685
feat: support steer command during a run
2026-06-02 10:05:39 +08:00
Codex d90e01a91c feat: support steer command during a run 2026-06-02 10:04:36 +08:00
Lyon 1d45d272f1 fix: reduce codex stdio trace noise (#66)
Co-authored-by: Codex <codex@pikas.tech>
2026-06-02 09:49:55 +08:00
Lyon 5db48b299e Merge pull request #64 from pikasTech/fix/v01-resource-alias-default-path
fix: expose resource aliases on runner path
2026-06-02 09:13:42 +08:00
Codex 10bc33f8e1 fix: expose resource aliases on runner path 2026-06-02 09:12:47 +08:00
Lyon 05ef49d0ce Merge pull request #63 from pikasTech/fix/v01-resource-tool-alias
feat: assemble resource bundle tool alias
2026-06-02 08:50:57 +08:00
Codex 9700d0600f feat: assemble resource bundle tool aliases 2026-06-02 08:50:21 +08:00
Lyon 83ab1df593 Merge pull request #62 from pikasTech/fix/v01-minimax-m3-responses-wire
fix: use responses wire api for minimax m3
2026-06-02 08:28:38 +08:00
Codex 0918736e0a fix: use responses wire api for minimax m3 2026-06-02 08:27:53 +08:00
Lyon 3ed57a06c0 Merge pull request #61 from pikasTech/fix/v01-transient-env-unbounded
fix: lift the transientEnv count limit
2026-06-02 08:25:09 +08:00
Codex 82e2349030 fix: lift the transientEnv count limit 2026-06-02 08:24:22 +08:00
Lyon c1f22210a4 Merge pull request #60 from pikasTech/fix/v01-minimax-m3-migration
fix: isolate minimax m3 backend migration
2026-06-02 08:12:08 +08:00
Codex 4b9dc79b67 fix: isolate minimax m3 backend migration 2026-06-02 08:11:15 +08:00
Lyon ab40d86dde Merge pull request #59 from pikasTech/fix/v01-agent-message-live
fix: report agentMessage in real time
2026-06-02 08:05:55 +08:00