From f775990c906fcfed835a47091877ec3953c2d425 Mon Sep 17 00:00:00 2001 From: Codex Date: Sun, 17 May 2026 09:48:00 +0000 Subject: [PATCH] feat: add master code queue manager --- AGENTS.md | 12 +- TEST.md | 2 +- config.json | 52 +- deploy.json | 5 + docker-compose.yml | 30 + docs/reference/cli.md | 14 +- docs/reference/deployment.md | 17 +- docs/reference/frontend.md | 2 +- docs/reference/microservices.md | 30 +- docs/reference/repo-tree.md | 3 +- scripts/cli.ts | 4 +- scripts/src/check.ts | 3 + scripts/src/config.ts | 6 +- scripts/src/docker.ts | 11 +- src/components/backend-core/src/config.ts | 4 +- .../backend-core/src/microservice-proxy.ts | 16 + src/components/backend-core/src/types.ts | 2 +- .../microservices/code-queue-mgr/Dockerfile | 12 + .../microservices/code-queue-mgr/bun.lock | 34 + .../microservices/code-queue-mgr/package.json | 17 + .../microservices/code-queue-mgr/src/index.ts | 2157 +++++++++++++++++ .../code-queue-mgr/tsconfig.json | 18 + 22 files changed, 2406 insertions(+), 45 deletions(-) create mode 100644 src/components/microservices/code-queue-mgr/Dockerfile create mode 100644 src/components/microservices/code-queue-mgr/bun.lock create mode 100644 src/components/microservices/code-queue-mgr/package.json create mode 100644 src/components/microservices/code-queue-mgr/src/index.ts create mode 100644 src/components/microservices/code-queue-mgr/tsconfig.json diff --git a/AGENTS.md b/AGENTS.md index ee045830..f2de636e 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -26,17 +26,17 @@ UniDesk 是一个以主 server 为统一入口的分布式工作平台;本文 - `bun scripts/cli.ts --main-server-ip `:默认通过公网 frontend 登录态远程执行调试、用户服务(底层命令名 `microservice`)、Code Queue 查询与节点自测命令,不要求主 server SSH key,详细规范见 `docs/reference/cli.md`。 - `bun scripts/cli.ts config show`:校验并展示根目录 `config.json`,配置来源规则见 `docs/reference/config.md`。 - `bun scripts/cli.ts check [--full|--files|--scripts-typecheck|--components|--compose|--logs]`:默认只运行轻量配置和 TypeScript 语法检查;关键文件、`scripts/` 类型、组件类型、Docker Compose 和日志策略检查需显式开启,测试入口见 
`TEST.md`。 -- `bun scripts/cli.ts server start`:以异步 job 启动 database、backend-core、frontend、provider-gateway 和主 server 用户服务,部署规则见 `docs/reference/deployment.md`。 +- `bun scripts/cli.ts server start`:以异步 job 启动 database、backend-core、frontend、provider-gateway、code-queue-mgr 和主 server 用户服务,部署规则见 `docs/reference/deployment.md`。 - `bun scripts/cli.ts server status`:查询固定端口、容器状态、健康检查和访问 URL,判定标准见 `docs/reference/deployment.md`。 - `bun scripts/cli.ts server logs`:分页返回文件日志与 Docker 日志尾部,日志规则见 `docs/reference/observability.md`。 -- `bun scripts/cli.ts server rebuild `:以 build-first、Compose lock、no-deps force-recreate 和 post-up validation 的异步 job 重建主 server Compose 内单个服务;Code Queue 部署在 D601,规则见 `docs/reference/deployment.md`。 +- `bun scripts/cli.ts server rebuild `:以 build-first、Compose lock、no-deps force-recreate 和 post-up validation 的异步 job 重建主 server Compose 内单个服务;Code Queue 执行面部署在 D601,规则见 `docs/reference/deployment.md`。 - `bun scripts/cli.ts provider attach [--master-server URL] [--up] [--force]`:在新增计算节点上生成两项配置的 provider-gateway 挂载包;默认只需要主 server URL(默认 `http://74.48.78.17/`)和唯一 Provider ID,生成的 Compose 固定 Docker socket、`pid: "host"`、`restart: always`、只读 `/workspace`、SSH 维护私钥挂载和 loopback egress proxy 端口,规则见 `docs/reference/provider-gateway.md`。 - `bun scripts/cli.ts ssh [ssh-like args...]`:通过 provider-gateway 的 Host SSH / WSL SSH 维护桥打开近似原生 ssh 的交互会话或远端命令,并在远端 PATH 注入 `apply_patch`、`glob` 与 `skill-discover`;`apply-patch`、`py`、`skills`、结构化 `find`、`glob` 和 `argv` 子命令用于避免远端补丁、Python stdin、skill 发现与常用只读命令的嵌套转义问题,使用规则见 `docs/reference/cli.md` 和 `docs/reference/provider-gateway.md`。 -- `bun scripts/cli.ts microservice list/status/health/proxy`:管理和验证挂载在主 server、计算节点 Docker 或 k3s 控制面上的用户服务,`proxy` 支持受控 JSON body,OA Event Flow/Todo Note/Baidu Netdisk on main-server、k3s Control/Code Queue/MDTODO/Decision Center/FindJob/Pipeline/MET Nonlinear on D601 的规则见 `docs/reference/microservices.md`。 +- `bun scripts/cli.ts microservice list/status/health/proxy`:管理和验证挂载在主 server、计算节点 Docker 或 k3s 
控制面上的用户服务,`proxy` 支持受控 JSON body,OA Event Flow/Todo Note/Baidu Netdisk/Code Queue Manager on main-server、k3s Control/Code Queue 执行面/MDTODO/Decision Center/FindJob/Pipeline/MET Nonlinear on D601 的规则见 `docs/reference/microservices.md`。 - `bun scripts/cli.ts decision upload/list/show/health`:通过 backend-core 用户服务代理上传会议记录/决议 Markdown、列出记录和查看详情;Decision Center 运行在 D601 k3s,规则见 `docs/reference/microservices.md`。 - `bun scripts/cli.ts deploy check/plan/apply [--file deploy.json] [--service ]`:按根目录 `deploy.json` 的服务 repo 和 commit 期望状态校验或更新用户服务,目标侧自行 fetch、构建、部署和 live commit 验证;规则见 `docs/reference/deploy.md`。 - `bun scripts/cli.ts codex deploy `:Code Queue 兼容部署入口,会生成临时 desired manifest 并调用 `deploy apply --service code-queue` 的同一条 target-side build 与 live commit 验证路径;规则见 `docs/reference/codex-deploy.md`。 -- `bun scripts/cli.ts codex submit [prompt] [--prompt-file path|--prompt-stdin] [--queue ]`:通过 backend-core 私有代理提交 Code Queue 任务,`--dry-run` 可只检查请求体不入队,规则见 `docs/reference/cli.md`。 +- `bun scripts/cli.ts codex submit [prompt] [--prompt-file path|--prompt-stdin] [--queue ]`:通过 backend-core 私有代理提交 Code Queue 任务;控制面默认走主 server `code-queue-mgr` 写入 PostgreSQL,`--dry-run` 可只检查请求体不入队,规则见 `docs/reference/cli.md`。 - `bun scripts/cli.ts codex task `:按 Code Queue 任务 ID 查询初始 prompt、最后 assistant message、工具调用摘要、attempt/judge/error 和耗时,便于新任务引用历史 session。 - `bun scripts/cli.ts codex judge --attempt [--dry-run]`:按指定 task/attempt 用与队列 worker 相同的上下文构建和 MiniMax judge 调用路径单步复现完成判定;`--dry-run` 只输出 prompt/payload 诊断。 - `bun scripts/cli.ts codex interrupt|cancel `:通过 Code Queue 私有代理中断运行任务或取消 queued/retry_wait 任务,规则见 `docs/reference/cli.md`。 @@ -48,12 +48,12 @@ UniDesk 是一个以主 server 为统一入口的分布式工作平台;本文 ## Runtime - `bun`:TypeScript 运行时固定使用 Bun,组件入口和 CLI 都直接运行 `.ts` 文件,约束见 `docs/reference/config.md`。 -- `docker-compose.yml`:主 server 统一编排 core、frontend、database、本机 provider gateway、Todo Note 后端、Baidu Netdisk 后端和 OA Event Flow 后端;Code Queue、MDTODO 和 Decision Center 由 D601 k3s/k8s 控制面代管,并经 `k3sctl-adapter` 的 
Kubernetes API service proxy 单一路径接入,服务拓扑见 `docs/reference/deployment.md`。 +- `docker-compose.yml`:主 server 统一编排 core、frontend、database、本机 provider gateway、Todo Note 后端、Baidu Netdisk 后端、OA Event Flow 后端和轻量 Code Queue Manager 控制面;Code Queue 执行面、MDTODO 和 Decision Center 由 D601 k3s/k8s 控制面代管,并经 `k3sctl-adapter` 的 Kubernetes API service proxy 单一路径接入,服务拓扑见 `docs/reference/deployment.md`。 - `src/components/frontend`:前端源码固定使用 TypeScript + React,`app.tsx` 只做 shell/router,左侧主模块与顶部子标签统一编译为模块前缀路由:`/ops//`、`/nodes//`、`/tasks//`、`/config//`,只有用户服务使用 `/app//` 深链接,运行总览包含通用性能面板,资源监控含曲线和进程资源排序表,Todo Note、FindJob、Pipeline、MET Nonlinear、Baidu Netdisk、Code Queue、MDTODO、Decision Center、OA Event Flow、k3s Control 等业务页必须拆到独立 TSX 模块,界面规则见 `docs/reference/frontend.md`。 - `backend-core / frontend performance`:backend-core 暴露 `/api/performance`,frontend 暴露同源 `/api/frontend-performance` 并在 `/ops/performance/` 汇总组件请求、失败请求、内部操作和慢操作,规则见 `docs/reference/observability.md`。backend-core 源码已拆分为 15 个模块,结构见 `docs/reference/repo-tree.md`。 - `Unified OA event flow`:`oa-event-flow` 是独立主 server 用户服务,提供事件表、按 tag 订阅和 Trace/STEP 统计中心,Code Queue 与 Pipeline 都必须接入统一事件流;共享契约见 `docs/reference/oa-event-flow.md`,Pipeline 专有控制流规则见 `docs/reference/pipeline-oa-event-flow.md`。 - `src/components/provider-gateway`:当前主 server `74.48.78.17` 也作为 provider gateway 接入 UniDesk,外部节点通过 `ws://74.48.78.17:18082/ws/provider` 接入,必须以 `restart: always` 部署 always-enabled 远程升级、sleep-and-validate 回滚保护和 Host SSH / WSL SSH 透传并完成自测,部署与 Playwright 公网前端验证方法见 `docs/reference/provider-gateway.md`。 -- `microservices`:用户服务配置命名仍保留 `microservices`;用户服务指挂载在 UniDesk 核心服务上的用户业务能力,支持 `unidesk-direct` 与 `k3sctl-managed` 两种部署模式;k3s 代管必须使用标准 k3s/k8s 对象和 Kubernetes API service proxy,禁止业务容器直连、NodePort 和隐藏 fallback;缺少这些服务时核心仍可运行。主 server 本地开发边界固定为只开发 UniDesk frontend;非 UniDesk 核心业务后端、Dockerfile、GPU/训练调试必须在目标计算节点通过 SSH 透传或 k3s 控制面完成,Todo Note 这类明确写入主 server 的例外需单独登记,规则见 `docs/reference/microservices.md`。 +- `microservices`:用户服务配置命名仍保留 `microservices`;用户服务指挂载在 
UniDesk 核心服务上的用户业务能力,支持 `unidesk-direct`、`internal-sidecar` 与 `k3sctl-managed` 部署模式;`code-queue-mgr` 是主 server 内部 sidecar 控制面,D601 Code Queue 是执行面;k3s 代管必须使用标准 k3s/k8s 对象和 Kubernetes API service proxy,禁止业务容器直连、NodePort 和隐藏 fallback;缺少这些服务时核心仍可运行。主 server 本地开发边界固定为只开发 UniDesk frontend 和已登记的内部 sidecar 控制面;非 UniDesk 核心业务后端、Dockerfile、GPU/训练调试必须在目标计算节点通过 SSH 透传或 k3s 控制面完成,Todo Note 这类明确写入主 server 的例外需单独登记,规则见 `docs/reference/microservices.md`。 - `docs/reference/e2e.md`:交付前必须执行的自测门禁、Playwright 登录、资源监控进程排序、JSON 展示断言和数据库命名卷持久化要求。 ## Architecture Docs diff --git a/TEST.md b/TEST.md index 56cb37cb..d2d12e0b 100644 --- a/TEST.md +++ b/TEST.md @@ -103,7 +103,7 @@ ## T23 D601 Code Queue User Service -阅读 `AGENTS.md`(本项目 `AGENTS.md` 同时承担 `SKILL.md` 对 `scripts/cli.ts` 的解释职责),然后用 cli 手动测试以下内容:运行 `bun scripts/cli.ts microservice list`,确认 `code-queue` 显示为 `providerId=D601`、`public=false`、`frontendOnly=true`、仓库 URL `https://github.com/pikasTech/unidesk`、k3s/k8s `k3s://unidesk/code-queue:4222` 逻辑服务映射、`deployment.mode=k3sctl-managed`、`runtime.orchestrator=k3sctl` 且无业务直连容器摘要;使用 `bun scripts/cli.ts codex deploy <已push的commitId>` 重建/启动 D601 Code Queue,确认命令立即返回异步 job id,`bun scripts/cli.ts job status --tail-bytes 30000` 能看到 fetch/export、rsync、Docker build、native k3s provider egress proxy、有效 `rancher/mirrored-pause:3.6` sandbox 镜像导入、k3s image import、kubectl apply、部署 commit 戳记、rollout、legacy direct cleanup 和 health commit 验证进度,并确认 job 最终校验真实 Code Queue `/health` 返回的 `deploy.commit` 精确匹配本次 remote commit,不能由旧服务或旧 Pod 充数;同时确认主 server 根目录 `docker-compose.yml` 中不再存在 `code-queue` service,并通过 `bun scripts/cli.ts ssh D601 argv bash -lc 'systemctl is-active k3s && KUBECONFIG=/etc/rancher/k3s/k3s.yaml kubectl get nodes -o wide && sudo ctr --address /run/k3s/containerd/containerd.sock -n k8s.io images ls | grep -F docker.io/rancher/mirrored-pause:3.6 && ! docker ps --format "{{.Names}} {{.Image}}" | grep -E "[[:space:]]rancher/k3s:" && ! 
docker ps --format "{{.Names}}" | grep -Fx code-queue-backend'` 或等价检查证明 D601 k3s 是 WSL 原生 systemd 服务、native containerd 已有正确 pause sandbox 镜像、没有 active `rancher/k3s` 控制面容器且旧 direct Docker `code-queue-backend` 没有并行运行。运行 `bun scripts/cli.ts microservice health code-queue`、`bun scripts/cli.ts microservice proxy code-queue /api/dev-ready --raw`、`bun scripts/cli.ts microservice proxy code-queue '/api/tasks/overview?limit=5&transcriptLimit=1&compact=1&afterSeq=0&preferId='` 和 `bun scripts/cli.ts codex task <已有taskId>`,确认链路通过 backend-core、k3sctl-adapter、Kubernetes API service proxy 和 D601 active Code Queue Service,且 task id 查询返回初始 prompt、最后 assistant message、工具调用摘要、attempt/judge/error 和耗时,`queue.storage.primary=postgres`、`queue.storage.postgresReady=true`、`queue.devReady.missingTools=[]`、`queue.devReady.docker.versionOk=true`、`queue.devReady.docker.composeOk=true`;`queue.devReady.ssh.ready` 只在需要跨 Provider SSH/Windows-native 任务时作为强制项。在 D601 active Code Queue Pod 内验证主 PostgreSQL 端口映射可执行 `select 1`,主 OA Event Flow 端口映射 `/health` 可访问,集群内 ClaudeQQ Service `http://claudeqq.unidesk.svc.cluster.local:3290/health` 可访问;这些映射不得成为任意公网入口。 +阅读 `AGENTS.md`(本项目 `AGENTS.md` 同时承担 `SKILL.md` 对 `scripts/cli.ts` 的解释职责),然后用 cli 手动测试以下内容:运行 `bun scripts/cli.ts microservice list`,确认 `code-queue-mgr` 显示为 `providerId=main-server`、`deployment.mode=internal-sidecar`、Compose 后端 `http://code-queue-mgr:4278`、`frontend.integrated=false`,并确认稳定 `code-queue` 条目说明队列管理/提交/历史/轻量 Trace 默认由主 server `code-queue-mgr` 负责,D601 k3s Code Queue 只负责 scheduler/runner/active run control 和执行态写回;使用 `bun scripts/cli.ts server rebuild code-queue-mgr` 重建主 server 控制面,再运行 `bun scripts/cli.ts microservice health code-queue-mgr`、`bun scripts/cli.ts microservice health code-queue`、`bun scripts/cli.ts microservice proxy code-queue '/api/tasks/overview?limit=5&transcriptLimit=1&compact=1&afterSeq=0&preferId='`、`bun scripts/cli.ts codex submit --dry-run --queue ` 和 `bun scripts/cli.ts codex task <已有taskId>`,确认普通控制/读取路径经 backend-core 
分流到 master `code-queue-mgr`,返回 `role=master-control-plane`、`schemaReady=true`、PostgreSQL pool 上限、`noRunnerDependencies=true`、任务初始 prompt、最后 assistant message、工具调用摘要、attempt/judge/error 和耗时,不依赖 D601 `code-queue-write` ready endpoint。随后使用 `bun scripts/cli.ts codex deploy <已push的commitId>` 重建/启动 D601 Code Queue 执行面,确认命令立即返回异步 job id,`bun scripts/cli.ts job status --tail-bytes 30000` 能看到 fetch/export、rsync、Docker build、native k3s provider egress proxy、有效 `rancher/mirrored-pause:3.6` sandbox 镜像导入、k3s image import、kubectl apply、部署 commit 戳记、rollout、legacy direct cleanup 和 health commit 验证进度,并确认 job 最终校验真实 D601 scheduler `/health` 返回的 `deploy.commit` 精确匹配本次 remote commit,不能由旧服务或旧 Pod 充数;同时确认主 server 根目录 `docker-compose.yml` 中只存在 `code-queue-mgr` 而不存在执行面 `code-queue` service,并通过 `bun scripts/cli.ts ssh D601 argv bash -lc 'systemctl is-active k3s && KUBECONFIG=/etc/rancher/k3s/k3s.yaml kubectl get nodes -o wide && sudo ctr --address /run/k3s/containerd/containerd.sock -n k8s.io images ls | grep -F docker.io/rancher/mirrored-pause:3.6 && ! docker ps --format "{{.Names}} {{.Image}}" | grep -E "[[:space:]]rancher/k3s:" && ! 
docker ps --format "{{.Names}}" | grep -Fx code-queue-backend'` 或等价检查证明 D601 k3s 是 WSL 原生 systemd 服务、native containerd 已有正确 pause sandbox 镜像、没有 active `rancher/k3s` 控制面容器且旧 direct Docker `code-queue-backend` 没有并行运行。运行 `bun scripts/cli.ts microservice proxy k3sctl-adapter /api/control-plane --raw` 和执行面专属 `bun scripts/cli.ts microservice proxy code-queue /api/dev-ready --raw`,确认 D601 scheduler/read/write ready endpoint、`queue.storage.primary=postgres`、`queue.storage.postgresReady=true`、`queue.devReady.missingTools=[]`、`queue.devReady.docker.versionOk=true`、`queue.devReady.docker.composeOk=true`;`queue.devReady.ssh.ready` 只在需要跨 Provider SSH/Windows-native 任务时作为强制项。在 D601 active Code Queue Pod 内验证主 PostgreSQL 端口映射可执行 `select 1`,主 OA Event Flow 端口映射 `/health` 可访问,集群内 ClaudeQQ Service `http://claudeqq.unidesk.svc.cluster.local:3290/health` 可访问;这些映射不得成为任意公网入口。 随后登录公网 frontend `http://74.48.78.17:18081/`,进入 `用户服务 / Code Queue`,确认页面显示默认模型 `gpt-5.5`、默认执行 Provider `D601`、默认工作目录 `/workspace`、模型下拉菜单包含 `gpt-5.4-mini`/`gpt-5.4`/`gpt-5.5`、入队份数、队列指标、任务 ID、复制任务 ID、引用按钮、任务耗时、引用任务 ID、清空输入、创建成功提示、任务提交表单、Trace 输出、attempt 表、MiniMax/fallback judge 状态、追加 prompt、打断和重试控件;通过页面提交一个小任务,确认任务进入 queued/running/succeeded 或可解释的 failed 状态,并且输出区能看到运行中的 Codex 消息。批量验收时设置 `入队份数=5` 或用 `---` 分隔 5 段 prompt,一次性入队 5 条任务,确认 5 条任务按顺序运行并全部进入 succeeded 或可解释的非成功终态,不能只运行第一条后停止;其中任一任务被 judge 判定 `fail` 时只能把当前任务标为 failed,后续 queued 任务仍必须继续推进。测试异常中断时可以提交长任务后点击 `打断`,确认任务变为 canceled 或被 judge 标记为非成功终态;自动重试只应在服务端/传输异常、任务正常结束但 execution record 显示未完成、或 judge 判定 retry 时发生;retry 必须复用已有 Codex thread 并 append 继续执行 prompt,只有当前任务 complete 后才推进队列中的下一个任务。MiniMax judge 必须能处理 Markdown fence/夹杂文本等 JSON 去噪;若去噪后仍失败,必须把解析错误和上一轮去噪前原始回答反馈给 MiniMax 修复后重试,日志中应出现 `judge_json_parse_retry`,且 repair 成功时仍以 `source=minimax` 返回。Codex provider key 只能通过 `OPENAI_API_KEY`、`CRS_OAI_KEY` 这类运行时环境透传,MiniMax API key 只能通过 D601 env-file 运行时环境传入,禁止写入 `config.json`、Dockerfile、源码或测试文档。 diff --git a/config.json b/config.json index 2b9a228e..756f5587 100644 --- 
a/config.json +++ b/config.json @@ -592,7 +592,7 @@ "id": "code-queue", "name": "Code Queue", "providerId": "D601", - "description": "Code Queue 是由 D601 k3s 控制平面代管的代码代理队列用户服务,UniDesk 只通过 k3sctl-adapter 访问其标准服务路由;当前运行拓扑固定为 D601 原生 k3s 内的 read/write/scheduler 多服务,provider-gateway 只保留维护用途。", + "description": "Code Queue 的用户服务 ID 保持稳定;队列管理、提交、历史和轻量 Trace 读取默认由主 server code-queue-mgr 直管 PostgreSQL,D601 k3s Code Queue 只负责 scheduler/runner、active run steer/interrupt 和执行态写回。", "repository": { "url": "https://github.com/pikasTech/unidesk", "commitId": "2a9f60d57401bf9d6165e44af30c2f21ada79320", @@ -643,6 +643,56 @@ "activeNodeId": "D601" } }, + { + "id": "code-queue-mgr", + "name": "Code Queue Manager", + "providerId": "main-server", + "description": "code-queue-mgr 是主 server 直管的轻量 Code Queue 控制面,只连接主 PostgreSQL,负责队列 CRUD、任务提交、历史摘要和轻量 Trace 读取;不包含 Codex/OpenCode/Playwright/Chromium/runner 依赖。", + "repository": { + "url": "https://github.com/pikasTech/unidesk", + "commitId": "local", + "dockerfile": "src/components/microservices/code-queue-mgr/Dockerfile", + "composeFile": "docker-compose.yml", + "composeService": "code-queue-mgr", + "containerName": "code-queue-mgr-backend" + }, + "backend": { + "nodeBaseUrl": "http://code-queue-mgr:4278", + "nodeBindHost": "code-queue-mgr", + "nodePort": 4278, + "proxyMode": "unidesk-direct", + "frontendOnly": true, + "public": false, + "allowedMethods": [ + "GET", + "HEAD", + "POST", + "PUT", + "PATCH", + "DELETE" + ], + "allowedPathPrefixes": [ + "/health", + "/live", + "/logs", + "/api/" + ], + "healthPath": "/health", + "timeoutMs": 10000 + }, + "deployment": { + "mode": "internal-sidecar" + }, + "development": { + "providerId": "main-server", + "sshPassthrough": false, + "worktreePath": "/root/unidesk/src/components/microservices/code-queue-mgr" + }, + "frontend": { + "route": "/apps/code-queue-mgr", + "integrated": false + } + }, { "id": "mdtodo", "name": "MDTODO", diff --git a/deploy.json b/deploy.json index 399f35d7..2fd28d08 
100644 --- a/deploy.json +++ b/deploy.json @@ -51,6 +51,11 @@ "repo": "https://github.com/pikasTech/unidesk", "commitId": "2a9f60d57401bf9d6165e44af30c2f21ada79320" }, + { + "id": "code-queue-mgr", + "repo": "https://github.com/pikasTech/unidesk", + "commitId": "local" + }, { "id": "mdtodo", "repo": "https://github.com/pikasTech/unidesk", diff --git a/docker-compose.yml b/docker-compose.yml index 58abc13b..18733940 100644 --- a/docker-compose.yml +++ b/docker-compose.yml @@ -201,6 +201,36 @@ services: timeout: 3s retries: 20 + code-queue-mgr: + image: code-queue-mgr + build: + context: . + dockerfile: src/components/microservices/code-queue-mgr/Dockerfile + container_name: code-queue-mgr-backend + restart: unless-stopped + depends_on: + - database + expose: + - "4278" + environment: + HOST: "0.0.0.0" + PORT: "4278" + DATABASE_URL: "postgres://${UNIDESK_DATABASE_USER}:${UNIDESK_DATABASE_PASSWORD}@database:5432/${UNIDESK_DATABASE_NAME}" + CODE_QUEUE_MGR_DATABASE_POOL_MAX: "${UNIDESK_CODE_QUEUE_MGR_DATABASE_POOL_MAX:-2}" + CODE_QUEUE_TRACE_DATABASE_POOL_MAX: "${UNIDESK_CODE_QUEUE_TRACE_DATABASE_POOL_MAX:-1}" + CODE_QUEUE_MAIN_PROVIDER_ID: "${UNIDESK_CODE_QUEUE_DEV_CONTAINER_DEFAULT_PROVIDER_ID:-D601}" + CODE_QUEUE_REMOTE_WORKDIR: "${UNIDESK_CODE_QUEUE_REMOTE_WORKDIR:-/home/ubuntu}" + CODE_QUEUE_WORKDIR: "/workspace" + LOG_FILE: "/var/log/unidesk/${UNIDESK_LOG_DAY}/${UNIDESK_LOG_PREFIX}_code-queue-mgr.jsonl" + UNIDESK_LOG_RETENTION_BYTES: "${UNIDESK_LOG_RETENTION_BYTES:-1GiB}" + volumes: + - ${UNIDESK_LOG_DIR}:/var/log/unidesk + healthcheck: + test: ["CMD", "bun", "-e", "fetch('http://127.0.0.1:4278/health').then(r=>process.exit(r.ok?0:1)).catch(()=>process.exit(1))"] + interval: 5s + timeout: 3s + retries: 20 + frontend: build: context: . 
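Reviewer note (not part of the patch): the `code-queue-mgr` entry added to `config.json` above limits what may be proxied through `allowedMethods` and `allowedPathPrefixes`. The sketch below shows one way such an allow-list could be evaluated. `BackendPolicy` and `isProxyRequestAllowed` are hypothetical names introduced only for illustration; the real check in `backend-core` (`microservice-proxy.ts`) is not shown in this part of the patch and may normalize paths or match prefixes differently.

```typescript
// Hypothetical sketch; this is NOT the backend-core implementation.
interface BackendPolicy {
  allowedMethods: string[];
  allowedPathPrefixes: string[];
}

// Values copied from the code-queue-mgr "backend" block in config.json.
const codeQueueMgrPolicy: BackendPolicy = {
  allowedMethods: ["GET", "HEAD", "POST", "PUT", "PATCH", "DELETE"],
  allowedPathPrefixes: ["/health", "/live", "/logs", "/api/"],
};

// Returns true only when both the HTTP method and the path pass the allow-list.
// Assumption: a plain string-prefix match on the path with the query string removed.
function isProxyRequestAllowed(
  policy: BackendPolicy,
  method: string,
  path: string,
): boolean {
  const methodOk = policy.allowedMethods.includes(method.toUpperCase());
  const pathname = path.split("?")[0] ?? "";
  const pathOk = policy.allowedPathPrefixes.some((prefix) =>
    pathname.startsWith(prefix),
  );
  return methodOk && pathOk;
}
```

With a plain prefix match, `/health` also admits paths such as `/healthz`. If that is not intended, the real implementation would need an exact or segment-aware comparison for the entries without a trailing slash.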
diff --git a/docs/reference/cli.md b/docs/reference/cli.md index 50fe247b..8937c3b9 100644 --- a/docs/reference/cli.md +++ b/docs/reference/cli.md @@ -12,7 +12,7 @@ UniDesk 的统一 CLI 入口是根目录 `scripts/cli.ts`,运行方式固定 - `server stop` 创建异步 job,在后台停止固定 Compose project 中的全部 UniDesk 服务。 - `server status` 查询公开端口、受限宿主端口、内部端口、Compose 容器、core/frontend/provider/database 健康检查和访问 URL;D601 Code Queue 使用的 PostgreSQL/OA Event Flow host mapping 必须出现在受限宿主端口而不是无条件公开入口中。 - `server logs` 返回 `logs/` 文件日志和 Docker 容器日志的尾部,默认限制输出大小,避免日志爆炸。 -- `server rebuild ` 创建异步 job,先构建目标服务镜像,随后在 `.state/locks/server-compose.lock` 串行保护下用 `--no-deps --force-recreate` 替换目标 service 并等待容器 `healthy/running`;该命令用于替代手工删除容器的兜底流程,其中 `todo-note`、`project-manager`、`baidu-netdisk` 和 `oa-event-flow` 只重建主 server 承载的对应后端,不会重建或删除 database 命名卷。Code Queue 部署在 D601,不再由 `server rebuild` 管理。 +- `server rebuild ` 创建异步 job,先构建目标服务镜像,随后在 `.state/locks/server-compose.lock` 串行保护下用 `--no-deps --force-recreate` 替换目标 service 并等待容器 `healthy/running`;该命令用于替代手工删除容器的兜底流程,其中 `todo-note`、`code-queue-mgr`、`project-manager`、`baidu-netdisk` 和 `oa-event-flow` 只重建主 server 承载的对应后端,不会重建或删除 database 命名卷。D601 Code Queue 执行面不由 `server rebuild` 管理。 - `provider attach [--master-server URL] [--up] [--force]` 在新计算节点生成两项配置的 provider-gateway 挂载包:`.state/provider-.env` 默认只包含 `UNIDESK_MASTER_SERVER` 与 `PROVIDER_ID`,`provider-.yml` 固定 Docker socket、`pid: "host"`、`restart: always`、只读 `/workspace` 和 SSH 维护私钥挂载;`--up` 会立即执行生成的 `docker compose up -d --build`。 - `ssh [ssh-like args...]` 通过 backend-core 内网 WebSocket broker 和 provider-gateway 的 Host SSH / WSL SSH 维护桥连接目标节点;无后续参数时进入远端登录 shell,有后续参数时按 ssh 远端命令体验执行并返回远端 exit code。 - `ssh apply-patch [tool args...] 
< patch.diff` 直接调用远端注入的 `apply_patch` 工具,并把本地 stdin 中的标准 `*** Begin Patch` / `*** End Patch` patch 流透传给目标节点。 @@ -22,13 +22,13 @@ UniDesk 的统一 CLI 入口是根目录 `scripts/cli.ts`,运行方式固定 - `decision upload/list/show/health` 通过 backend-core 用户服务代理访问 D601 k3s Decision Center,用于上传会议记录/决议 Markdown、列出权威记录、查看详情和健康检查;它不得直连 D601 Service、NodePort 或 provider-gateway 业务 HTTP。 - `deploy check/plan/apply` 从根目录 `deploy.json` 读取服务 repo 与 commit 期望状态,join `config.json` 和现有 manifest 后使用 target-side build 单一路径校验或更新直管服务与 k3s 代管服务;规则见 `docs/reference/deploy.md`。 - `codex deploy ` 是 Code Queue 兼容部署入口,会生成临时 desired manifest 并调用 `deploy apply --service code-queue` 的同一条 target-side build、k3s import、rollout 和 live commit 验证路径;详细规则见 `docs/reference/codex-deploy.md`。 -- `codex submit [prompt] [--prompt-file path|--prompt-stdin] [--queue queueId] [--provider-id id] [--cwd path] [--model model] [--reasoning-effort effort] [--execution-mode mode] [--max-attempts N] [--reference-task-id id] [--dry-run]` 通过 backend-core 私有代理向 Code Queue 提交任务;prompt 必须且只能来自位置参数、文件或 stdin 之一,`--dry-run` 只返回结构化请求与 prompt 预览,不实际入队。 -- `codex task ` 通过 Code Queue 私有代理按任务 ID 查询结构化执行摘要;默认只返回有界 prompt/response 预览、执行 Provider、工作目录、最后 assistant message、最近工具调用摘要、attempt、judge、错误、耗时和 trace 翻页提示,适合在新队列任务中引用历史 session 且避免噪声爆炸。 +- `codex submit [prompt] [--prompt-file path|--prompt-stdin] [--queue queueId] [--provider-id id] [--cwd path] [--model model] [--reasoning-effort effort] [--execution-mode mode] [--max-attempts N] [--reference-task-id id] [--dry-run]` 通过 backend-core 私有代理向稳定 `code-queue` 用户服务路径提交任务;prompt 必须且只能来自位置参数、文件或 stdin 之一,`--dry-run` 只返回结构化请求与 prompt 预览,不实际入队。backend-core 默认把提交、队列 CRUD、已读状态、历史摘要和轻量 Trace 读取分流到主 server `code-queue-mgr`,由它写入主 PostgreSQL;D601 scheduler 只轮询并执行已入库任务。 +- `codex task ` 通过 Code Queue 私有代理按任务 ID 查询结构化执行摘要;默认只返回有界 prompt/response 预览、执行 Provider、工作目录、最后 assistant message、最近工具调用摘要、attempt、judge、错误、耗时和 trace 翻页提示,适合在新队列任务中引用历史 session 且避免噪声爆炸。该摘要读取默认由主 server `code-queue-mgr` 从 PostgreSQL 返回,不依赖 D601 
`code-queue-read` Service 可用。 - `codex task --trace --tail|--from-start|--after-seq N|--before-seq N --limit N` 按页拉取 Code Queue 的逻辑 trace;响应会返回 `nextAfterSeq`、`previousBeforeSeq`、`hasMore`、`hasBefore` 和下一页/上一页命令,默认 `--trace` 取最新一页,需要完整 prompt/最后 response 时加 `--full`。 - `codex output --tail|--from-start|--after-seq N|--before-seq N --limit N [--full-text]` 按原始 output seq 分页读取底层记录;当 trace 行提示 `commandOmittedLines`、`bodyOmittedLines` 或 `rawSeqs` 时,用该命令按 seq 补取完整信息,默认仍有单条文本预览上限,显式 `--full-text` 才返回该页全文。 -- `codex judge --attempt N [--dry-run] [--include-prompt]` 通过 Code Queue 私有代理按指定 attempt 单步复现 judge;后端会从 PostgreSQL task JSON 与 output 归档重建该 attempt 在真实队列 worker 中的 `QueueTask`/`CodexRunResult`,再调用同一套 judge prompt builder 和 MiniMax 请求路径。默认会真实调用 MiniMax,`--dry-run` 只返回 prompt/payload 大小、attempt 窗口和重建来源诊断,`--include-prompt` 仅用于本地深度排查。 -- `codex interrupt|cancel ` 通过 Code Queue 私有代理请求中断;running/judging 任务会请求当前 agent run 停止,queued/retry_wait 任务会直接转为 canceled,返回有界 task 摘要和后续查询命令。 -- Code Queue 多队列 lane 由 `codex` 命令命名空间管理:`queues` 列表、`queue create ` 创建、`queue merge --into ` 合并、`move --queue ` 迁移;同一个 queue 内部串行执行,不同 queue 之间并行执行。合并会移动任务归属并自动删除源 queue 记录,只保留合并后的目标 queue;合并后的目标 queue 按任务原 `queueEnteredAt`/`createdAt` 时间顺序串行。迁移 queued/retry_wait 任务后会立即调度目标 queue。 +- `codex judge --attempt N [--dry-run] [--include-prompt]` 通过 Code Queue 私有代理按指定 attempt 单步复现 judge;这是执行面诊断入口,仍依赖 D601 scheduler/runner 侧的真实 judge builder、MiniMax 调用路径和执行环境。默认会真实调用 MiniMax,`--dry-run` 只返回 prompt/payload 大小、attempt 窗口和重建来源诊断,`--include-prompt` 仅用于本地深度排查。 +- `codex interrupt|cancel ` 通过 Code Queue 私有代理请求中断;running/judging 任务会请求 D601 当前 agent run 停止,queued/retry_wait 任务的取消也必须保持与 WebUI 相同代理路径,返回有界 task 摘要和后续查询命令。任何需要接触 active run 的动作仍属于 D601 执行面。 +- Code Queue 多队列 lane 由 `codex` 命令命名空间管理:`queues` 列表、`queue create ` 创建、`queue merge --into ` 合并、`move --queue ` 迁移;这些队列管理入口默认由主 server `code-queue-mgr` 直管 PostgreSQL,仍通过稳定 `code-queue` 用户服务代理路径访问。同一个 queue 内部串行执行,不同 queue 之间并行执行。合并会移动任务归属并自动删除源 queue 
记录,只保留合并后的目标 queue;合并后的目标 queue 按任务原 `queueEnteredAt`/`createdAt` 时间顺序串行。迁移 queued/retry_wait 任务后由 D601 scheduler 轮询推进。 - 所有 `codex` 查询和管理命令必须走与 WebUI 相同的 backend-core 私有代理路径 `/api/microservices/code-queue/proxy/...`;CLI 不得为了提交、移动、中断、取消或队列管理直接调用 D601 内部 Service、数据库、pod curl 或 k3sctl scheduler 子服务。若该路径失败,应先修复 CLI/backend/provider tunnel 链路,而不是绕过控制面。 - `job list` 与 `job status` 查询 `.state/jobs/` 文件系统状态,是异步命令的可观测入口。 - `debug health`、`debug dispatch` 与 `debug task` 走真实内部 core、WebSocket、数据库、provider、系统指标、Docker 状态和 Host SSH 维护桥流程,只用于开发调试,不写入 `TEST.md` 的正式验收步骤。 @@ -38,7 +38,7 @@ UniDesk 的统一 CLI 入口是根目录 `scripts/cli.ts`,运行方式固定 长时操作采用 Fire-and-Forget 模式:CLI 创建 `.state/jobs/{jobId}.json`,后台进程执行真实命令,并将 stdout、stderr 分别写入 `.state/jobs/{jobId}.stdout.log` 与 `.state/jobs/{jobId}.stderr.log`。调用者通过 `bun scripts/cli.ts job status ` 查询进度和尾部输出。 -`server rebuild` 与 `server start`、`server stop` 一样必须通过返回的 job id 确认结果;不要把连续 `server rebuild` 命令理解成“前一个重建已完成”,因为两个命令只是在快速创建异步 job。重建 frontend 的标准流程是运行 `bun scripts/cli.ts server rebuild frontend`,随后轮询 `bun scripts/cli.ts job status ` 到 `succeeded`,再用 `server status` 或 `e2e run` 验证公网 frontend;重建 Todo Note 后端使用 `bun scripts/cli.ts server rebuild todo-note`,随后用 `microservice health todo-note` 和 `microservice proxy todo-note /api/instances` 验证;重建 Project Manager 后端使用 `bun scripts/cli.ts server rebuild project-manager`,随后用 `microservice health project-manager` 和 `microservice proxy project-manager /api/projects` 验证;重建 Baidu Netdisk 后端使用 `bun scripts/cli.ts server rebuild baidu-netdisk`,随后用 `microservice health baidu-netdisk` 和 `microservice proxy baidu-netdisk /api/transfers` 验证;重建 OA Event Flow 后端使用 `bun scripts/cli.ts server rebuild oa-event-flow`,随后用 `microservice health oa-event-flow` 和 `microservice proxy oa-event-flow /api/diagnostics` 验证。Code Queue 和 Decision Center 后端由 D601 k3s/k8s 控制面代管,必须使用 `bun scripts/cli.ts deploy apply --service code-queue`、`bun scripts/cli.ts deploy apply --service decision-center` 或 Code Queue 兼容入口 `bun 
scripts/cli.ts codex deploy ` 部署已 push 的 remote commit;部署 job 自身必须通过真实 `/health` 和 k3s Deployment annotation 证明不是旧服务在充数,之后再用 `microservice health ` 和对应私有代理 API 做人工复核。不得把 `docker rm` 手工兜底当成正式交付步骤。 +`server rebuild` 与 `server start`、`server stop` 一样必须通过返回的 job id 确认结果;不要把连续 `server rebuild` 命令理解成“前一个重建已完成”,因为两个命令只是在快速创建异步 job。重建 frontend 的标准流程是运行 `bun scripts/cli.ts server rebuild frontend`,随后轮询 `bun scripts/cli.ts job status ` 到 `succeeded`,再用 `server status` 或 `e2e run` 验证公网 frontend;重建 Todo Note 后端使用 `bun scripts/cli.ts server rebuild todo-note`,随后用 `microservice health todo-note` 和 `microservice proxy todo-note /api/instances` 验证;重建 Code Queue Manager 使用 `bun scripts/cli.ts server rebuild code-queue-mgr`,随后用 `microservice health code-queue-mgr`、`microservice health code-queue` 和 `codex submit --dry-run` 验证主 server 控制面路径;重建 Project Manager 后端使用 `bun scripts/cli.ts server rebuild project-manager`,随后用 `microservice health project-manager` 和 `microservice proxy project-manager /api/projects` 验证;重建 Baidu Netdisk 后端使用 `bun scripts/cli.ts server rebuild baidu-netdisk`,随后用 `microservice health baidu-netdisk` 和 `microservice proxy baidu-netdisk /api/transfers` 验证;重建 OA Event Flow 后端使用 `bun scripts/cli.ts server rebuild oa-event-flow`,随后用 `microservice health oa-event-flow` 和 `microservice proxy oa-event-flow /api/diagnostics` 验证。D601 Code Queue 执行面和 Decision Center 后端由 D601 k3s/k8s 控制面代管,必须使用 `bun scripts/cli.ts deploy apply --service code-queue`、`bun scripts/cli.ts deploy apply --service decision-center` 或 Code Queue 兼容入口 `bun scripts/cli.ts codex deploy ` 部署已 push 的 remote commit;部署 job 自身必须通过真实 `/health` 和 k3s Deployment annotation 证明不是旧服务在充数,之后再用对应私有代理 API 做人工复核。不得把 `docker rm` 手工兜底当成正式交付步骤。 新部署入口优先使用 `deploy apply`。旧的 `server rebuild` 和 `codex deploy` 只保留为兼容入口,后续实现应收敛到同一个 reconciler:从 remote commit 导出源码,在目标节点一次性代理构建镜像,部署后用 live commit 校验证明不是旧服务。 diff --git a/docs/reference/deployment.md b/docs/reference/deployment.md index 3cfba42f..ab8a164a 100644 --- 
a/docs/reference/deployment.md +++ b/docs/reference/deployment.md @@ -1,6 +1,6 @@ # UniDesk Deployment Reference -主 server 使用根目录 `docker-compose.yml` 统一编排 database、backend-core、frontend、provider-gateway 以及必须留在主 server 的用户服务。当前环境本身就是主 server,因此 provider-gateway 也在同一台机器上启动,用与普通计算节点相同的 WebSocket 方式接入 core。Code Queue 不再属于主 server Compose,也不再由 backend-core 通过 provider-gateway 直连业务容器;它作为 `k3sctl-managed` 用户服务经 D601 `k3sctl-adapter` 进入 k3s 标准服务路由。 +主 server 使用根目录 `docker-compose.yml` 统一编排 database、backend-core、frontend、provider-gateway 以及必须留在主 server 的用户服务。当前环境本身就是主 server,因此 provider-gateway 也在同一台机器上启动,用与普通计算节点相同的 WebSocket 方式接入 core。Code Queue 按“master 低资源低抖动控制面、D601 高资源高抖动执行面”拆分:队列 CRUD、任务提交、历史摘要和轻量 Trace 读取由主 server Compose 中的 `code-queue-mgr` 直管 PostgreSQL;Codex/OpenCode scheduler、runner、dev-container、active run steer/interrupt 和执行态写回仍由 D601 原生 k3s/k8s Code Queue 执行面承担。 ## Services @@ -9,8 +9,9 @@ - `frontend` 是唯一公开 Web 控制台,提供登录、从 TSX 转译出的 React 应用资产和到 backend-core 的同源代理。 - `provider-gateway` 是当前主 server 的本机计算节点代理,通过 WebSocket 主动连到 provider ingress,挂载 `/var/run/docker.sock` 作为自动任务执行主路径,使用 `pid: "host"` 读取节点级进程资源,并周期性上报系统资源指标、进程占用与 Docker daemon 状态;计算节点 provider-gateway 还必须把 egress proxy 仅发布到宿主 loopback `127.0.0.1:18789` 供本节点执行环境出网,维护用 Host SSH / WSL SSH 私钥目录只读挂载到 `/run/host-ssh`,不得作为自动任务调度主路径。 - `todo-note` 是主 server 承载的 Todo Note 纯后端用户服务,容器名 `todo-note-backend`,只在 Compose 内网暴露 `4211/tcp`,使用主 PostgreSQL 存储迁移后的 Todo Note 数据。 +- `code-queue-mgr` 是主 server 承载的轻量 Code Queue 控制面用户服务,容器名 `code-queue-mgr-backend`,只在 Compose 内网暴露 `4278/tcp`,只连接主 PostgreSQL,负责 `code-queue` 稳定代理路径下的队列管理、任务提交、历史摘要、已读状态和轻量 Trace 读取;不得引入 Codex/OpenCode/Playwright/Chromium/Docker socket/runner 依赖。 - `k3sctl-adapter` 是 D601 上由 UniDesk 直管的 k3s 控制面适配微服务,容器名 `k3sctl-adapter`,只绑定 `127.0.0.1:4266`,由 UniDesk frontend/backend-core 通过用户服务代理访问并提供 `/api/control-plane` 可见性。 -- `code-queue` 是由 D601 原生 k3s/k8s 控制面代管的 Codex/OpenCode 队列用户服务,当前拓扑为 D601 内的 read/write/scheduler 多 Service,通过 
`src/components/microservices/k3sctl-adapter/k3s/code-queue.k3s.json` 声明,运行对象通过 `code-queue.k8s.yaml` 创建 Kubernetes Deployment/ClusterIP Service;任务、queue、未读状态、控制状态和通知 outbox 一律写入主 PostgreSQL,不保留本地状态文件 fallback。浏览器只使用稳定的 `code-queue` 用户服务 ID,backend-core 在内部把只读请求分到 `code-queue-read`,把命令写入分到 `code-queue-write`,把执行端 `/health`、dev-container 和 running task control 分到 `code-queue-scheduler`。 +- `code-queue` 是稳定对外用户服务 ID。backend-core 对 `/api/microservices/code-queue/proxy/...` 做内部职责分流:控制/读取路径默认转到主 server `code-queue-mgr`,D601 k3s `code-queue-read`/`code-queue-write`/`code-queue-scheduler` 只承担执行面兼容、scheduler/runner、dev-container、judge 和 running task control。D601 执行面通过 `src/components/microservices/k3sctl-adapter/k3s/code-queue.k3s.json` 声明,运行对象通过 `code-queue.k8s.yaml` 创建 Kubernetes Deployment/ClusterIP Service;任务、queue、未读状态、控制状态和通知 outbox 一律写入主 PostgreSQL,不保留本地状态文件 fallback。 - `project-manager` 是主 server 承载的项目管理用户服务,容器名 `project-manager-backend`,仅在 Compose 内网暴露 `4233/tcp`,项目清单写入主 PostgreSQL,浏览器只能通过 UniDesk frontend 同源代理执行增删改查、Excel 导入和 Excel 导出。 - `baidu-netdisk` 是主 server 承载的百度网盘存储用户服务,容器名 `baidu-netdisk-backend`,仅在 Compose 内网暴露 `4244/tcp`,OAuth/token/transfer 状态写入主 PostgreSQL,浏览器只能通过 UniDesk frontend 同源代理执行设备码登录、文件浏览和 staging 传输任务控制。 @@ -34,7 +35,7 @@ Compose v2 安装后仍然必须遵守 UniDesk 的服务控制入口:全栈生 ## Single Service Rebuild -前端、backend-core、本机 provider-gateway 或主 server 承载的 Todo Note/Project Manager/Baidu Netdisk/OA Event Flow 用户服务需要重建时,统一使用 `bun scripts/cli.ts server rebuild `,其中 `` 只能是 `backend-core`、`frontend`、`provider-gateway`、`todo-note`、`project-manager`、`baidu-netdisk` 或 `oa-event-flow`。Code Queue、File Browser、FindJob、Pipeline、MET Nonlinear 和 ClaudeQQ 部署在计算节点,不属于主 server Compose 可重建服务;其中 D601 Code Queue 的正式入口是 `bun scripts/cli.ts codex deploy `。该命令先执行目标服务镜像构建,构建成功后才通过 `up -d --no-deps --force-recreate ` 替换目标容器,避免构建失败导致运行中的服务被提前停掉。 +前端、backend-core、本机 provider-gateway 或主 server 承载的 Todo Note/Code Queue Manager/Project Manager/Baidu Netdisk/OA Event Flow 用户服务需要重建时,统一使用 
`bun scripts/cli.ts server rebuild `,其中 `` 只能是 `backend-core`、`frontend`、`provider-gateway`、`todo-note`、`code-queue-mgr`、`project-manager`、`baidu-netdisk` 或 `oa-event-flow`。D601 Code Queue 执行面、File Browser、FindJob、Pipeline、MET Nonlinear 和 ClaudeQQ 部署在计算节点,不属于主 server Compose 可重建服务;其中 D601 Code Queue 执行面的正式入口是 `bun scripts/cli.ts codex deploy `。该命令先执行目标服务镜像构建,构建成功后才通过 `up -d --no-deps --force-recreate ` 替换目标容器,避免构建失败导致运行中的服务被提前停掉。 frontend 改动必须明确上线到公网:修改 `src/components/frontend/src/`、`src/components/frontend/public/style.css`、frontend 使用的共享 TSX/TS 模块或 WebUI 导航后,必须在同一变更集中执行 `bun scripts/cli.ts server rebuild frontend`,并等待 job 成功。公网 WebUI 的 `/app.js` 是 `unidesk-frontend` 容器启动时从镜像内源码转译生成的运行时 bundle;只改工作区文件、只跑 `bun run check`、只跑 `Bun.build` 或只刷新浏览器都不会替换已经运行的容器。 @@ -52,15 +53,17 @@ frontend 的 Docker 上线顺序为:先运行必要的本地校验,例如 `b ## Health Criteria -服务跑通的最低标准是:backend-core 内网 `/health` 返回 ok,frontend 公网 `/health` 返回 ok,provider ingress 公网 `/health` 返回 ok,database 在容器内 `pg_isready` 可用,Todo Note 后端 `/api/health` 返回 `storage=postgres`,`k3sctl-adapter` `/api/control-plane` 可见 `unidesk-k3s` Kubernetes API service proxy 状态、D601 scheduler serving healthy、D601 read/write Service healthy、`presentNodeIds` 包含 `D601`、`missingNodeIds=[]` 和 no-fallback 路径,Code Queue `/health` 经 k3s scheduler Service 返回轻量 readiness、默认模型、`queue.storage` 和 `egressProxy.connected=true`,`/api/tasks/overview` 经 `code-queue-read` 返回 PostgreSQL 队列总览,写入类入口经 `code-queue-write` 入库后由 `code-queue-scheduler` 轮询并执行,Project Manager `/health` 返回 `storage.primary=postgres` 和项目数量,backend-core `/api/performance` 返回性能指标,`/api/nodes` 中出现 `main-server`、`D601` 和 `D518` provider 且状态为 `online`,`/api/nodes/system-status` 中出现对应 CPU/内存/硬盘采样,`/api/nodes/docker-status` 中能看到主 server、D601 与 D518 Docker 快照。D601/D518 上不得存在 active `rancher/k3s` 容器;D518 只有在原生 k3s-agent 与稳定 Kubernetes 网络完成验证后才可加入 Code Queue expected nodes。交付前还必须运行 `bun scripts/cli.ts e2e run`,并以 `docs/reference/e2e.md` 的门禁作为最终判定。 +服务跑通的最低标准是:backend-core 内网 `/health` 返回 
ok,frontend 公网 `/health` 返回 ok,provider ingress 公网 `/health` 返回 ok,database 在容器内 `pg_isready` 可用,Todo Note 后端 `/api/health` 返回 `storage=postgres`,`code-queue-mgr` `/health` 返回 `role=master-control-plane`、`schemaReady=true` 和资源预算字段,`bun scripts/cli.ts microservice health code-queue` 与 `/api/tasks/overview` 默认经主 server `code-queue-mgr` 返回 PostgreSQL 队列总览,`k3sctl-adapter` `/api/control-plane` 可见 `unidesk-k3s` Kubernetes API service proxy 状态、D601 scheduler serving healthy、D601 read/write Service healthy、`presentNodeIds` 包含 `D601`、`missingNodeIds=[]` 和 no-fallback 路径,D601 Code Queue scheduler `/health` 返回轻量 readiness、默认模型、`queue.storage` 和 `egressProxy.connected=true`,提交类入口经 `code-queue-mgr` 入库后由 D601 `code-queue-scheduler` 轮询并执行,Project Manager `/health` 返回 `storage.primary=postgres` 和项目数量,backend-core `/api/performance` 返回性能指标,`/api/nodes` 中出现 `main-server`、`D601` 和 `D518` provider 且状态为 `online`,`/api/nodes/system-status` 中出现对应 CPU/内存/硬盘采样,`/api/nodes/docker-status` 中能看到主 server、D601 与 D518 Docker 快照。D601/D518 上不得存在 active `rancher/k3s` 容器;D518 只有在原生 k3s-agent 与稳定 Kubernetes 网络完成验证后才可加入 Code Queue expected nodes。交付前还必须运行 `bun scripts/cli.ts e2e run`,并以 `docs/reference/e2e.md` 的门禁作为最终判定。 -## Code Queue D601 Resource Budget +## Code Queue Control/Execution Resource Budget -Code Queue 已从主 server 迁移到 D601 k3s/k8s,但仍必须保持明确的 memory/swap 硬上限,默认 `CODE_QUEUE_MAX_ACTIVE_QUEUES=0` 以恢复 queue 间并行,仍保持 `CODE_QUEUE_IN_MEMORY_OUTPUT_RECORDS=10`、`CODE_QUEUE_IN_MEMORY_EVENT_RECORDS=10` 这类小热窗口;任务历史、队列统计和 Trace/output 读取必须优先从 PostgreSQL 直读或聚合,`/health` 只做轻量 readiness,不能为了性能便利在 Bun 进程内缓存全量历史。任何提高 Code Queue 热窗口、日志缓冲、Playwright/Codex 子进程常驻规模或容器上限的变更,或把 `CODE_QUEUE_MAX_ACTIVE_QUEUES` 显式改成正数,都必须在同一任务里说明 D601 资源预算来源,并通过 D601 `KUBECONFIG=/home/ubuntu/unidesk-code-queue-deploy/.state/k3s/kubeconfig kubectl -n unidesk get deploy,svc,pod`、`kubectl -n unidesk top pod` 或等价 Docker stats、`microservice health code-queue` 和对应 E2E 证明未重新引入内存爆炸风险。 +主 server `code-queue-mgr` 是低资源控制面,目标常驻内存不超过 100 MB,只允许 
PostgreSQL 小连接池、日志和基础 CRUD/摘要逻辑;不得安装或运行 Playwright、Chromium、Codex/OpenCode、Docker socket、dev-container 或执行器。它的 `/health` 必须暴露 `resourceBudget.targetMemoryMb=100`、`noRunnerDependencies=true`、连接池上限和 `role=master-control-plane`,便于在主 server 低内存环境中识别是否越界。 + +D601 Code Queue 执行面仍必须保持明确的 memory/swap 硬上限,默认 `CODE_QUEUE_MAX_ACTIVE_QUEUES=0` 以恢复 queue 间并行,仍保持 `CODE_QUEUE_IN_MEMORY_OUTPUT_RECORDS=10`、`CODE_QUEUE_IN_MEMORY_EVENT_RECORDS=10` 这类小热窗口;任务历史、队列统计和 Trace/output 读取必须优先从 PostgreSQL 直读或聚合,执行面 `/health` 只做轻量 readiness,不能为了性能便利在 Bun 进程内缓存全量历史。任何提高 Code Queue 热窗口、日志缓冲、Playwright/Codex 子进程常驻规模或容器上限的变更,或把 `CODE_QUEUE_MAX_ACTIVE_QUEUES` 显式改成正数,都必须在同一任务里说明 D601 资源预算来源,并通过 D601 `KUBECONFIG=/home/ubuntu/unidesk-code-queue-deploy/.state/k3s/kubeconfig kubectl -n unidesk get deploy,svc,pod`、`kubectl -n unidesk top pod` 或等价 Docker stats、D601 scheduler health 和对应 E2E 证明未重新引入内存爆炸风险。 ## Database Connection Budget -主 PostgreSQL 的内存预算按“少量长驻服务连接池 + 短查询按需连接”设计,不允许每个 Bun 服务沿用默认 8 到 10 个连接。`backend-core` 默认 `DATABASE_POOL_MAX=4`,主 server 上的 `oa-event-flow`、`baidu-netdisk` 默认 `DATABASE_POOL_MAX=2`,`project-manager` 默认 `DATABASE_POOL_MAX=1`,D601 `code-queue` 默认 `CODE_QUEUE_DATABASE_POOL_MAX=2`;如需提高任一连接池上限,必须同时说明并发 SQL 需求、验证 `pg_stat_activity` 中该服务没有长期 idle 堆积,并确认 `max_connections` 仍有足够余量。PostgreSQL 基础配置固定保守值:`shared_buffers=128MB`、`work_mem=4MB`、`maintenance_work_mem=64MB`、`max_connections=50`,避免主 server 低内存环境被空闲 backend 和过大的 per-query 内存预算挤占。 +主 PostgreSQL 的内存预算按“少量长驻服务连接池 + 短查询按需连接”设计,不允许每个 Bun 服务沿用默认 8 到 10 个连接。`backend-core` 默认 `DATABASE_POOL_MAX=4`,主 server 上的 `oa-event-flow`、`baidu-netdisk` 默认 `DATABASE_POOL_MAX=2`,`project-manager` 默认 `DATABASE_POOL_MAX=1`,`code-queue-mgr` 默认 `CODE_QUEUE_MGR_DATABASE_POOL_MAX=2` 与 `CODE_QUEUE_TRACE_DATABASE_POOL_MAX=1`,D601 `code-queue` 默认 `CODE_QUEUE_DATABASE_POOL_MAX=2`;如需提高任一连接池上限,必须同时说明并发 SQL 需求、验证 `pg_stat_activity` 中该服务没有长期 idle 堆积,并确认 `max_connections` 仍有足够余量。PostgreSQL 
基础配置固定保守值:`shared_buffers=128MB`、`work_mem=4MB`、`maintenance_work_mem=64MB`、`max_connections=50`,避免主 server 低内存环境被空闲 backend 和过大的 per-query 内存预算挤占。 排查 PostgreSQL 内存时以 `docker stats unidesk-database`、`pg_stat_activity` 分组和 `pg_settings` 为准;主机 `ps` 中每个 `postgres` 进程的 RSS 会重复计入共享内存,不能把所有 backend RSS 简单相加当作真实容器占用。所有 UniDesk PostgreSQL 客户端都必须设置可识别的 `application_name`,便于按服务统计连接数、状态和慢查询归属。 diff --git a/docs/reference/frontend.md b/docs/reference/frontend.md index 0fdf0a16..0306d4e9 100644 --- a/docs/reference/frontend.md +++ b/docs/reference/frontend.md @@ -96,7 +96,7 @@ frontend shell 必须把左侧主模块与顶部子标签编译为统一的 URL - `Baidu Netdisk` 子标签必须把主 server `baidu-netdisk-backend` 后端渲染为 UniDesk React 控件,包括 OAuth 设备码二维码/用户码登录、账号容量、配置工作根文件浏览(当前默认百度网盘根目录 `/`)、staging 目录上传/下载任务、上传/下载自测按钮与 MD5 结果、脱敏安全说明、日志摘要和显式原始 JSON 按钮;不得把 access token、refresh token、dlink 或 staging 文件字节流裸露到浏览器。 - `OA Event Flow` 子标签必须把主 server `oa-event-flow-backend` 后端渲染为 UniDesk React 控件,包括服务健康、事件表、tag 过滤、SSE live 状态、Trace/STEP stats 表、Code Queue/Pipeline 标签入口和显式原始 JSON 按钮;默认页面不得裸铺完整事件 JSON,事件表只展示结构化摘要,完整 envelope/payload 只能通过 `查看原始JSON` 打开。 - `k3s Control` 子标签必须把 D601 `k3sctl-adapter` 控制面渲染为 UniDesk React 控件,包括 control plane 状态、manifest 列表、D601 scheduler/read/write 实例、active instance、single-writer/no-fallback 路径、Kubernetes API service proxy 状态、kubectl/k3s snapshot 摘要和显式原始 JSON 按钮;页面只能通过 `/api/microservices/k3sctl-adapter/proxy/api/control-plane` 取数,不得直接访问 provider-gateway、NodePort、业务容器端口或裸 k3s/kubectl API。 - - `Code Queue` 子标签必须把 D601 k3s/k8s `code-queue` Service 渲染为 UniDesk React 控件,前端 API 基址只能是 `/api/microservices/code-queue/proxy`,不能继续使用旧 `/api/code-queue-direct` 别名;页面包括多 queue lane、queue 内串行、queue 间并行、queue 合并(点击“合并 queue”后必须用公共 `UniDeskDialog` 打开独立小窗口,用下拉菜单选择源 queue;不得把源 queue 选择控件塞进正常提交任务的 Queue 选择区;合并后自动删除源 queue,只保留合并后的目标 queue,目标 queue 按原 queueEnteredAt/createdAt 时间顺序串行)、任务 ID/复制任务 ID、引用按钮、任务耗时、任务提交/批量提交、引用任务 ID、创建成功提示、清空输入、模型下拉、执行 Provider 下拉、执行模式下拉(默认容器/本机或 `windows-native`)、显式入队份数、默认模型 `gpt-5.5`、MiniMax 
judge 状态、Codex CLI-like 输出流、attempt 终态、运行中追加 prompt、打断、手动重试和显式原始 JSON 按钮;`windows-native` 模式必须在任务 JSON、卡片和 Trace 头部显示,并要求非本机 WSL Provider 与 `/mnt/` 工作目录;Codex CLI-like 输出流必须始终保留任务的初始 `Submitted prompt` 和运行中 `Steer prompt`;整个 agent loop 消息流统一命名为专有名词 `Trace`,`Trace` 包含 assistant message、user prompt、system event 和 tool call,但非错误 system event 默认只保留在原始输出/数据库中,不在 TraceView 展示;Code Queue 与 Pipeline/OpenCode messages 必须共用 `src/components/frontend/src/trace.tsx` 的 Trace 公共组件、统一 Trace item 接口和 codex/opencode port 适配层;连续 read/edit/run 工具调用只是在 Trace 内折叠为可展开工具调用组,汇总格式至少包含 `xx read, xx edit, xx run`,并展示读取文件、编辑文件、运行命令和耗时摘要;最近 3 个工具调用保持展开,工具调用内容不得自动换行且必须在工具调用块内部横向滚动,工具调用组展开后不得再增加额外左侧缩进;message 与 prompt 必须自动换行,普通 message 不显示左侧项目符号缩进且永不折叠;Trace 首屏可以是摘要预览,但终态任务被选中后必须自动在后台加载完整 Trace,手动“加载完整 Trace”也必须从 Code Queue output archive 分页补齐早期 trace,不得把 preview 的 `hasMore=false` 当成完整历史;即使热状态为控制体积裁剪了早期 raw output,也要从结构化 `basePrompt/displayPrompt/promptHistory` 和 archive 合成完整用户输入与 agent trace,并且初始 prompt 默认显示注入前 prompt 而不是引用注入全文;当初始 prompt 含引用注入时,引用内容必须默认折叠,并只在 Trace 的初始消息中提供可展开的“最终传入 Codex 的真实完整 prompt”,不得再渲染独立 Prompt 全量卡片;多轮引用注入必须按上游/最早上下文在前、直接引用在后的顺序排列,每一轮必须有明确 `Reference Round N/M` 分割线和时间范围,不能用固定 6 轮截断引用链;点击队列引用按钮必须自动把该任务 ID 写入提交表单的引用输入框,引用任务 ID 创建新任务时必须自动注入 `bun scripts/cli.ts codex task ` 的提示;连续执行同一 prompt 应通过入队份数一次性生成多条任务,避免快速连点造成操作员误判。 + - `Code Queue` 子标签必须把稳定 `code-queue` 用户服务渲染为 UniDesk React 控件,前端 API 基址只能是 `/api/microservices/code-queue/proxy`,不能继续使用旧 `/api/code-queue-direct` 别名;backend-core 会把 queue CRUD、submit、history、readAt 和轻量 Trace 读取分流到主 server `code-queue-mgr`,把 active run steer/interrupt、judge、dev-container 和执行面健康分流到 D601 k3s/k8s Code Queue 执行面。页面包括多 queue lane、queue 内串行、queue 间并行、queue 合并(点击“合并 queue”后必须用公共 `UniDeskDialog` 打开独立小窗口,用下拉菜单选择源 queue;不得把源 queue 选择控件塞进正常提交任务的 Queue 选择区;合并后自动删除源 queue,只保留合并后的目标 queue,目标 queue 按原 queueEnteredAt/createdAt 时间顺序串行)、任务 ID/复制任务 ID、引用按钮、任务耗时、任务提交/批量提交、引用任务 ID、创建成功提示、清空输入、模型下拉、执行 Provider 下拉、执行模式下拉(默认容器/本机或 `windows-native`)、显式入队份数、默认模型 
`gpt-5.5`、MiniMax judge 状态、Codex CLI-like 输出流、attempt 终态、运行中追加 prompt、打断、手动重试和显式原始 JSON 按钮;`windows-native` 模式必须在任务 JSON、卡片和 Trace 头部显示,并要求非本机 WSL Provider 与 `/mnt/` 工作目录;Codex CLI-like 输出流必须始终保留任务的初始 `Submitted prompt` 和运行中 `Steer prompt`;整个 agent loop 消息流统一命名为专有名词 `Trace`,`Trace` 包含 assistant message、user prompt、system event 和 tool call,但非错误 system event 默认只保留在原始输出/数据库中,不在 TraceView 展示;Code Queue 与 Pipeline/OpenCode messages 必须共用 `src/components/frontend/src/trace.tsx` 的 Trace 公共组件、统一 Trace item 接口和 codex/opencode port 适配层;连续 read/edit/run 工具调用只是在 Trace 内折叠为可展开工具调用组,汇总格式至少包含 `xx read, xx edit, xx run`,并展示读取文件、编辑文件、运行命令和耗时摘要;最近 3 个工具调用保持展开,工具调用内容不得自动换行且必须在工具调用块内部横向滚动,工具调用组展开后不得再增加额外左侧缩进;message 与 prompt 必须自动换行,普通 message 不显示左侧项目符号缩进且永不折叠;Trace 首屏可以是摘要预览,但终态任务被选中后必须自动在后台加载完整 Trace,手动“加载完整 Trace”也必须从 Code Queue output archive 分页补齐早期 trace,不得把 preview 的 `hasMore=false` 当成完整历史;即使热状态为控制体积裁剪了早期 raw output,也要从结构化 `basePrompt/displayPrompt/promptHistory` 和 archive 合成完整用户输入与 agent trace,并且初始 prompt 默认显示注入前 prompt 而不是引用注入全文;当初始 prompt 含引用注入时,引用内容必须默认折叠,并只在 Trace 的初始消息中提供可展开的“最终传入 Codex 的真实完整 prompt”,不得再渲染独立 Prompt 全量卡片;多轮引用注入必须按上游/最早上下文在前、直接引用在后的顺序排列,每一轮必须有明确 `Reference Round N/M` 分割线和时间范围,不能用固定 6 轮截断引用链;点击队列引用按钮必须自动把该任务 ID 写入提交表单的引用输入框,引用任务 ID 创建新任务时必须自动注入 `bun scripts/cli.ts codex task ` 的提示;连续执行同一 prompt 应通过入队份数一次性生成多条任务,避免快速连点造成操作员误判。 - `MDTODO` 子标签必须把 D601 k3s `mdtodo` Service 渲染为 UniDesk React 控件,前端 API 基址只能是 `/api/microservices/mdtodo/proxy`;页面包括 TODO Markdown 文件列表、任务树、状态徽标、标题与正文编辑、新增根任务/子任务、删除任务、执行命令生成、hostPath 健康摘要和显式原始 JSON 按钮,不得 iframe 原 VS Code webview、公开 VSIX 旧前端或把完整 Markdown/JSON 默认铺在页面上。 - `Code Queue` 前端改进必须在同一任务内重建并上线公网 frontend,不能只修改源码或本地 bundle;重建 frontend 是无状态 WebUI 替换,不会导致 Code Queue 长期任务失败。已结束未读任务只能在 task card 边角显示类似未读消息的 `codex-unread-badge` 圆点和“标为已读”操作,不得把整张卡片改成红色/琥珀色失败态边框、背景或胶囊标签;状态栏的“结束未读”提示也不得使用失败态红色。 - `Code Queue` 前端必须把 PostgreSQL-backed backend API 作为 task、queue、readAt/未读状态和 attempt 状态的唯一数据来源;不得用 `localStorage`、`sessionStorage` 或 IndexedDB 
持久化这些业务状态,也不得在后端标记已读失败时伪造本地成功。前端允许保留 React 内存态、请求 in-flight guard 和本轮页面缓存,但刷新页面或切换设备后的状态必须完全由后端 PostgreSQL 数据恢复。 diff --git a/docs/reference/microservices.md b/docs/reference/microservices.md index 138f11fb..0e91d20c 100644 --- a/docs/reference/microservices.md +++ b/docs/reference/microservices.md @@ -7,7 +7,7 @@ UniDesk 用户服务是挂载到 UniDesk 核心服务上的、面向用户使用 ## Boundary - 用户服务后端端口默认只绑定计算节点本机地址,例如 `127.0.0.1:`,不得直接暴露公网。 -- 浏览器只访问 UniDesk frontend;frontend 通过同源 `/api/microservices/*` 代理到 backend-core。`deployment.mode=unidesk-direct` 的用户服务由 backend-core 通过目标 provider-gateway 的 `microservice.http` 能力访问计算节点本机后端;`deployment.mode=k3sctl-managed` 的用户服务只允许经 `k3sctl-adapter` 微服务进入 k3s 标准服务路由,backend-core 不得直接向业务容器所在 provider 下发 `microservice.http`。 +- 浏览器只访问 UniDesk frontend;frontend 通过同源 `/api/microservices/*` 代理到 backend-core。`deployment.mode=unidesk-direct` 的用户服务由 backend-core 通过目标 provider-gateway 的 `microservice.http` 能力访问计算节点本机后端;`deployment.mode=internal-sidecar` 的主 server 内置控制面服务由 backend-core 直接访问同一 Compose 网络内的显式服务名;`deployment.mode=k3sctl-managed` 的用户服务只允许经 `k3sctl-adapter` 微服务进入 k3s 标准服务路由,backend-core 不得直接向业务容器所在 provider 下发 `microservice.http`。 - backend-core REST API、database 和计算节点用户服务后端都不得新增公网端口;公网入口仍只有 frontend 和 provider ingress。 - `microservice.http` 只允许 provider-gateway 访问 `http://127.0.0.1`、`http://localhost`、`http://host.docker.internal` 这类节点本地地址,或明确登记为同一私有 Docker network 内的服务名;主 server 内置用户服务可使用同一 Compose 网络内的显式服务名,例如 `todo-note:4211`。k3s 代管服务不得把业务容器地址登记成 provider-gateway 直连目标,`backend.proxyMode` 必须使用 `k3sctl-adapter-http`,`backend.nodeBaseUrl` 可使用 `k3s://` 这类逻辑服务名。backend-core 还必须用 `allowedPathPrefixes` 和 `allowedMethods` 同时限制可代理路径和 HTTP 方法。 @@ -19,7 +19,7 @@ UniDesk 用户服务是挂载到 UniDesk 核心服务上的、面向用户使用 - `repository.url` 与 `repository.commitId`,用于记录业务代码的外部权威来源;UniDesk 不 vendoring 业务全量代码。 - `repository.dockerfile`、`repository.composeFile`、`repository.composeService` 和 `repository.containerName`,用于说明部署应复用业务仓库自身维护的 Dockerfile/docker-compose。 - 
`backend.nodeBaseUrl`、`backend.nodeBindHost`、`backend.nodePort`、`backend.proxyMode`、`backend.public=false`、`backend.frontendOnly=true`、`backend.allowedMethods`、`backend.allowedPathPrefixes` 和 `backend.healthPath`,用于定义计算节点端口到 UniDesk frontend-only 代理的映射。 -- `deployment.mode`,用于明确部署责任边界;`unidesk-direct` 表示 UniDesk 直接登记和探测目标 provider 上的容器,`k3sctl-managed` 表示 UniDesk 只登记逻辑服务并经 `deployment.adapterServiceId` 指向的 `k3sctl-adapter` 访问,代管条目还必须写明 `k3sServiceId`、`namespace`、`expectedNodeIds` 和当前 `activeNodeId`。 +- `deployment.mode`,用于明确部署责任边界;`unidesk-direct` 表示 UniDesk 直接登记和探测目标 provider 上的容器,`internal-sidecar` 表示主 server Compose 内的轻量控制面/基础设施服务,`k3sctl-managed` 表示 UniDesk 只登记逻辑服务并经 `deployment.adapterServiceId` 指向的 `k3sctl-adapter` 访问,代管条目还必须写明 `k3sServiceId`、`namespace`、`expectedNodeIds` 和当前 `activeNodeId`。 - `development.providerId`、`development.sshPassthrough=true` 和 `development.worktreePath`,用于说明开发调试入口必须在计算节点上通过 UniDesk SSH 透传完成。 - `frontend.route` 和 `frontend.integrated=true`,用于说明该业务前端已经整合到 UniDesk React 控制台,而不是继续公开业务自身前端。 @@ -78,6 +78,16 @@ Todo Note 数据迁移后必须验证:`microservice proxy todo-note /api/insta OA Event Flow 在 UniDesk 语境中按共享控制面基础设施管理:不得暴露公网端口,不得把事件或统计权威状态写入 `.state/`;Code Queue 与 Pipeline 都必须通过该服务发布事实事件、订阅 tag stream 和读取统计中心。共享事件流、统计中心和完成门禁见 `docs/reference/oa-event-flow.md`。 +### Code Queue Manager On Main Server + +`code-queue-mgr` 是主 server Compose 内的轻量 Code Queue 控制面,登记为 `deployment.mode=internal-sidecar`,Provider 为 `main-server`,后端地址为 Compose 网络内 `http://code-queue-mgr:4278`。它不直接出现在前端业务标签中,而是作为稳定 `code-queue` 用户服务代理路径的内部控制面目标。 + +- 职责:队列 CRUD、任务提交、批量提交、任务移动、queued prompt edit、已读状态、历史摘要、overview、stats、summary、prompt、output/transcript/trace 的轻量 PostgreSQL 读取。 +- 非职责:不运行 Codex/OpenCode,不包含 Playwright/Chromium,不持有 Docker socket,不创建 dev-container,不执行 judge,不管理 active run steer/interrupt,不做任务调度或 runner。 +- 资源边界:目标常驻内存不超过 100 MB,默认 PostgreSQL pool 为 `CODE_QUEUE_MGR_DATABASE_POOL_MAX=2` 与 `CODE_QUEUE_TRACE_DATABASE_POOL_MAX=1`,`/health` 必须暴露 
`role=master-control-plane`、`schemaReady`、连接池上限和 `noRunnerDependencies=true`。 +- 路由:CLI/WebUI 仍只访问 `/api/microservices/code-queue/proxy/...`;backend-core 在内部把控制/读取路径转到 `code-queue-mgr`,把 active run、judge、dev-container、执行面健康和 scheduler 相关路径转到 D601 执行面。 +- 行为兼容:提交与 queued prompt edit 必须保留 Code Queue 环境提示注入、`--reference-task-id`/引用输入解析和引用任务上下文注入,避免 master 控制面路径与 D601 原写服务语义分叉。 + ### Project Manager On Main Server 当前 Project Manager 作为 `id=project-manager` 的用户服务登记在 `config.json`: @@ -143,13 +153,14 @@ Baidu Netdisk 在 UniDesk 语境中按纯后端服务管理:不得暴露百度 ### Code Queue k3s-Managed -当前 Code Queue 作为 `id=code-queue` 的 `k3sctl-managed` 用户服务登记在 `config.json`,业务实例由 D601 k3s 控制面代管,并接入统一 `oa-event-flow` 发布 Trace/STEP 事实事件与读取统计中心: +当前对外 `id=code-queue` 是稳定用户服务 ID,实际按 master 控制面与 D601 执行面拆分。队列管理、提交、历史摘要、已读状态和轻量 Trace 读取默认由主 server `code-queue-mgr` 直管 PostgreSQL;D601 k3s Code Queue 作为执行面代管,负责 scheduler/runner、dev-container、active run steer/interrupt、judge、输出/attempt/通知写回,并接入统一 `oa-event-flow` 发布 Trace/STEP 事实事件与读取统计中心: -- Orchestrator:`deployment.mode=k3sctl-managed`,`deployment.adapterServiceId=k3sctl-adapter`,`deployment.k3sServiceId=code-queue`,`backend.proxyMode=k3sctl-adapter-http`,`backend.nodeBaseUrl=k3s://code-queue`;backend-core 对 Code Queue 的正式链路只能是 `frontend -> backend-core -> k3sctl-adapter -> Kubernetes API service proxy -> Kubernetes Service ...:4222`。对外登记的 `code-queue` ID 保持稳定,但内部必须拆成 `code-queue-read`、`code-queue-write` 和 `code-queue-scheduler` 三个 Kubernetes Service,并由 backend-core 按 method/path 分流。 +- Orchestrator:稳定 `code-queue` ID 的控制/读取路径由 backend-core 分流到 `deployment.mode=internal-sidecar` 的 `code-queue-mgr`;D601 执行面仍登记为 `deployment.mode=k3sctl-managed`,`deployment.adapterServiceId=k3sctl-adapter`,`deployment.k3sServiceId=code-queue`,`backend.proxyMode=k3sctl-adapter-http`,`backend.nodeBaseUrl=k3s://code-queue`。对外登记的 `code-queue` ID 保持稳定,frontend/CLI 不需要知道内部拆分。 - Direct path ban:`code-queue` 不得再登记 `http://code-queue:4222`、`http://host.docker.internal:4222`、NodePort 或 
provider-gateway `microservice.http` 作为业务代理目标;frontend 也不得使用旧 `/api/code-queue-direct` 兼容别名作为 Code Queue 页面数据源。provider-gateway 只允许用于维护 D601/D518、部署 adapter、部署 k3s/k8s 节点或诊断节点本机容器。 +- D601 Service boundary:D601 内部可以继续保留 `code-queue-read`、`code-queue-write` 和 `code-queue-scheduler` 三个 Kubernetes Service 作为执行面兼容和过渡对象,但普通提交、queue CRUD、history、readAt 和轻量 overview 不得依赖 `code-queue-write` 或 D601 egress 可用;`code-queue-write` 不 ready 时,主 server `code-queue-mgr` 仍应保证 CLI/WebUI 的提交、列表和历史读取可用。需要 active run、dev-container、judge 或执行面健康的路径才进入 D601 scheduler。 - 服务拆分语义:`code-queue-read` 只承载 GET/HEAD 查询、overview、任务详情、Trace/output/transcript、统计和只读健康,可多副本滚动更新;它必须设置 `CODE_QUEUE_SERVICE_ROLE=read` 与 `CODE_QUEUE_SCHEDULER_ENABLED=false`,且不得接受入队、queue 变更、已读、重试、移动、追加 prompt 或打断这类 mutation。`code-queue-write` 承载入队、queue 创建/合并/更新、已读、手动重试、移动等命令写入,初期保持单副本和 `CODE_QUEUE_SERVICE_ROLE=write`,只把命令和任务状态写入 PostgreSQL,不启动 agent 子进程。`code-queue-scheduler` 是唯一拥有 scheduler 和 active run 的执行服务,设置 `CODE_QUEUE_SERVICE_ROLE=scheduler` 与 `CODE_QUEUE_SCHEDULER_ENABLED=true`,负责从 PostgreSQL 热任务集轮询新写入任务、推进队列、启动 Codex/OpenCode、处理 running task 的 steer/interrupt、发送终态通知和暴露执行端 `/health`。普通 Service 负载均衡不得把 mutation 打到 read,也不得把 running task 控制打到 write。 -- 实例语义:D601 是当前唯一 active/single-writer 执行节点,`code-queue-read` 在 D601 内多副本承载只读流量,`code-queue-write` 承载写入命令,`code-queue-scheduler` 以一个 scheduler Pod 承载长生命周期 Codex/OpenCode 子进程。D518 不属于当前 Code Queue k3s 拓扑;在没有原生 k3s-agent 与稳定 Kubernetes 网络前,不得把 D518 写回 `expectedNodeIds` 或恢复 `code-queue-d518` standby。D601 scheduler 默认关闭 `CODE_QUEUE_STARTUP_OA_BACKFILL_ENABLED`;历史 OA Trace/STEP 回填必须通过显式 `/api/oa/backfill` 运维动作触发,不能在每次 Pod 重启时自动批量发布旧事件。 -- 滚动更新边界:read/write/scheduler 三服务拆分可以保证滚动更新期间 Code Queue 的只读 API 与大部分控制面入口可用,但当前 scheduler Pod 内仍直接承载正在运行的 agent 子进程,scheduler Pod 被替换时 active task 仍会进入 restart-recovery/retry 语义,不能宣称 running task 零中断。真正的长期目标是继续把调度器和执行器拆开:scheduler 只负责 claim task 并创建 Kubernetes Job/Pod 或独立 worker,runner 把输出、状态、attempt、事件和通知写回 PostgreSQL/OA Event Flow/归档;只有这样 
controller/scheduler 滚动更新才不会影响正在执行的任务。 +- 实例语义:D601 是当前唯一 active 执行节点,`code-queue-scheduler` 以一个 scheduler Pod 承载长生命周期 Codex/OpenCode 子进程并轮询主 PostgreSQL 中由 `code-queue-mgr` 写入的 queued/retry_wait 任务。D518 不属于当前 Code Queue k3s 拓扑;在没有原生 k3s-agent 与稳定 Kubernetes 网络前,不得把 D518 写回 `expectedNodeIds` 或恢复 `code-queue-d518` standby。D601 scheduler 默认关闭 `CODE_QUEUE_STARTUP_OA_BACKFILL_ENABLED`;历史 OA Trace/STEP 回填必须通过显式 `/api/oa/backfill` 运维动作触发,不能在每次 Pod 重启时自动批量发布旧事件。 +- 滚动更新边界:master `code-queue-mgr` 保证 D601 抖动或执行面滚动更新期间普通提交、queue 管理和历史读取仍可用;但当前 D601 scheduler Pod 内仍直接承载正在运行的 agent 子进程,scheduler Pod 被替换时 active task 仍会进入 restart-recovery/retry 语义,不能宣称 running task 零中断。真正的长期目标是继续把调度器和执行器拆开:scheduler 只负责 claim task 并创建 Kubernetes Job/Pod 或独立 worker,runner 把输出、状态、attempt、事件和通知写回 PostgreSQL/OA Event Flow/归档;只有这样 controller/scheduler 滚动更新才不会影响正在执行的任务。 - 部署引用:Code Queue 镜像仍复用 `src/components/microservices/code-queue/Dockerfile`,Kubernetes 运行清单为 `src/components/microservices/k3sctl-adapter/k3s/code-queue.k8s.yaml`,`config.json` 对外记录 k3s manifest `src/components/microservices/k3sctl-adapter/k3s/code-queue.k3s.json`;主 server 根目录 `docker-compose.yml` 不包含 `code-queue` service,旧 D601 direct Compose 文件只作为迁移/本地诊断参考,不是正式运行入口。 - 主服务依赖映射:Code Queue 仍以主 PostgreSQL 为权威数据库,但 D601 k3s Pod 不能依赖公网直连 `74.48.78.17:15432/4255`。Pod 内 `DATABASE_URL` 和 `OA_EVENT_FLOW_BASE_URL` 必须指向集群内 `d601-tcp-egress-gateway` Service,再由该 gateway 通过 D601 provider-gateway egress proxy 的 HTTP CONNECT 转发到主 PostgreSQL 和 OA Event Flow;新增 TCP 依赖时扩展 `TCP_EGRESS_ROUTES`,不得在业务容器里新增一次性公网直连或 ad hoc 隧道。D601 active 实例的 `CODE_QUEUE_NOTIFY_CLAUDEQQ_BASE_URL` 必须使用集群内 ClaudeQQ Service `http://claudeqq.unidesk.svc.cluster.local:3290`,并把 `claudeqq`/`claudeqq.unidesk.svc.cluster.local` 加入 `NO_PROXY`,避免任务完成通知被默认出网代理错误转发。旧 `http://host.docker.internal:3290` 只允许作为迁移期诊断,不得作为 Code Queue k3s Pod 的正式通知路径。这些端口映射只服务受控节点运行时,必须用防火墙或等价策略限制来源,不得成为浏览器或任意公网客户端入口。 - K8s 探针与启动维护:Kubernetes liveness/startup probe 必须使用轻量 `/live`,readiness 和用户服务健康使用 
`/health`;`/health` 不得执行全量任务聚合、历史回填或长事务索引维护,历史任务总览应由 `/api/tasks/overview` 读取 PostgreSQL。启动时允许后台执行队列元数据 flush、通知 outbox 读取、任务表索引维护和 overview warmup,但这些维护不得阻塞 Bun server、readiness endpoint 或 frontend overview;通知表索引和大批量 OA backfill 不得作为默认启动副作用。 @@ -315,7 +326,8 @@ ClaudeQQ 的业务源码和持久化数据仍在 D601,但正式运行由 k3s - `bun scripts/cli.ts microservice health todo-note` 与 `bun scripts/cli.ts microservice proxy todo-note /api/instances`:验证主 server Todo Note 后端、PostgreSQL 存储和本机 provider-gateway 私有代理链路。 - `bun scripts/cli.ts microservice health oa-event-flow`、`bun scripts/cli.ts microservice proxy oa-event-flow /api/diagnostics --raw` 与 `bun scripts/cli.ts microservice proxy oa-event-flow '/api/events?tags=service:code-queue&limit=20' --raw`:验证统一 OA 事件流、事件表、tag 查询和统计中心。 - `bun scripts/cli.ts microservice health k3sctl-adapter` 与 `bun scripts/cli.ts microservice proxy k3sctl-adapter /api/control-plane --raw`:验证 D601 `unidesk-k3s` 控制面 adapter、manifest、D601 scheduler/read/write 实例状态、`presentNodeIds` 包含 `D601`、`missingNodeIds=[]` 和 no-fallback 运行路径。 -- `bun scripts/cli.ts microservice health code-queue` 与 `bun scripts/cli.ts microservice proxy code-queue /api/tasks/overview`:验证 Code Queue 经过 backend-core -> k3sctl-adapter -> k3s Service proxy 的单一路径,其中 `/health` 指向 `code-queue-scheduler`,overview/详情只读请求指向 `code-queue-read`,写入类请求指向 `code-queue-write`;输出不得出现 `serviceId=code-queue` 的 provider-gateway `microservice.http` 业务代理任务,写入、追加 prompt、打断和 readAt/未读状态都必须由 backend 写入 PostgreSQL,frontend 不得用本地存储伪造成功状态。 +- `bun scripts/cli.ts microservice health code-queue-mgr`:验证主 server 轻量 Code Queue 控制面,输出必须包含 `role=master-control-plane`、`schemaReady=true`、PostgreSQL pool 上限和 `noRunnerDependencies=true`。 +- `bun scripts/cli.ts microservice health code-queue` 与 `bun scripts/cli.ts microservice proxy code-queue /api/tasks/overview`:验证稳定 `code-queue` 用户服务路径可用;普通 health/overview/任务摘要/队列管理默认由 backend-core 分流到主 server `code-queue-mgr`,提交和 readAt/未读状态都必须由后端写入 PostgreSQL,frontend 不得用本地存储伪造成功状态。需要 D601 执行面状态时,通过 
`k3sctl-adapter /api/control-plane` 查看 scheduler/read/write ready endpoint,或访问执行面专属 dev-ready、judge、active run control 路径;输出不得出现 `serviceId=code-queue` 的 provider-gateway `microservice.http` 业务代理任务。 - `bun scripts/cli.ts microservice health filebrowser`、`bun scripts/cli.ts microservice health filebrowser-d601` 与 `bun scripts/cli.ts microservice proxy filebrowser / --max-body-bytes 2000`:验证 D518 主 File Browser 和 D601 备用 File Browser 私有代理链路;浏览器 WebUI 必须通过 `/api/microservices/filebrowser/proxy/` 或 `/api/microservices/filebrowser-d601/proxy/` 访问,不得直接开放 `4251` 公网端口。 - `bun scripts/cli.ts --main-server-ip 74.48.78.17 microservice health findjob`:在计算节点或其他非主 server 主机上通过公网 frontend remote CLI 进行同一验证,不需要主 server SSH key。 @@ -335,14 +347,14 @@ ClaudeQQ 的业务源码和持久化数据仍在 D601,但正式运行由 k3s - 在主 server 运行 `bun scripts/cli.ts microservice list`,确认 `pipeline` 的 `providerId=D601`、`public=false`、`frontendOnly=true`、仓库 URL、commit id、`127.0.0.1:18082` 映射和 `pipeline-v2-control` 容器摘要可见。 - 在主 server 运行 `bun scripts/cli.ts microservice list`,确认 `met-nonlinear` 的 `providerId=D601`、`public=false`、`frontendOnly=true`、仓库 URL、commit id、`127.0.0.1:3288` 映射和 `met-nonlinear-ts` 容器摘要可见。 - 在主 server 运行 `bun scripts/cli.ts microservice list`,确认 `claudeqq` 的 `providerId=D601`、`public=false`、`frontendOnly=true`、仓库 URL、commit id、`deployment.mode=k3sctl-managed`、`runtime.orchestrator=k3sctl`、`backend.proxyMode=k3sctl-adapter-http`、`backend.nodeBaseUrl=k3s://claudeqq` 和 `k3s://unidesk/claudeqq:3290` 逻辑 Service 映射可见,且不显示业务容器直连摘要。 -- 在主 server 运行 `bun scripts/cli.ts microservice list`,确认 `k3sctl-adapter` 为 `providerId=D601`、`deployment.mode=unidesk-direct`、后端私有端口 `127.0.0.1:4266`,并确认 `code-queue` 为 `deployment.mode=k3sctl-managed`、`runtime.orchestrator=k3sctl`、`backend.proxyMode=k3sctl-adapter-http`、`backend.nodeBaseUrl=k3s://code-queue`,且不再显示业务容器直连摘要。 +- 在主 server 运行 `bun scripts/cli.ts microservice list`,确认 `code-queue-mgr` 为 `providerId=main-server`、`deployment.mode=internal-sidecar`、Compose 后端 
`http://code-queue-mgr:4278`、`frontend.integrated=false`,并确认稳定 `code-queue` 对外 ID 的说明中体现控制/读取默认由 `code-queue-mgr` 承担、D601 `k3sctl-managed` 只承担执行面;`k3sctl-adapter` 仍为 `providerId=D601`、`deployment.mode=unidesk-direct`、后端私有端口 `127.0.0.1:4266`。 - 在主 server 运行 `bun scripts/cli.ts microservice list`,确认 `filebrowser` 和 `filebrowser-d601` 分别显示为 `providerId=D518` 和 `providerId=D601`,均为 `public=false`、`frontendOnly=true`,仓库 URL 为 `https://github.com/filebrowser/filebrowser`,后端映射为 `host.docker.internal:4251`,容器摘要分别为 `unidesk-filebrowser-d518` 和 `unidesk-filebrowser-d601`;列表中不得再出现主 server `filebrowser-main` 容器。 - 运行 `bun scripts/cli.ts microservice health findjob` 与 `bun scripts/cli.ts microservice proxy findjob /api/summary`,确认真实链路经过 backend-core、WebSocket、D601 provider-gateway 和 D601 本机 FindJob 后端。 - 运行 `bun scripts/cli.ts microservice health pipeline` 与 `bun scripts/cli.ts microservice proxy pipeline '/api/snapshot?__unideskArrayLimit=registry.components:8,runs:3'`,确认真实链路经过 backend-core、WebSocket、D601 provider-gateway 和 D601 本机 Pipeline 后端,且 run/procedure 摘要包含甘特图所需时间字段。 - 运行 `bun scripts/cli.ts microservice health met-nonlinear`、`bun scripts/cli.ts microservice proxy met-nonlinear /api/queue`、`bun scripts/cli.ts microservice proxy met-nonlinear '/api/projects?root=projects&limit=20'` 和 `bun scripts/cli.ts microservice proxy met-nonlinear /api/images`,确认真实链路经过 backend-core、WebSocket、D601 provider-gateway 和 D601 本机 MET Nonlinear TS 后端。 - 运行 `bun scripts/cli.ts microservice health claudeqq`、`bun scripts/cli.ts microservice proxy claudeqq /api/napcat/login`、`bun scripts/cli.ts microservice proxy claudeqq /api/events/recent` 和 `bun scripts/cli.ts microservice proxy claudeqq /api/events/subscriptions`,确认真实链路经过 backend-core、k3sctl-adapter、Kubernetes API service proxy 和 D601 Kubernetes Service `claudeqq:3290`;health 应显示 `service=claudeqq`、`pureBackend=true`、`napcat.containerized=true`、NapCat HTTP/WS 状态、二维码状态和订阅计数。 - 运行 `bun scripts/cli.ts microservice health todo-note` 与 `bun 
scripts/cli.ts microservice proxy todo-note /api/instances`,确认真实链路经过 backend-core、WebSocket、main-server provider-gateway 和主 server `todo-note-backend` 后端;输出中必须包含五个迁移清单和 PostgreSQL 存储健康状态。 -- 运行 `bun scripts/cli.ts microservice health k3sctl-adapter`、`bun scripts/cli.ts microservice proxy k3sctl-adapter /api/control-plane --raw`、`bun scripts/cli.ts microservice health code-queue` 与 `bun scripts/cli.ts microservice proxy code-queue /api/tasks/overview`,确认真实链路经过 backend-core -> k3sctl-adapter -> k3s Service proxy,且 read/write/scheduler 三个内部 Service 都有 ready endpoint;adapter 验收还必须证明其作为 UniDesk 直管服务运行在 k3s 外部,Docker 形态下挂载宿主 `/etc/rancher/k3s/k3s.yaml` 与 `/run/host-ssh/id_ed25519`,通过容器内 SSH local tunnel 连接 WSL 原生 k3s API,且 D601/D518 上都没有 active `rancher/k3s` 容器。Code Queue `/health` 必须仍返回业务后端自己的 `role=scheduler`、`queue.storage.primary=postgres`、`queue.storage.postgresReady=true`、`queue.notifications.claudeqq.outbox.storage=postgres` 和 `egressProxy.connected=true`,不得被 adapter 聚合健康 JSON 替代。还必须在 active Code Queue Pod 内验证主 PostgreSQL 端口映射、主 OA Event Flow 端口映射、集群内 ClaudeQQ `http://claudeqq.unidesk.svc.cluster.local:3290/health` 和 `d601-provider-egress-proxy` 均可访问,并确认 `/workspace` 与 `/home/ubuntu` 指向同一 WSL home hostPath,`/workspace/cq-deploy` 这类绝对 symlink 可以进入真实目录。再在 adapter 控制页确认 D601 scheduler serving healthy、D601 read/write Service healthy、`presentNodeIds` 包含 `D601`、`missingNodeIds=[]` 且整体不退化为 hidden fallback。再通过公网 frontend 提交一个 `gpt-5.5` 小任务,确认任务由 write 入库、scheduler 轮询执行、read 返回输出实时更新、结束后有 judge 判定,且运行中可追加 prompt 或打断。Code Queue 的重启恢复必须作为验收项:运行中任务存在时重启或重建 scheduler 实例后,任务必须从 PostgreSQL 恢复到可继续执行状态,不能丢失 active task、`promptHistory`、后续 queued 任务、readAt/未读状态或已入 outbox 的 ClaudeQQ 通知。Code Queue 服务名、表名前缀或持久化目录发生迁移后,还必须运行 `bun scripts/cli.ts e2e run --only microservice:catalog-code-queue,microservice:code-queue-status,microservice:code-queue-health,microservice:code-queue-tasks`,证明 backend-core catalog、k3s adapter 私有代理、PostgreSQL 队列和任务列表都指向 `code-queue`。批量验收必须通过公网 frontend 设置 
`入队份数=5` 或使用多段 prompt 分隔,一次性入队 5 条任务,并确认 5 条任务按顺序进入 running/judging/succeeded,而不是只运行第一条。 +- 运行 `bun scripts/cli.ts microservice health code-queue-mgr`、`bun scripts/cli.ts microservice health code-queue`、`bun scripts/cli.ts microservice proxy code-queue /api/tasks/overview` 和 `bun scripts/cli.ts codex submit --dry-run ...`,确认稳定 `code-queue` 控制/读取路径经 backend-core 分流到主 server `code-queue-mgr`,不依赖 D601 `code-queue-write` ready endpoint。再运行 `bun scripts/cli.ts microservice health k3sctl-adapter` 与 `bun scripts/cli.ts microservice proxy k3sctl-adapter /api/control-plane --raw`,确认 D601 scheduler/read/write 三个内部 Service 的 ready endpoint 和 no-fallback 拓扑;adapter 验收还必须证明其作为 UniDesk 直管服务运行在 k3s 外部,Docker 形态下挂载宿主 `/etc/rancher/k3s/k3s.yaml` 与 `/run/host-ssh/id_ed25519`,通过容器内 SSH local tunnel 连接 WSL 原生 k3s API,且 D601/D518 上都没有 active `rancher/k3s` 容器。D601 scheduler `/health` 必须仍返回业务后端自己的 `role=scheduler`、`queue.storage.primary=postgres`、`queue.storage.postgresReady=true`、`queue.notifications.claudeqq.outbox.storage=postgres` 和 `egressProxy.connected=true`,不得被 adapter 聚合健康 JSON 替代。还必须在 active Code Queue Pod 内验证主 PostgreSQL 端口映射、主 OA Event Flow 端口映射、集群内 ClaudeQQ `http://claudeqq.unidesk.svc.cluster.local:3290/health` 和 `d601-provider-egress-proxy` 均可访问,并确认 `/workspace` 与 `/home/ubuntu` 指向同一 WSL home hostPath,`/workspace/cq-deploy` 这类绝对 symlink 可以进入真实目录。再通过公网 frontend 提交一个 `gpt-5.5` 小任务,确认任务先由 master `code-queue-mgr` 入库、D601 scheduler 轮询执行、输出实时更新、结束后有 judge 判定,且运行中可追加 prompt 或打断。Code Queue 的重启恢复必须作为验收项:运行中任务存在时重启或重建 scheduler 实例后,任务必须从 PostgreSQL 恢复到可继续执行状态,不能丢失 active task、`promptHistory`、后续 queued 任务、readAt/未读状态或已入 outbox 的 ClaudeQQ 通知。Code Queue 服务名、表名前缀或持久化目录发生迁移后,还必须运行 `bun scripts/cli.ts e2e run --only microservice:catalog-code-queue,microservice:code-queue-status,microservice:code-queue-health,microservice:code-queue-tasks`,证明 backend-core catalog、master mgr、k3s adapter 执行面、PostgreSQL 队列和任务列表都指向稳定 `code-queue`。批量验收必须通过公网 frontend 设置 `入队份数=5` 或使用多段 prompt 分隔,一次性入队 5 条任务,并确认 
5 条任务按顺序进入 running/judging/succeeded,而不是只运行第一条。 - Code Queue 内存防回归验收:凡是改动 Code Queue 的持久化、scheduler、输出/Trace、health、列表/详情查询、日志导出或容器运行参数,交付前必须在 D601 用 `kubectl -n unidesk get deploy,pod,svc,endpoints -o wide`、`kubectl -n unidesk describe deploy/code-queue` 或等价 Docker inspect 确认 memory/swap 硬上限符合预算,运行 `kubectl -n unidesk top pod` 或 Docker stats 确认常驻内存、`OOMKilled=false` 和 `RestartCount` 未异常增长,再运行 `bun scripts/cli.ts microservice health code-queue` 确认 `/health` 是轻量 readiness 且暴露 PostgreSQL/notification/outbox 状态。验收还必须覆盖有历史任务存在时的 `/api/tasks/overview`、单任务详情和 output/transcript 查询,证明热状态裁剪不会丢历史输出、也不会重新把全部历史 `task_json` 缓存在进程内;涉及 TypeScript/frontend 验证的任务应能在 D601 Code Queue memory/swap 预算中完成 `bun run --cwd src/components/frontend check` 这类短时高内存命令,而不是被 memory watchdog 反复 SIGTERM。 - Code Queue 延迟防回归验收:凡是改动 Code Queue 列表、overview、readAt、Trace/summary 懒加载、实时 output/SSE 事件发布、frontend 请求策略、backend-core 用户服务代理或 frontend Code Queue 请求路径,交付前必须在有历史任务数据且有 active output 流动的 live 环境验证 `GET /api/tasks/overview`、`POST /api/tasks//read`、选定 task 的 `trace-step` 和前端 `/app/code-queue/` 首屏均低于 1s 目标;可运行 `bun scripts/src/code-queue-perf.ts --json --target-ms 1000` 采集公网 frontend 下的首屏耗时、最慢 API 和 DOM 完成指标,并用 `bun scripts/cli.ts microservice proxy code-queue /api/tasks/overview --raw`、D601 Pod `/health` 与 `/api/tasks/overview` curl、性能面板 `/api/performance` 与 `/api/frontend-performance` 失败/慢操作记录、`kubectl -n unidesk top pod` 或 Docker stats 补充后端耗时、代理 502 和内存/CPU 证据。验收结论必须同时说明是否使用了短 TTL cache、cache 如何被 mutation 或 archive append 失效、数据库索引/聚合是否命中、输出热路径是否只读增量指标,以及分页加载是否跳过 selected/active/stats;不能只展示 cache 命中后的单次快照。 - 运行 `bun scripts/cli.ts microservice health filebrowser`、`bun scripts/cli.ts microservice health filebrowser-d601` 和 `bun scripts/cli.ts microservice proxy filebrowser / --max-body-bytes 2000`,确认 File Browser health 返回 `status=OK`,WebUI HTML 包含 `File Browser`,D518/D601 通过 provider-gateway 访问节点本机 `4251`;随后在公网 frontend 的 `用户服务 / File Browser` 中确认 D518 为默认目标、可导出截图、iframe 紧凑布局不再有巨大 `folder` 
标记遮挡文件名,并可浏览 `/mnt/c`。 diff --git a/docs/reference/repo-tree.md b/docs/reference/repo-tree.md index 97ffcf5f..316a8aeb 100644 --- a/docs/reference/repo-tree.md +++ b/docs/reference/repo-tree.md @@ -86,7 +86,8 @@ - config/postgresql.conf - init/001_unidesk_init.sql - microservices/ (UniDesk-owned user services and compatibility examples) - - code-queue/ (Codex/OpenCode queue backend; k3s-managed when exposed through UniDesk) + - code-queue/ (Codex/OpenCode execution plane backend; k3s-managed on D601) + - code-queue-mgr/ (Lightweight master-side Code Queue control plane; PostgreSQL CRUD/read path only) - oa-event-flow/ (Unified OA event ledger, tag stream, and Trace/STEP stats center) - decision-center/ (Decision records backend; k3s-managed on D601 and PostgreSQL-backed) - k3sctl-adapter/ (D601 k3s control-plane adapter and managed service manifests) diff --git a/scripts/cli.ts b/scripts/cli.ts index ac047901..91c914b0 100644 --- a/scripts/cli.ts +++ b/scripts/cli.ts @@ -32,7 +32,7 @@ function help(): unknown { { command: "server stop", description: "Fire-and-forget docker-compose down for the fixed UniDesk stack." }, { command: "server status", description: "Show fixed ports, containers, service health, and public URLs." }, { command: "server logs [--tail-bytes N]", description: "Return bounded tails from file logs and docker logs." }, - { command: "server rebuild ", description: "Build first, then serialize, force-recreate, and validate one Compose service." }, + { command: "server rebuild ", description: "Build first, then serialize, force-recreate, and validate one Compose service." }, { command: "provider attach [--master-server URL] [--up] [--force]", description: "Generate the minimal external provider-gateway env/compose bundle; only master server URL and provider id are required." 
}, { command: "ssh [ssh-like args...]", description: "Open a Host SSH / WSL SSH maintenance session through the provider-gateway bridge with built-in remote helper tools in PATH." }, { command: "ssh apply-patch [tool args...] < patch.diff", description: "Invoke the injected remote apply_patch helper directly over SSH passthrough and stream the patch from local stdin." }, @@ -176,7 +176,7 @@ async function main(): Promise { } if (sub === "rebuild") { if (!isRebuildableService(third)) { - throw new Error("server rebuild requires one of: backend-core, frontend, provider-gateway, todo-note, code-queue, project-manager, baidu-netdisk, oa-event-flow"); + throw new Error("server rebuild requires one of: backend-core, frontend, provider-gateway, todo-note, code-queue-mgr, project-manager, baidu-netdisk, oa-event-flow"); } emitJson(commandName, rebuildService(config, third)); return; diff --git a/scripts/src/check.ts b/scripts/src/check.ts index c7d5fba6..49a62bf8 100644 --- a/scripts/src/check.ts +++ b/scripts/src/check.ts @@ -29,6 +29,7 @@ const syntaxFiles = [ "src/components/microservices/k3sctl-adapter/src/index.ts", "src/components/microservices/mdtodo/src/index.ts", "src/components/microservices/decision-center/src/index.ts", + "src/components/microservices/code-queue-mgr/src/index.ts", ]; export interface CheckOptions { @@ -141,6 +142,7 @@ function unifiedLogRotationItem(): CheckItem { "src/components/microservices/baidu-netdisk/src/index.ts", "src/components/microservices/oa-event-flow/src/index.ts", "src/components/microservices/decision-center/src/index.ts", + "src/components/microservices/code-queue-mgr/src/index.ts", ]; const offenders = serviceFiles.flatMap((path) => { const text = readFileSync(rootPath(path), "utf8"); @@ -182,6 +184,7 @@ export function runChecks(config: UniDeskConfig, options: CheckOptions = default fileItem("src/components/microservices/k3sctl-adapter/src/index.ts"), fileItem("src/components/microservices/mdtodo/src/index.ts"), 
fileItem("src/components/microservices/decision-center/src/index.ts"), + fileItem("src/components/microservices/code-queue-mgr/src/index.ts"), fileItem("scripts/src/deploy.ts"), fileItem("scripts/src/e2e.ts"), ); diff --git a/scripts/src/config.ts b/scripts/src/config.ts index 519d932c..3874eda4 100644 --- a/scripts/src/config.ts +++ b/scripts/src/config.ts @@ -67,7 +67,7 @@ export interface UniDeskMicroserviceConfig { timeoutMs: number; }; deployment: { - mode: "unidesk-direct" | "k3sctl-managed"; + mode: "unidesk-direct" | "k3sctl-managed" | "internal-sidecar"; adapterServiceId?: string; k3sServiceId?: string; namespace?: string; @@ -169,8 +169,8 @@ function microserviceConfig(item: Record, index: number): UniDe const frontend = asRecord(item.frontend, `${path}.frontend`); const deployment = item.deployment === undefined ? undefined : asRecord(item.deployment, `${path}.deployment`); const deploymentMode = deployment === undefined ? "unidesk-direct" : stringField(deployment, "mode", `${path}.deployment`); - if (deploymentMode !== "unidesk-direct" && deploymentMode !== "k3sctl-managed") { - throw new Error(`${path}.deployment.mode must be unidesk-direct or k3sctl-managed`); + if (deploymentMode !== "unidesk-direct" && deploymentMode !== "k3sctl-managed" && deploymentMode !== "internal-sidecar") { + throw new Error(`${path}.deployment.mode must be unidesk-direct, k3sctl-managed, or internal-sidecar`); } return { id: stringField(item, "id", path), diff --git a/scripts/src/docker.ts b/scripts/src/docker.ts index 4c71ac48..95449cc7 100644 --- a/scripts/src/docker.ts +++ b/scripts/src/docker.ts @@ -19,7 +19,7 @@ export interface ContainerStatus { ports: string; } -const rebuildableServices = ["backend-core", "frontend", "provider-gateway", "todo-note", "project-manager", "baidu-netdisk", "oa-event-flow"] as const; +const rebuildableServices = ["backend-core", "frontend", "provider-gateway", "todo-note", "code-queue-mgr", "project-manager", "baidu-netdisk", 
"oa-event-flow"] as const; export type RebuildableService = typeof rebuildableServices[number]; export function isRebuildableService(value: string | undefined): value is RebuildableService { @@ -143,6 +143,8 @@ export function writeComposeEnv(config: UniDeskConfig, freshLogPrefix: boolean): UNIDESK_TODO_NOTE_REMINDER_SCAN_INTERVAL_MS: runtimeSecret("UNIDESK_TODO_NOTE_REMINDER_SCAN_INTERVAL_MS") || "30000", UNIDESK_TODO_NOTE_REMINDER_CLAUDEQQ_TIMEOUT_MS: runtimeSecret("UNIDESK_TODO_NOTE_REMINDER_CLAUDEQQ_TIMEOUT_MS") || "15000", UNIDESK_TODO_NOTE_REMINDER_CLAUDEQQ_SEND_ATTEMPTS: runtimeSecret("UNIDESK_TODO_NOTE_REMINDER_CLAUDEQQ_SEND_ATTEMPTS") || "3", + UNIDESK_CODE_QUEUE_MGR_DATABASE_POOL_MAX: runtimeSecret("UNIDESK_CODE_QUEUE_MGR_DATABASE_POOL_MAX") || "2", + UNIDESK_CODE_QUEUE_TRACE_DATABASE_POOL_MAX: runtimeSecret("UNIDESK_CODE_QUEUE_TRACE_DATABASE_POOL_MAX") || "1", UNIDESK_CODE_QUEUE_MINIMAX_API_KEY: runtimeSecret("UNIDESK_CODE_QUEUE_MINIMAX_API_KEY") || runtimeSecret("MINIMAX_API_KEY"), UNIDESK_CODE_QUEUE_MINIMAX_MODEL: runtimeSecret("UNIDESK_CODE_QUEUE_MINIMAX_MODEL") || runtimeSecret("MINIMAX_MODEL") || "MiniMax-M2.7", UNIDESK_CODE_QUEUE_MINIMAX_API_BASE: runtimeSecret("UNIDESK_CODE_QUEUE_MINIMAX_API_BASE") || runtimeSecret("MINIMAX_API_BASE") || "https://api.minimaxi.com/v1", @@ -232,11 +234,10 @@ export function rebuildService(config: UniDeskConfig, service: RebuildableServic const watchdogLog = rootPath(".state", "jobs", "compose-rebuild-watchdog.log"); const watchdogInnerScript = [ "set -euo pipefail", - "sleep 20", `cid=$(${shellJoin(listServiceContainersCommand)} || true)`, `if [ -z "$cid" ]; then echo "$(date -Is) compose_rebuild_watchdog_restore service=${service}" >> ${shellQuote(watchdogLog)}; ${shellJoin(restoreCommand)} >> ${shellQuote(watchdogLog)} 2>&1 || true; fi`, ].join("\n"); - const watchdogScript = `set -euo pipefail; ${shellJoin(["flock", "-w", "300", lockPath, "bash", "-lc", watchdogInnerScript])} || true`; + const watchdogScript = 
`set -euo pipefail; sleep 20; ${shellJoin(["flock", "-w", "300", lockPath, "bash", "-lc", watchdogInnerScript])} || true`; const validateScript = [ "ready=0", "for attempt in $(seq 1 60); do", @@ -339,7 +340,9 @@ function composeLockedScript(innerScript: string): string { "set -euo pipefail", `mkdir -p ${shellQuote(rootPath(".state", "locks"))}`, `echo ${shellJoin(["compose_lock_wait", lockPath])}`, - shellJoin(["flock", lockPath, "bash", "-lc", innerScript]), + // Prevent background helpers spawned by the locked script from inheriting + // the compose lock fd and blocking later server rebuild jobs. + shellJoin(["flock", "-o", lockPath, "bash", "-lc", innerScript]), ].join("; "); } diff --git a/src/components/backend-core/src/config.ts b/src/components/backend-core/src/config.ts index a8627265..21662f05 100644 --- a/src/components/backend-core/src/config.ts +++ b/src/components/backend-core/src/config.ts @@ -80,8 +80,8 @@ function parseMicroserviceConfig(value: unknown, index: number): MicroserviceCon const frontend = asRecord(item.frontend, `${path}.frontend`); const deployment = item.deployment === undefined ? undefined : asRecord(item.deployment, `${path}.deployment`); const deploymentMode = deployment === undefined ? 
"unidesk-direct" : stringFromRecord(deployment, "mode", `${path}.deployment`); - if (deploymentMode !== "unidesk-direct" && deploymentMode !== "k3sctl-managed") { - throw new Error(`${path}.deployment.mode must be unidesk-direct or k3sctl-managed`); + if (deploymentMode !== "unidesk-direct" && deploymentMode !== "k3sctl-managed" && deploymentMode !== "internal-sidecar") { + throw new Error(`${path}.deployment.mode must be unidesk-direct, k3sctl-managed, or internal-sidecar`); } return { id: stringFromRecord(item, "id", path), diff --git a/src/components/backend-core/src/microservice-proxy.ts b/src/components/backend-core/src/microservice-proxy.ts index ee4951e4..c34b65f7 100644 --- a/src/components/backend-core/src/microservice-proxy.ts +++ b/src/components/backend-core/src/microservice-proxy.ts @@ -390,6 +390,17 @@ function codeQueueK3sServiceIdForRequest(method: string, targetPath: string): st return "code-queue-write"; } +function codeQueueMasterControlPath(method: string, targetPath: string): boolean { + const normalizedMethod = method.toUpperCase(); + if (targetPath === "/" || targetPath === "/health" || targetPath === "/live" || targetPath === "/logs") return true; + if (targetPath === "/api/queues" || targetPath === "/api/queues/merge") return true; + if (/^\/api\/queues\/[^/]+(?:\/merge)?$/u.test(targetPath)) return true; + if (targetPath === "/api/tasks" || targetPath === "/api/tasks/batch" || targetPath === "/api/tasks/overview" || targetPath === "/api/tasks/stats" || targetPath === "/api/tasks/read-all") return true; + if (/^\/api\/tasks\/[^/]+\/(?:summary|trace-summary|trace-steps|trace-step|transcript|output|prompt|read|retry|move|edit)$/u.test(targetPath)) return true; + if (/^\/api\/tasks\/[^/]+$/u.test(targetPath) && normalizedMethod === "GET") return true; + return false; +} + // --------------------------------------------------------------------------- // Cache helpers // --------------------------------------------------------------------------- 
@@ -713,6 +724,11 @@ async function fetchMicroserviceUpstreamResponse( bodyText: string, abortSignal?: AbortSignal, ): Promise { + if (service.id === "code-queue" && codeQueueMasterControlPath(method, targetPath)) { + const mgr = microserviceById("code-queue-mgr"); + if (mgr !== null) return directMicroserviceResponse(mgr, method, targetPath, proxyOptions, requestHeaders, bodyText, abortSignal); + logger("warn", "code_queue_mgr_missing_fallback_to_d601", { method, targetPath }); + } if (isK3sctlManagedMicroservice(service)) { return k3sctlAdapterMicroserviceResponse(service, method, targetPath, proxyOptions, requestHeaders, bodyText, abortSignal); } diff --git a/src/components/backend-core/src/types.ts b/src/components/backend-core/src/types.ts index 681160b9..01aece8b 100644 --- a/src/components/backend-core/src/types.ts +++ b/src/components/backend-core/src/types.ts @@ -45,7 +45,7 @@ export interface MicroserviceConfig { timeoutMs: number; }; deployment: { - mode: "unidesk-direct" | "k3sctl-managed"; + mode: "unidesk-direct" | "k3sctl-managed" | "internal-sidecar"; adapterServiceId?: string; k3sServiceId?: string; namespace?: string; diff --git a/src/components/microservices/code-queue-mgr/Dockerfile b/src/components/microservices/code-queue-mgr/Dockerfile new file mode 100644 index 00000000..c59b1dfa --- /dev/null +++ b/src/components/microservices/code-queue-mgr/Dockerfile @@ -0,0 +1,12 @@ +ARG CODE_QUEUE_MGR_BASE_IMAGE=oven/bun:1-debian +FROM ${CODE_QUEUE_MGR_BASE_IMAGE} + +WORKDIR /app/src/components/microservices/code-queue-mgr +COPY src/components/microservices/code-queue-mgr/package.json ./package.json +RUN bun install --production +COPY src/components/microservices/code-queue-mgr/tsconfig.json ./tsconfig.json +COPY src/components/shared /app/src/components/shared +COPY src/components/microservices/code-queue-mgr/src ./src + +EXPOSE 4278 +CMD ["bun", "--smol", "run", "src/index.ts"] diff --git a/src/components/microservices/code-queue-mgr/bun.lock 
b/src/components/microservices/code-queue-mgr/bun.lock new file mode 100644 index 00000000..46f750bc --- /dev/null +++ b/src/components/microservices/code-queue-mgr/bun.lock @@ -0,0 +1,34 @@ +{ + "lockfileVersion": 1, + "configVersion": 1, + "workspaces": { + "": { + "name": "@unidesk/code-queue-mgr", + "dependencies": { + "postgres": "3.4.9", + }, + "devDependencies": { + "@types/bun": "1.3.14", + "@types/node": "24.10.1", + "typescript": "6.0.3", + }, + }, + }, + "packages": { + "@types/bun": ["@types/bun@1.3.14", "", { "dependencies": { "bun-types": "1.3.14" } }, "sha512-h1hFqFVcvAvD9j9K7ZW7vd82aSA+rTdznZa+5bwvCwqSB1jmmfLcbIWhOLx1/+boy/xmjgCs/OMUL8hRJSmnPw=="], + + "@types/node": ["@types/node@24.10.1", "", { "dependencies": { "undici-types": "~7.16.0" } }, "sha512-GNWcUTRBgIRJD5zj+Tq0fKOJ5XZajIiBroOF0yvj2bSU1WvNdYS/dn9UxwsujGW4JX06dnHyjV2y9rRaybH0iQ=="], + + "bun-types": ["bun-types@1.3.14", "", { "dependencies": { "@types/node": "*" } }, "sha512-4N0ig0fEomHt5R0KCFWjovxow98rIoRwKolrYdCcknNwMekCXRnWEUvgu5soYV8QXtVsrUD8B95MBOZGPvr6KQ=="], + + "postgres": ["postgres@3.4.9", "", {}, "sha512-GD3qdB0x1z9xgFI6cdRD6xu2Sp2WCOEoe3mtnyB5Ee0XrrL5Pe+e4CCnJrRMnL1zYtRDZmQQVbvOttLnKDLnaw=="], + + "typescript": ["typescript@6.0.3", "", { "bin": { "tsc": "bin/tsc", "tsserver": "bin/tsserver" } }, "sha512-y2TvuxSZPDyQakkFRPZHKFm+KKVqIisdg9/CZwm9ftvKXLP8NRWj38/ODjNbr43SsoXqNuAisEf1GdCxqWcdBw=="], + + "undici-types": ["undici-types@7.16.0", "", {}, "sha512-Zz+aZWSj8LE6zoxD+xrjh4VfkIG8Ya6LvYkZqtUQGJPZjYl53ypCaUwWqo7eI0x66KBGeRo+mlBEkMSeSZ38Nw=="], + + "bun-types/@types/node": ["@types/node@25.8.0", "", { "dependencies": { "undici-types": ">=7.24.0 <7.24.7" } }, "sha512-TCFSk8IZh+iLX1xtksoBVtdmgL+1IX0fC9BeU4QqFSuNdN/K+HUlhqOzEmSYYpZUVsLYcPqc9KX+60iDuninSQ=="], + + "bun-types/@types/node/undici-types": ["undici-types@7.24.6", "", {}, "sha512-WRNW+sJgj5OBN4/0JpHFqtqzhpbnV0GuB+OozA9gCL7a993SmU+1JBZCzLNxYsbMfIeDL+lTsphD5jN5N+n0zg=="], + } +} diff --git 
a/src/components/microservices/code-queue-mgr/package.json b/src/components/microservices/code-queue-mgr/package.json new file mode 100644 index 00000000..aae8218c --- /dev/null +++ b/src/components/microservices/code-queue-mgr/package.json @@ -0,0 +1,17 @@ +{ + "name": "@unidesk/code-queue-mgr", + "private": true, + "type": "module", + "scripts": { + "start": "bun run src/index.ts", + "check": "tsc -p tsconfig.json --noEmit" + }, + "dependencies": { + "postgres": "3.4.9" + }, + "devDependencies": { + "@types/bun": "1.3.14", + "@types/node": "24.10.1", + "typescript": "6.0.3" + } +} diff --git a/src/components/microservices/code-queue-mgr/src/index.ts b/src/components/microservices/code-queue-mgr/src/index.ts new file mode 100644 index 00000000..27a6a352 --- /dev/null +++ b/src/components/microservices/code-queue-mgr/src/index.ts @@ -0,0 +1,2157 @@ +import postgres from "postgres"; +import { createHourlyJsonlWriter, logRetentionBytesForService } from "../../../shared/src/rotating-jsonl"; + +type JsonValue = string | number | boolean | null | JsonValue[] | { [key: string]: JsonValue }; +type JsonRecord = Record; +type SqlExecutor = postgres.Sql | postgres.TransactionSql; +type TaskStatus = "queued" | "running" | "judging" | "retry_wait" | "succeeded" | "failed" | "canceled"; +type CodeExecutionMode = "default" | "windows-native"; +type RunMode = "initial" | "retry"; +type OutputChannel = "system" | "user" | "assistant" | "reasoning" | "command" | "diff" | "tool" | "error"; + +interface RuntimeConfig { + host: string; + port: number; + databaseUrl: string; + logFile: string; + mgrDatabasePoolMax: number; + traceDatabasePoolMax: number; + defaultProviderId: string; + defaultWorkdir: string; + remoteDefaultWorkdir: string; + defaultModel: string; + codeModels: string[]; + defaultMaxAttempts: number; + defaultReasoningEffort: string | null; + modelReasoningEfforts: Record; + defaultQueueId: string; + maxListLimit: number; +} + +interface QueueRecord { + id: string; + 
name: string; + createdAt: string; + updatedAt: string; +} + +interface DailyTaskStatsBucket { + date: string; + executedTasks: number; + completedTasks: number; + retryAttempts: number; + succeededTasks: number; + failedTasks: number; + canceledTasks: number; + totalDurationMs: number; + durationSamples: number; +} + +interface ReferenceInjectionSummaryItem { + round: number; + roundIndex: number; + taskId: string; + viaTaskId: string | null; + status: TaskStatus; + providerId: string; + executionMode?: CodeExecutionMode; + model: string; + cwd: string; + createdAt: string; + updatedAt: string; + promptChars: number; + finalResponseChars: number; + finalResponseAt: string | null; + finalResponseSource: string; + referenceTaskIds: string[]; + cliHint: string; +} + +interface ReferenceInjectionRecord { + version: 2; + injectedAt: string; + basePrompt: string; + directReferenceTaskIds: string[]; + maxRounds: number | null; + truncated: boolean; + itemCount: number; + items: ReferenceInjectionSummaryItem[]; +} + +interface PreparedPrompt { + prompt: string; + basePrompt: string; + referenceTaskIds: string[]; + referenceInjection: JsonValue | null; +} + +interface LiveOutput { + seq: number; + at: string; + channel: OutputChannel; + text: string; + method?: string; + itemId?: string; +} + +interface AttemptSummary { + index: number; + mode: RunMode; + startedAt: string; + finishedAt: string; + providerId?: string; + executionMode?: CodeExecutionMode; + terminalStatus: "completed" | "interrupted" | "failed" | null; + transportClosedBeforeTerminal: boolean; + appServerExitCode: number | null; + appServerSignal: string | null; + error: string | null; + events?: JsonValue[]; + inputPrompt?: string; + inputPromptPreview?: string; + inputPromptChars?: number; + inputPromptLines?: number; + finalResponse?: string; + finalResponsePreview: string; + finalResponseChars?: number; + judge?: JsonValue; + judgeAt?: string | null; + judgeSeq?: number | null; + feedbackPrompt?: 
string; + feedbackPromptPreview?: string; + feedbackPromptChars?: number; + feedbackPromptLines?: number; + feedbackPromptSource?: string; + feedbackPromptForAttempt?: number | null; + stderrTail: string; + outputStartSeq?: number | null; + outputEndSeq?: number | null; + errorCount?: number; +} + +interface QueueTask { + id: string; + queueId: string; + queueEnteredAt: string; + prompt: string; + basePrompt: string; + referenceTaskIds: string[]; + referenceInjection: JsonValue | null; + providerId: string; + cwd: string; + model: string; + reasoningEffort: string | null; + executionMode: CodeExecutionMode; + maxAttempts: number; + status: TaskStatus; + createdAt: string; + updatedAt: string; + startedAt: string | null; + finishedAt: string | null; + readAt: string | null; + currentAttempt: number; + currentMode: RunMode | null; + codexThreadId: string | null; + activeTurnId: string | null; + finalResponse: string; + stepCount?: number; + llmStepCount?: number; + outputMaxSeq?: number; + lastError: string | null; + lastJudge: JsonValue | null; + judgeFailCount: number; + promptHistory: JsonValue[]; + output: LiveOutput[]; + events: JsonValue[]; + attempts: AttemptSummary[]; + cancelRequested: boolean; + nextPrompt: string | null; + nextMode: RunMode | null; +} + +interface TaskRow { + id: string; + queue_id: string; + status: TaskStatus; + provider_id: string; + execution_mode: CodeExecutionMode; + model: string; + cwd: string; + prompt: string; + base_prompt: string; + reference_task_ids: JsonValue; + reference_injection: JsonValue | null; + reasoning_effort: string | null; + max_attempts: number | string; + current_attempt: number | string; + current_mode: RunMode | null; + codex_thread_id: string | null; + active_turn_id: string | null; + created_at: Date | string; + updated_at: Date | string; + started_at: Date | string | null; + finished_at: Date | string | null; + read_at: Date | string | null; + last_error: string | null; + last_judge: JsonValue | null; + 
output_count: number | string; + event_count: number | string; + attempt_count: number | string; + last_output_seq: number | string; + task_json: unknown; +} + +interface QueueRow { + id: string; + name: string; + created_at: Date | string; + updated_at: Date | string; +} + +interface HttpErrorDetail { + status: number; + body: JsonRecord; +} + +const serviceStartedAt = new Date().toISOString(); +const recentLogs: JsonRecord[] = []; +const defaultQueueId = "default"; +const queueIdPattern = /^[A-Za-z0-9][A-Za-z0-9_.-]{0,63}$/u; +const codexTaskIdPattern = /^codex_\d+_[A-Za-z0-9_-]+$/u; +const referenceInjectionMaxRounds: number | null = null; +const resolvedReferenceContextTitle = "# Code Queue 已解析引用上下文"; +const currentTaskPromptMarker = "\n# 本次任务\n"; +const codeQueueEnvironmentHintTitle = "# Code Queue 运行环境提示"; +const codeQueueEnvironmentHint = [ + codeQueueEnvironmentHintTitle, + "如果当前 Code Queue Docker 容器缺少完成任务所需的环境、系统包或语言依赖,可以先在容器内临时安装以推进当前任务;同时必须把该依赖补到 `src/components/microservices/code-queue/Dockerfile`,让后续任务重建镜像后可直接使用。", +].join("\n"); +const maxTaskAttempts = 99; +const defaultCodeModels = ["gpt-5.5", "gpt-5.4-mini", "gpt-5.4", "minimax-m2.7"]; +const codeExecutionModes: CodeExecutionMode[] = ["default", "windows-native"]; +const codexStatsTimeZone = "Asia/Shanghai"; +const codexStatsDateFormatter = new Intl.DateTimeFormat("en-CA", { + timeZone: codexStatsTimeZone, + year: "numeric", + month: "2-digit", + day: "2-digit", +}); + +let schemaReady = false; +let schemaLastError: JsonRecord | null = null; +let lastDatabaseError: string | null = null; + +function envString(name: string, fallback: string): string { + const value = process.env[name]; + return value === undefined || value.length === 0 ? fallback : value; +} + +function envNumber(name: string, fallback: number): number { + const raw = process.env[name]; + if (raw === undefined || raw.length === 0) return fallback; + const value = Number(raw); + return Number.isFinite(value) && value > 0 ? 
Math.floor(value) : fallback; +} + +function envList(name: string, fallback: string[]): string[] { + const raw = process.env[name]; + const source = raw === undefined || raw.length === 0 ? fallback.join(",") : raw; + return Array.from(new Set(source.split(",").map((item) => item.trim()).filter(Boolean))); +} + +function envReasoningMap(name: string, fallback: Record): Record { + const raw = process.env[name]; + const map: Record = { ...fallback }; + if (raw === undefined || raw.trim().length === 0) return map; + for (const item of raw.split(/[,;]+/u)) { + const [model, effort] = item.split("=", 2).map((part) => part.trim()); + if (model && effort) map[model.toLowerCase()] = effort; + } + return map; +} + +function clampAttempts(value: number): number { + return Math.max(1, Math.min(maxTaskAttempts, Math.floor(value))); +} + +function configFromEnv(): RuntimeConfig { + const databaseUrl = process.env.DATABASE_URL || ""; + if (!databaseUrl) throw new Error("DATABASE_URL is required"); + const defaultModel = normalizeCodeModel(envString("CODE_QUEUE_DEFAULT_MODEL", "gpt-5.5")); + const codeModels = Array.from(new Set([defaultModel, ...envList("CODE_QUEUE_MODELS", defaultCodeModels).map(normalizeCodeModel)])); + return { + host: envString("HOST", "0.0.0.0"), + port: envNumber("PORT", 4278), + databaseUrl, + logFile: envString("LOG_FILE", "/var/log/unidesk/code-queue-mgr.jsonl"), + mgrDatabasePoolMax: Math.max(1, Math.min(2, envNumber("CODE_QUEUE_MGR_DATABASE_POOL_MAX", 2))), + traceDatabasePoolMax: Math.max(1, Math.min(2, envNumber("CODE_QUEUE_TRACE_DATABASE_POOL_MAX", 1))), + defaultProviderId: envString("CODE_QUEUE_MAIN_PROVIDER_ID", "D601"), + defaultWorkdir: envString("CODE_QUEUE_WORKDIR", "/workspace"), + remoteDefaultWorkdir: envString("CODE_QUEUE_REMOTE_WORKDIR", "/home/ubuntu"), + defaultModel, + codeModels, + defaultMaxAttempts: clampAttempts(envNumber("CODE_QUEUE_MAX_ATTEMPTS", maxTaskAttempts)), + defaultReasoningEffort: 
process.env.CODE_QUEUE_DEFAULT_REASONING_EFFORT?.trim() || null, + modelReasoningEfforts: envReasoningMap("CODE_QUEUE_MODEL_REASONING_EFFORTS", { "gpt-5.5": "xhigh" }), + defaultQueueId, + maxListLimit: Math.max(20, Math.min(500, envNumber("CODE_QUEUE_MGR_MAX_LIST_LIMIT", 200))), + }; +} + +const config = configFromEnv(); +const mgrSql = postgres(config.databaseUrl, { + max: config.mgrDatabasePoolMax, + idle_timeout: 20, + connect_timeout: 5, + connection: { application_name: "unidesk-code-queue-mgr" }, +}); +const traceSql = postgres(config.databaseUrl, { + max: config.traceDatabasePoolMax, + idle_timeout: 20, + connect_timeout: 5, + connection: { application_name: "unidesk-code-queue-trace-read" }, +}); +const logWriter = config.logFile + ? createHourlyJsonlWriter({ + baseLogFile: config.logFile, + service: "code-queue-mgr", + maxBytes: logRetentionBytesForService("code-queue-mgr"), + }) + : null; +logWriter?.prune(); + +function toJsonValue(value: unknown): JsonValue { + if (value === undefined) return null; + return JSON.parse(JSON.stringify(value)) as JsonValue; +} + +function errorToJson(error: unknown): JsonRecord { + if (error instanceof Error) return { name: error.name, message: error.message, stack: error.stack ?? 
null }; + return { message: String(error) }; +} + +function log(level: "info" | "warn" | "error", event: string, detail: JsonRecord = {}): void { + const record: JsonRecord = { at: new Date().toISOString(), service: "code-queue-mgr", level, event, ...detail }; + recentLogs.push(record); + while (recentLogs.length > 300) recentLogs.shift(); + logWriter?.appendJson(record); + if (level === "error") console.error(JSON.stringify(record)); + else console.log(JSON.stringify(record)); +} + +function jsonResponse(body: unknown, status = 200): Response { + return new Response(JSON.stringify(body, null, 2), { + status, + headers: { + "content-type": "application/json; charset=utf-8", + "access-control-allow-origin": "*", + "access-control-allow-methods": "GET,HEAD,POST,PATCH,PUT,DELETE,OPTIONS", + "access-control-allow-headers": "content-type,accept", + }, + }); +} + +async function readJson(req: Request): Promise { + const text = await req.text(); + if (text.trim().length === 0) return {}; + return JSON.parse(text) as unknown; +} + +function asRecord(value: unknown): Record | null { + return typeof value === "object" && value !== null && !Array.isArray(value) ? value as Record : null; +} + +function asJsonRecord(value: unknown): JsonRecord { + return asRecord(value) as JsonRecord | null ?? {}; +} + +function nowIso(): string { + return new Date().toISOString(); +} + +function timestampToIso(value: Date | string | null | undefined): string | null { + if (value === null || value === undefined) return null; + const date = value instanceof Date ? value : new Date(value); + return Number.isNaN(date.getTime()) ? String(value) : date.toISOString(); +} + +function timestampMs(value: string | Date | null | undefined): number | null { + if (value === null || value === undefined) return null; + const date = value instanceof Date ? value : new Date(value); + const ms = date.getTime(); + return Number.isFinite(ms) ? 
ms : null; +} + +function safePreview(value: unknown, maxChars: number): string { + const text = String(value ?? ""); + if (text.length <= maxChars) return text; + return `${text.slice(0, Math.max(0, maxChars - 20))}\n...`; +} + +function prefixPreview(value: unknown, maxChars: number): string { + const text = String(value ?? ""); + return text.length <= maxChars ? text : `${text.slice(0, Math.max(0, maxChars - 1))}…`; +} + +function normalizeQueueId(value: unknown, fallback = config.defaultQueueId): string { + const text = typeof value === "string" ? value.trim() : ""; + if (text.length === 0) return fallback; + if (!queueIdPattern.test(text)) throw new Error("queueId must match /^[A-Za-z0-9][A-Za-z0-9_.-]{0,63}$/"); + return text; +} + +function safeQueueId(value: unknown): string { + try { + return normalizeQueueId(value); + } catch { + return config.defaultQueueId; + } +} + +function normalizeQueueName(value: unknown, queueId: string): string { + const fallback = queueId.length > 0 ? queueId : config.defaultQueueId; + const text = typeof value === "string" ? value.replace(/[\u0000-\u001F\u007F]+/gu, " ").replace(/\s+/gu, " ").trim() : ""; + if (text.length === 0) return fallback; + if (text.length > 80) throw new Error("queue name must be 80 characters or fewer"); + return text; +} + +function normalizeCodeModel(value: string): string { + const raw = String(value || "").trim(); + if (raw.length === 0) return raw; + const lower = raw.toLowerCase(); + const leaf = lower.includes("/") ? lower.split("/").at(-1) ?? lower : lower; + if (leaf === "minimax-m2.7" || leaf === "m2.7") return "minimax-m2.7"; + return raw; +} + +function codeAgentPortForModel(model: string): "codex" | "opencode" { + return normalizeCodeModel(model) === "minimax-m2.7" ? "opencode" : "codex"; +} + +function codeAgentPortInfo(kind: "codex" | "opencode"): JsonRecord { + return { + kind, + protocol: kind === "codex" ? 
"codex-app-server-jsonrpc-stdio" : "opencode-run-json-events", + sessionResume: kind === "codex" ? "thread/resume" : "opencode --session with persisted XDG storage and stale-session recovery", + tracePort: kind === "codex" ? "codex-transcript" : "opencode-json-event", + }; +} + +function normalizeExecutionMode(value: unknown): CodeExecutionMode { + const raw = typeof value === "string" ? value.trim().toLowerCase() : ""; + if (raw === "windows-native" || raw === "windows" || raw === "win32" || raw === "native-windows") return "windows-native"; + return "default"; +} + +function executionModeInfo(mode: CodeExecutionMode): JsonRecord { + if (mode === "windows-native") { + return { + kind: mode, + label: "Windows native Codex", + description: "Run Codex on the provider Windows host while the execution container relays stdio.", + codexRunsInContainer: false, + requiresProvider: true, + requiresWindowsCwd: true, + }; + } + return { + kind: mode, + label: "Default Codex runtime", + description: "Use the Code Queue node Codex runtime or the provider execution container.", + codexRunsInContainer: true, + requiresProvider: false, + requiresWindowsCwd: false, + }; +} + +function resolveReasoningEffort(model: string, explicit?: string | null): string | null { + const requested = explicit?.trim(); + if (requested) return requested; + return config.modelReasoningEfforts[model.toLowerCase()] ?? config.defaultReasoningEffort; +} + +function normalizeProviderId(value: unknown): string { + const text = typeof value === "string" ? value.trim() : ""; + return text.length > 0 ? text : config.defaultProviderId; +} + +function normalizeCwd(providerId: string, value: unknown): string { + const text = typeof value === "string" && value.trim().length > 0 ? value.trim() : ""; + if (text.length > 0) return text; + return providerId === "main-server" ? 
config.defaultWorkdir : config.remoteDefaultWorkdir; +} + +function normalizeStringArray(value: unknown): string[] { + if (!Array.isArray(value)) return []; + return Array.from(new Set(value.map((item) => typeof item === "string" ? item.trim() : "").filter(Boolean))); +} + +function normalizePromptText(value: unknown): string { + if (typeof value !== "string" || value.trim().length === 0) throw new Error("prompt is required"); + return value; +} + +function stripAutoReferenceHint(prompt: string): string { + const trimmed = prompt.trimStart(); + if (!/^引用\s+Code Queue\s+任务\s+codex_\d+_[A-Za-z0-9_-]+/u.test(trimmed)) return prompt; + const marker = "\n\n本次任务:"; + const index = trimmed.indexOf(marker); + if (index === -1) return prompt; + return trimmed.slice(index + marker.length).trimStart(); +} + +function stripResolvedReferenceContext(prompt: string): string { + const trimmed = prompt.trimStart(); + if (!trimmed.startsWith(resolvedReferenceContextTitle)) return prompt; + const offset = prompt.length - trimmed.length; + const index = prompt.lastIndexOf(currentTaskPromptMarker); + if (index < offset) return prompt; + return prompt.slice(index + currentTaskPromptMarker.length).trimStart(); +} + +function stripCodeQueueEnvironmentHint(prompt: string): string { + const trimmed = prompt.trimStart(); + if (!trimmed.startsWith(codeQueueEnvironmentHintTitle)) return prompt; + const offset = prompt.length - trimmed.length; + const index = prompt.indexOf(currentTaskPromptMarker, offset); + if (index < offset) return prompt; + return prompt.slice(index + currentTaskPromptMarker.length).trimStart(); +} + +function userPromptForDisplay(prompt: string): string { + return stripAutoReferenceHint(stripResolvedReferenceContext(stripCodeQueueEnvironmentHint(prompt))); +} + +function promptWithCodeQueueEnvironmentHint(prompt: string): string { + if (prompt.trimStart().startsWith(codeQueueEnvironmentHintTitle)) return prompt; + return [codeQueueEnvironmentHint, "", "# 本次任务", 
prompt.trim()].join("\n"); +} + +function isCodexTaskId(value: string): boolean { + return codexTaskIdPattern.test(value.trim()); +} + +function addUniqueTaskId(ids: string[], value: string): void { + const id = value.trim(); + if (isCodexTaskId(id) && !ids.includes(id)) ids.push(id); +} + +function collectTaskIdsFromValue(value: unknown, ids: string[]): void { + if (typeof value === "string") { + for (const part of value.split(/[\s,,;;]+/u)) addUniqueTaskId(ids, part); + return; + } + if (Array.isArray(value)) { + for (const item of value) collectTaskIdsFromValue(item, ids); + } +} + +function referenceTaskIdsFromPrompt(prompt: string): string[] { + const ids: string[] = []; + const patterns = [ + /引用\s+Code Queue\s+任务\s+(codex_\d+_[A-Za-z0-9_-]+)/giu, + /\bcodex\s+task\s+(codex_\d+_[A-Za-z0-9_-]+)/giu, + /(?:引用|上下文|context|reference)[^\n]{0,160}\b(codex_\d+_[A-Za-z0-9_-]+)/giu, + ]; + for (const pattern of patterns) { + for (const match of prompt.matchAll(pattern)) addUniqueTaskId(ids, String(match[1] ?? 
"")); + } + return ids; +} + +function collectReferenceTaskIds(record: Record, prompt: string, fallback: string[] = []): string[] { + const ids: string[] = []; + const hasExplicitReferenceIds = Object.prototype.hasOwnProperty.call(record, "referenceTaskId") + || Object.prototype.hasOwnProperty.call(record, "referenceTaskIds"); + if (hasExplicitReferenceIds) { + collectTaskIdsFromValue(record.referenceTaskId, ids); + collectTaskIdsFromValue(record.referenceTaskIds, ids); + } else { + for (const id of fallback) addUniqueTaskId(ids, id); + } + for (const id of referenceTaskIdsFromPrompt(prompt)) addUniqueTaskId(ids, id); + return ids; +} + +function makeTaskId(): string { + return `codex_${Date.now()}_${Math.random().toString(16).slice(2, 8)}`; +} + +function terminalTask(task: QueueTask): boolean { + return task.status === "succeeded" || task.status === "failed" || task.status === "canceled"; +} + +function terminalTaskUnread(task: QueueTask): boolean { + return terminalTask(task) && task.readAt === null; +} + +function queuedTaskPromptEditable(task: QueueTask): boolean { + return task.status === "queued" + && task.startedAt === null + && task.currentAttempt === 0 + && task.codexThreadId === null + && task.nextMode === null + && task.nextPrompt === null + && task.attempts.length === 0; +} + +function numberField(value: unknown, fallback = 0): number { + const parsed = Number(value); + return Number.isFinite(parsed) ? 
Math.floor(parsed) : fallback; +} + +function outputMaxSeq(task: QueueTask): number { + const outputSeq = task.output.map((item) => Number(item.seq)).filter(Number.isFinite); + const promptSeq = task.promptHistory.map((item) => numberField(asRecord(item)?.seq, 0)); + const attemptSeq = task.attempts.map((item) => numberField(item.outputEndSeq, 0)); + return Math.max(0, numberField(task.outputMaxSeq, 0), ...outputSeq, ...promptSeq, ...attemptSeq); +} + +function normalizeOutput(value: unknown): LiveOutput[] { + if (!Array.isArray(value)) return []; + return value.flatMap((item): LiveOutput[] => { + const record = asRecord(item); + if (record === null) return []; + const seq = numberField(record.seq, 0); + if (seq <= 0) return []; + const channel = String(record.channel || "system") as OutputChannel; + return [{ + seq, + at: timestampToIso(typeof record.at === "string" ? record.at : null) ?? nowIso(), + channel, + text: String(record.text ?? ""), + ...(typeof record.method === "string" ? { method: record.method } : {}), + ...(typeof record.itemId === "string" ? { itemId: record.itemId } : {}), + }]; + }).sort((left, right) => left.seq - right.seq); +} + +function normalizeTask(value: unknown): QueueTask { + const record = asRecord(value) ?? {}; + const status = String(record.status || "queued") as TaskStatus; + const id = typeof record.id === "string" && record.id.length > 0 ? record.id : makeTaskId(); + const queueId = safeQueueId(record.queueId); + const createdAt = timestampToIso(typeof record.createdAt === "string" ? record.createdAt : null) ?? nowIso(); + const updatedAt = timestampToIso(typeof record.updatedAt === "string" ? record.updatedAt : null) ?? createdAt; + const model = normalizeCodeModel(typeof record.model === "string" && record.model.length > 0 ? 
record.model : config.defaultModel); + const providerId = normalizeProviderId(record.providerId); + const executionMode = normalizeExecutionMode(record.executionMode); + const prompt = typeof record.prompt === "string" ? record.prompt : ""; + const basePrompt = typeof record.basePrompt === "string" && record.basePrompt.length > 0 ? record.basePrompt : userPromptForDisplay(prompt); + const task: QueueTask = { + id, + queueId, + queueEnteredAt: timestampToIso(typeof record.queueEnteredAt === "string" ? record.queueEnteredAt : null) ?? createdAt, + prompt, + basePrompt, + referenceTaskIds: normalizeStringArray(record.referenceTaskIds), + referenceInjection: toJsonValue(record.referenceInjection ?? null), + providerId, + cwd: normalizeCwd(providerId, record.cwd), + model, + reasoningEffort: typeof record.reasoningEffort === "string" ? record.reasoningEffort : resolveReasoningEffort(model), + executionMode, + maxAttempts: clampAttempts(numberField(record.maxAttempts, config.defaultMaxAttempts)), + status, + createdAt, + updatedAt, + startedAt: timestampToIso(typeof record.startedAt === "string" ? record.startedAt : null), + finishedAt: terminalTask({ status } as QueueTask) ? timestampToIso(typeof record.finishedAt === "string" ? record.finishedAt : null) : null, + readAt: terminalTask({ status } as QueueTask) ? timestampToIso(typeof record.readAt === "string" ? record.readAt : null) : null, + currentAttempt: numberField(record.currentAttempt, 0), + currentMode: record.currentMode === "initial" || record.currentMode === "retry" ? record.currentMode : null, + codexThreadId: typeof record.codexThreadId === "string" ? record.codexThreadId : null, + activeTurnId: typeof record.activeTurnId === "string" ? record.activeTurnId : null, + finalResponse: typeof record.finalResponse === "string" ? record.finalResponse : "", + lastError: typeof record.lastError === "string" ? record.lastError : null, + lastJudge: toJsonValue(record.lastJudge ?? 
null), + judgeFailCount: numberField(record.judgeFailCount, 0), + promptHistory: Array.isArray(record.promptHistory) ? record.promptHistory.map(toJsonValue) : [], + output: normalizeOutput(record.output), + events: Array.isArray(record.events) ? record.events.map(toJsonValue) : [], + attempts: Array.isArray(record.attempts) ? record.attempts.map((item, index) => normalizeAttempt(item, index + 1)) : [], + cancelRequested: record.cancelRequested === true, + nextPrompt: typeof record.nextPrompt === "string" ? record.nextPrompt : null, + nextMode: record.nextMode === "initial" || record.nextMode === "retry" ? record.nextMode : null, + }; + task.stepCount = numberField(record.stepCount ?? record.llmStepCount, task.output.length); + task.llmStepCount = task.stepCount; + task.outputMaxSeq = Math.max(outputMaxSeq(task), numberField(record.outputMaxSeq, 0)); + return task; +} + +function normalizeAttempt(value: unknown, fallbackIndex: number): AttemptSummary { + const record = asRecord(value) ?? {}; + const finalResponse = typeof record.finalResponse === "string" ? record.finalResponse : ""; + return { + index: numberField(record.index, fallbackIndex), + mode: record.mode === "retry" ? "retry" : "initial", + startedAt: timestampToIso(typeof record.startedAt === "string" ? record.startedAt : null) ?? nowIso(), + finishedAt: timestampToIso(typeof record.finishedAt === "string" ? record.finishedAt : null) ?? nowIso(), + providerId: typeof record.providerId === "string" ? record.providerId : undefined, + executionMode: normalizeExecutionMode(record.executionMode), + terminalStatus: record.terminalStatus === "completed" || record.terminalStatus === "interrupted" || record.terminalStatus === "failed" ? record.terminalStatus : null, + transportClosedBeforeTerminal: record.transportClosedBeforeTerminal === true, + appServerExitCode: typeof record.appServerExitCode === "number" ? record.appServerExitCode : null, + appServerSignal: typeof record.appServerSignal === "string" ? 
record.appServerSignal : null, + error: typeof record.error === "string" ? record.error : null, + events: Array.isArray(record.events) ? record.events.map(toJsonValue) : undefined, + inputPrompt: typeof record.inputPrompt === "string" ? record.inputPrompt : undefined, + inputPromptPreview: typeof record.inputPromptPreview === "string" ? record.inputPromptPreview : undefined, + inputPromptChars: typeof record.inputPromptChars === "number" ? record.inputPromptChars : undefined, + inputPromptLines: typeof record.inputPromptLines === "number" ? record.inputPromptLines : undefined, + finalResponse, + finalResponsePreview: typeof record.finalResponsePreview === "string" ? record.finalResponsePreview : safePreview(finalResponse, 1200), + finalResponseChars: typeof record.finalResponseChars === "number" ? record.finalResponseChars : finalResponse.length, + judge: toJsonValue(record.judge ?? null), + judgeAt: typeof record.judgeAt === "string" ? record.judgeAt : null, + judgeSeq: typeof record.judgeSeq === "number" ? record.judgeSeq : null, + feedbackPrompt: typeof record.feedbackPrompt === "string" ? record.feedbackPrompt : undefined, + feedbackPromptPreview: typeof record.feedbackPromptPreview === "string" ? record.feedbackPromptPreview : undefined, + feedbackPromptChars: typeof record.feedbackPromptChars === "number" ? record.feedbackPromptChars : undefined, + feedbackPromptLines: typeof record.feedbackPromptLines === "number" ? record.feedbackPromptLines : undefined, + feedbackPromptSource: typeof record.feedbackPromptSource === "string" ? record.feedbackPromptSource : undefined, + feedbackPromptForAttempt: typeof record.feedbackPromptForAttempt === "number" ? record.feedbackPromptForAttempt : null, + stderrTail: typeof record.stderrTail === "string" ? record.stderrTail : "", + outputStartSeq: typeof record.outputStartSeq === "number" ? record.outputStartSeq : null, + outputEndSeq: typeof record.outputEndSeq === "number" ? 
record.outputEndSeq : null, + errorCount: typeof record.errorCount === "number" ? record.errorCount : undefined, + }; +} + +function taskReferenceIds(task: QueueTask): string[] { + const ids: string[] = []; + for (const id of task.referenceTaskIds ?? []) addUniqueTaskId(ids, id); + for (const id of referenceTaskIdsFromPrompt(task.basePrompt || userPromptForDisplay(task.prompt))) addUniqueTaskId(ids, id); + return ids; +} + +function taskBasePrompt(task: QueueTask): string { + return (task.basePrompt || userPromptForDisplay(task.prompt)).trimEnd(); +} + +function referenceSummaryItem(task: QueueTask, round: number, roundIndex: number, viaTaskId: string | null): ReferenceInjectionSummaryItem { + const lastMessage = lastAssistantMessage(task); + const lastText = typeof lastMessage.text === "string" ? lastMessage.text : ""; + return { + round, + roundIndex, + taskId: task.id, + viaTaskId, + status: task.status, + providerId: task.providerId, + executionMode: task.executionMode, + model: task.model, + cwd: task.cwd, + createdAt: task.createdAt, + updatedAt: task.updatedAt, + promptChars: taskBasePrompt(task).length, + finalResponseChars: lastText.trimEnd().length, + finalResponseAt: typeof lastMessage.at === "string" ? lastMessage.at : null, + finalResponseSource: typeof lastMessage.source === "string" ? 
lastMessage.source : "unknown", + referenceTaskIds: taskReferenceIds(task), + cliHint: `bun scripts/cli.ts codex task ${task.id}`, + }; +} + +function cachedTaskFinder(finder: (id: string) => Promise<QueueTask | null>): (id: string) => Promise<QueueTask | null> { + const cache = new Map<string, Promise<QueueTask | null>>(); + return (id: string) => { + const existing = cache.get(id); + if (existing !== undefined) return existing; + const next = finder(id); + cache.set(id, next); + return next; + }; +} + +async function collectReferenceGraph(rootIds: string[], finder: (id: string) => Promise<QueueTask | null>): Promise<{ items: ReferenceInjectionSummaryItem[]; tasks: QueueTask[]; truncated: boolean }> { + const seen = new Set<string>(); + let frontier = rootIds.map((id) => ({ id, viaTaskId: null as string | null })); + const rawItems: Array<{ task: QueueTask; depth: number; viaTaskId: string | null; discoveryIndex: number }> = []; + let truncated = false; + let discoveryIndex = 0; + for (let depth = 1; frontier.length > 0; depth += 1) { + if (referenceInjectionMaxRounds !== null && depth > referenceInjectionMaxRounds) { + truncated = frontier.some((entry) => !seen.has(entry.id)); + break; + } + const next: Array<{ id: string; viaTaskId: string | null }> = []; + for (const entry of frontier) { + if (seen.has(entry.id)) continue; + const task = await finder(entry.id); + if (task === null) continue; + seen.add(entry.id); + discoveryIndex += 1; + rawItems.push({ task, depth, viaTaskId: entry.viaTaskId, discoveryIndex }); + for (const childId of taskReferenceIds(task)) { + if (!seen.has(childId)) next.push({ id: childId, viaTaskId: task.id }); + } + } + frontier = next; + } + if (frontier.some((entry) => !seen.has(entry.id))) truncated = true; + const sorted = rawItems.sort((left, right) => { + if (left.depth !== right.depth) return right.depth - left.depth; + const createdDelta = Date.parse(left.task.createdAt) - Date.parse(right.task.createdAt); + if (Number.isFinite(createdDelta) && createdDelta !== 0) return createdDelta; + return left.discoveryIndex - 
right.discoveryIndex; + }); + const depthToRound = new Map<number, number>(); + Array.from(new Set(sorted.map((item) => item.depth))).forEach((depth, index) => depthToRound.set(depth, index + 1)); + const roundCounts = new Map<number, number>(); + const items = sorted.map((item) => { + const round = depthToRound.get(item.depth) ?? item.depth; + const roundIndex = (roundCounts.get(round) ?? 0) + 1; + roundCounts.set(round, roundIndex); + return referenceSummaryItem(item.task, round, roundIndex, item.viaTaskId); + }); + return { items, tasks: sorted.map((item) => item.task), truncated }; +} + +function referenceRoundSeparator(round: number, totalRounds: number, items: ReferenceInjectionSummaryItem[]): string { + const createdTimes = items.map((item) => item.createdAt).filter(Boolean).sort(); + const updatedTimes = items.map((item) => item.updatedAt).filter(Boolean).sort(); + return [ + `----- Reference Round ${round}/${totalRounds} -----`, + "order: upstream/oldest context first; direct references appear in the last round", + `tasks: ${items.length}`, + `createdRange: ${createdTimes[0] ?? "--"} -> ${createdTimes.at(-1) ?? "--"}`, + `updatedRange: ${updatedTimes[0] ?? "--"} -> ${updatedTimes.at(-1) ?? "--"}`, + "--------------------------------", + ].join("\n"); +} + +function referencedTaskContext(task: QueueTask, summary: ReferenceInjectionSummaryItem): string { + const lastMessage = lastAssistantMessage(task); + const lastMessageText = typeof lastMessage.text === "string" ? lastMessage.text : ""; + return [ + `## Round ${summary.round}.${summary.roundIndex} referenced task ${task.id}`, + `- via: ${summary.viaTaskId ?? 
"direct"}`, + `- status/provider/model/cwd: ${task.status} / ${task.providerId} / ${task.model} / ${task.cwd}`, + `- created/updated: ${task.createdAt} / ${task.updatedAt}`, + `- cli: bun scripts/cli.ts codex task ${task.id}`, + "", + "### Initial prompt", + taskBasePrompt(task) || "(empty)", + "", + "### Final/last response", + lastMessageText.trimEnd() || "(none yet)", + "", + "### Query more context", + `Run: bun scripts/cli.ts codex task ${task.id}`, + ].join("\n"); +} + +async function injectReferencedTaskContext(prompt: string, basePrompt: string, referenceTaskIds: string[]): Promise { + if (referenceTaskIds.length === 0 || prompt.includes(resolvedReferenceContextTitle)) { + return { prompt, basePrompt, referenceTaskIds, referenceInjection: null }; + } + if (referenceTaskIds.length > 5) throw new Error(`referenceTaskIds supports at most 5 task ids, got ${referenceTaskIds.length}`); + const finder = cachedTaskFinder(loadTask); + const referencedTasks = await Promise.all(referenceTaskIds.map((id) => finder(id))); + const missing = referenceTaskIds.filter((_id, index) => referencedTasks[index] === null); + if (missing.length > 0) throw new Error(`referenced Code Queue task not found: ${missing.join(", ")}`); + const injectedAt = nowIso(); + const graph = await collectReferenceGraph(referenceTaskIds, finder); + const injection: ReferenceInjectionRecord = { + version: 2, + injectedAt, + basePrompt, + directReferenceTaskIds: referenceTaskIds, + maxRounds: referenceInjectionMaxRounds, + truncated: graph.truncated, + itemCount: graph.items.length, + items: graph.items, + }; + const taskById = new Map(graph.tasks.map((task) => [task.id, task])); + const groupedItems = Array.from(new Set(graph.items.map((item) => item.round))).map((round) => ({ + round, + items: graph.items.filter((item) => item.round === round), + })); + const context = [ + resolvedReferenceContextTitle, + `injectedAt: ${injectedAt}`, + `directReferences: ${referenceTaskIds.join(", ")}`, + 
`referenceGraphItems: ${graph.items.length}${graph.truncated ? " (truncated)" : ""}`, + "说明:Code Queue 后端只读取每个被引用任务的结构化 basePrompt(注入前 prompt)和 final/last response;不会把历史引用注入块继续套入。多轮引用按上游/最早上下文在前、直接引用在后的顺序注入;中间执行过程不注入,只保留 CLI 查询提示。", + "", + ...(groupedItems.flatMap((group) => [ + referenceRoundSeparator(group.round, groupedItems.length, group.items), + "", + ...group.items.map((item) => { + const task = taskById.get(item.taskId); + return task === undefined ? "" : referencedTaskContext(task, item); + }).filter((text) => text.length > 0), + "", + ])), + "", + "# 本次任务", + basePrompt.trim(), + ].join("\n"); + return { prompt: context, basePrompt, referenceTaskIds, referenceInjection: toJsonValue(injection) }; +} + +async function preparePromptFromRecord(record: Record, fallbackReferenceTaskIds: string[] = []): Promise { + const rawPrompt = normalizePromptText(record.prompt); + const basePrompt = typeof record.basePrompt === "string" && record.basePrompt.length > 0 ? record.basePrompt : userPromptForDisplay(rawPrompt); + const referenceTaskIds = collectReferenceTaskIds(record, rawPrompt, fallbackReferenceTaskIds); + const injected = await injectReferencedTaskContext(rawPrompt, basePrompt, referenceTaskIds); + return { + prompt: promptWithCodeQueueEnvironmentHint(injected.prompt), + basePrompt: injected.basePrompt, + referenceTaskIds: injected.referenceTaskIds, + referenceInjection: injected.referenceInjection ?? toJsonValue(record.referenceInjection ?? null), + }; +} + +async function createTaskFromRequest(value: unknown): Promise { + const record = asRecord(value); + if (record === null) throw new Error("request body must be an object"); + const at = nowIso(); + const prepared = await preparePromptFromRecord(record); + const providerId = normalizeProviderId(record.providerId); + const model = normalizeCodeModel(typeof record.model === "string" && record.model.length > 0 ? 
record.model : config.defaultModel); + const executionMode = normalizeExecutionMode(record.executionMode); + const output: LiveOutput[] = [{ seq: 1, at, channel: "user", text: `${prepared.prompt}\n`, method: "enqueue" }]; + return { + id: makeTaskId(), + queueId: normalizeQueueId(record.queueId), + queueEnteredAt: at, + prompt: prepared.prompt, + basePrompt: prepared.basePrompt, + referenceTaskIds: prepared.referenceTaskIds, + referenceInjection: prepared.referenceInjection, + providerId, + cwd: normalizeCwd(providerId, record.cwd), + model, + reasoningEffort: typeof record.reasoningEffort === "string" ? record.reasoningEffort : resolveReasoningEffort(model), + executionMode, + maxAttempts: clampAttempts(numberField(record.maxAttempts, config.defaultMaxAttempts)), + status: "queued", + createdAt: at, + updatedAt: at, + startedAt: null, + finishedAt: null, + readAt: null, + currentAttempt: 0, + currentMode: null, + codexThreadId: null, + activeTurnId: null, + finalResponse: "", + stepCount: 0, + llmStepCount: 0, + outputMaxSeq: 1, + lastError: null, + lastJudge: null, + judgeFailCount: 0, + promptHistory: [], + output, + events: [], + attempts: [], + cancelRequested: false, + nextPrompt: null, + nextMode: null, + }; +} + +function taskJson(task: QueueTask): JsonValue { + return toJsonValue(task); +} + +function queueIdOf(task: QueueTask): string { + return safeQueueId(task.queueId); +} + +function rowToTask(row: Pick): QueueTask { + return normalizeTask(row.task_json); +} + +function queueRowToRecord(row: QueueRow): QueueRecord { + const id = safeQueueId(row.id); + return { + id, + name: normalizeQueueName(row.name, id), + createdAt: timestampToIso(row.created_at) ?? nowIso(), + updatedAt: timestampToIso(row.updated_at) ?? 
nowIso(), + }; +} + +async function ensureSchema(): Promise { + await mgrSql` + CREATE TABLE IF NOT EXISTS unidesk_code_queue_tasks ( + id TEXT PRIMARY KEY, + queue_id TEXT NOT NULL DEFAULT 'default', + status TEXT NOT NULL, + provider_id TEXT NOT NULL DEFAULT 'main-server', + execution_mode TEXT NOT NULL DEFAULT 'default', + model TEXT NOT NULL, + cwd TEXT NOT NULL, + prompt TEXT NOT NULL, + base_prompt TEXT NOT NULL DEFAULT '', + reference_task_ids JSONB NOT NULL DEFAULT '[]'::jsonb, + reference_injection JSONB, + reasoning_effort TEXT, + max_attempts INTEGER NOT NULL, + current_attempt INTEGER NOT NULL DEFAULT 0, + current_mode TEXT, + codex_thread_id TEXT, + active_turn_id TEXT, + created_at TIMESTAMPTZ NOT NULL, + updated_at TIMESTAMPTZ NOT NULL, + started_at TIMESTAMPTZ, + finished_at TIMESTAMPTZ, + read_at TIMESTAMPTZ, + last_error TEXT, + last_judge JSONB, + output_count INTEGER NOT NULL DEFAULT 0, + event_count INTEGER NOT NULL DEFAULT 0, + attempt_count INTEGER NOT NULL DEFAULT 0, + last_output_seq BIGINT NOT NULL DEFAULT 0, + task_json JSONB NOT NULL + ) + `; + await mgrSql` + CREATE TABLE IF NOT EXISTS unidesk_code_queue_queues ( + id TEXT PRIMARY KEY, + name TEXT NOT NULL DEFAULT '', + created_at TIMESTAMPTZ NOT NULL, + updated_at TIMESTAMPTZ NOT NULL + ) + `; + await mgrSql`ALTER TABLE unidesk_code_queue_tasks ADD COLUMN IF NOT EXISTS queue_id TEXT NOT NULL DEFAULT 'default'`; + await mgrSql`ALTER TABLE unidesk_code_queue_tasks ADD COLUMN IF NOT EXISTS provider_id TEXT NOT NULL DEFAULT 'main-server'`; + await mgrSql`ALTER TABLE unidesk_code_queue_tasks ADD COLUMN IF NOT EXISTS execution_mode TEXT NOT NULL DEFAULT 'default'`; + await mgrSql`ALTER TABLE unidesk_code_queue_tasks ADD COLUMN IF NOT EXISTS base_prompt TEXT NOT NULL DEFAULT ''`; + await mgrSql`ALTER TABLE unidesk_code_queue_tasks ADD COLUMN IF NOT EXISTS reference_task_ids JSONB NOT NULL DEFAULT '[]'::jsonb`; + await mgrSql`ALTER TABLE unidesk_code_queue_tasks ADD COLUMN IF NOT EXISTS 
reference_injection JSONB`; + await mgrSql`ALTER TABLE unidesk_code_queue_tasks ADD COLUMN IF NOT EXISTS read_at TIMESTAMPTZ`; + await mgrSql`ALTER TABLE unidesk_code_queue_queues ADD COLUMN IF NOT EXISTS name TEXT NOT NULL DEFAULT ''`; + const now = nowIso(); + await upsertQueue({ id: config.defaultQueueId, name: config.defaultQueueId, createdAt: now, updatedAt: now }); + schemaReady = true; + schemaLastError = null; +} + +async function ensureSchemaWithRetry(): Promise { + let attempt = 0; + while (!schemaReady) { + attempt += 1; + try { + await ensureSchema(); + log("info", "schema_ready", { attempt }); + return; + } catch (error) { + schemaLastError = errorToJson(error); + lastDatabaseError = schemaLastError.message as string; + log("warn", "schema_retry", { attempt, error: schemaLastError }); + await Bun.sleep(Math.min(1000 * attempt, 10_000)); + } + } +} + +async function upsertQueue(queue: QueueRecord): Promise { + await mgrSql` + INSERT INTO unidesk_code_queue_queues (id, name, created_at, updated_at) + VALUES (${queue.id}, ${queue.name}, ${queue.createdAt}, ${queue.updatedAt}) + ON CONFLICT (id) DO UPDATE SET + name = EXCLUDED.name, + updated_at = GREATEST(unidesk_code_queue_queues.updated_at, EXCLUDED.updated_at) + `; +} + +async function upsertTask(client: SqlExecutor, task: QueueTask): Promise { + await client` + INSERT INTO unidesk_code_queue_tasks ( + id, queue_id, status, provider_id, execution_mode, model, cwd, prompt, base_prompt, + reference_task_ids, reference_injection, reasoning_effort, max_attempts, + current_attempt, current_mode, codex_thread_id, active_turn_id, + created_at, updated_at, started_at, finished_at, read_at, last_error, + last_judge, output_count, event_count, attempt_count, last_output_seq, task_json + ) VALUES ( + ${task.id}, ${queueIdOf(task)}, ${task.status}, ${task.providerId}, ${task.executionMode}, + ${task.model}, ${task.cwd}, ${task.prompt}, ${task.basePrompt}, + ${client.json(task.referenceTaskIds as unknown as 
postgres.JSONValue)}, ${task.referenceInjection === null ? null : client.json(task.referenceInjection as unknown as postgres.JSONValue)}, + ${task.reasoningEffort}, ${task.maxAttempts}, ${task.currentAttempt}, ${task.currentMode}, + ${task.codexThreadId}, ${task.activeTurnId}, ${task.createdAt}, ${task.updatedAt}, + ${task.startedAt}, ${task.finishedAt}, ${task.readAt}, ${task.lastError}, + ${task.lastJudge === null ? null : client.json(task.lastJudge as unknown as postgres.JSONValue)}, + ${task.output.length}, ${task.events.length}, ${task.attempts.length}, ${outputMaxSeq(task)}, ${client.json(taskJson(task) as unknown as postgres.JSONValue)} + ) + ON CONFLICT (id) DO UPDATE SET + queue_id = EXCLUDED.queue_id, + status = EXCLUDED.status, + provider_id = EXCLUDED.provider_id, + execution_mode = EXCLUDED.execution_mode, + model = EXCLUDED.model, + cwd = EXCLUDED.cwd, + prompt = EXCLUDED.prompt, + base_prompt = EXCLUDED.base_prompt, + reference_task_ids = EXCLUDED.reference_task_ids, + reference_injection = EXCLUDED.reference_injection, + reasoning_effort = EXCLUDED.reasoning_effort, + max_attempts = EXCLUDED.max_attempts, + current_attempt = EXCLUDED.current_attempt, + current_mode = EXCLUDED.current_mode, + codex_thread_id = EXCLUDED.codex_thread_id, + active_turn_id = EXCLUDED.active_turn_id, + updated_at = EXCLUDED.updated_at, + started_at = EXCLUDED.started_at, + finished_at = EXCLUDED.finished_at, + read_at = EXCLUDED.read_at, + last_error = EXCLUDED.last_error, + last_judge = EXCLUDED.last_judge, + output_count = EXCLUDED.output_count, + event_count = EXCLUDED.event_count, + attempt_count = EXCLUDED.attempt_count, + last_output_seq = EXCLUDED.last_output_seq, + task_json = EXCLUDED.task_json + WHERE unidesk_code_queue_tasks.status NOT IN ('running', 'judging') + `; +} + +async function loadQueues(): Promise { + const rows = await traceSql` + SELECT id, name, created_at, updated_at + FROM unidesk_code_queue_queues + ORDER BY id ASC + `; + const queues = 
rows.map(queueRowToRecord); + if (!queues.some((queue) => queue.id === config.defaultQueueId)) { + const at = nowIso(); + queues.unshift({ id: config.defaultQueueId, name: config.defaultQueueId, createdAt: at, updatedAt: at }); + } + return queues; +} + +async function loadTask(taskId: string): Promise { + const rows = await traceSql>>` + SELECT task_json + FROM unidesk_code_queue_tasks + WHERE id = ${taskId} + LIMIT 1 + `; + return rows[0] === undefined ? null : rowToTask(rows[0]); +} + +async function loadTasksForList(url: URL): Promise<{ tasks: QueueTask[]; total: number; limit: number; offset: number }> { + const limit = parseLimit(url); + const offset = Math.max(0, numberField(url.searchParams.get("offset"), 0)); + const status = url.searchParams.get("status"); + const queueIdRaw = url.searchParams.get("queueId"); + const queueId = queueIdRaw === null || queueIdRaw.length === 0 ? null : safeQueueId(queueIdRaw); + const search = String(url.searchParams.get("q") ?? url.searchParams.get("search") ?? "").trim(); + const rows = await traceSql & { total_count: string | number }>>` + SELECT task_json, COUNT(*) OVER() AS total_count + FROM unidesk_code_queue_tasks + WHERE (${status === null} OR status = ${status}) + AND (${queueId === null} OR queue_id = ${queueId}) + AND (${search.length === 0} OR id ILIKE ${`%${search}%`} OR prompt ILIKE ${`%${search}%`} OR base_prompt ILIKE ${`%${search}%`} OR COALESCE(last_error, '') ILIKE ${`%${search}%`}) + ORDER BY + CASE status WHEN 'running' THEN 0 WHEN 'judging' THEN 1 WHEN 'retry_wait' THEN 2 WHEN 'queued' THEN 3 ELSE 8 END ASC, + updated_at DESC, + id DESC + LIMIT ${limit} + OFFSET ${offset} + `; + return { tasks: rows.map(rowToTask), total: Number(rows[0]?.total_count ?? 
0), limit, offset }; +} + +async function loadTasksByIds(ids: string[]): Promise { + const unique = Array.from(new Set(ids.map((id) => id.trim()).filter(Boolean))); + if (unique.length === 0) return []; + const rows = await traceSql>>` + SELECT task_json + FROM unidesk_code_queue_tasks + WHERE id IN ${traceSql(unique)} + `; + const byId = new Map(rows.map((row) => { + const task = rowToTask(row); + return [task.id, task] as const; + })); + const tasks: QueueTask[] = []; + for (const id of unique) { + const task = byId.get(id); + if (task !== undefined) tasks.push(task); + } + return tasks; +} + +function parseLimit(url: URL): number { + const raw = Number(url.searchParams.get("limit") ?? 100); + return Number.isInteger(raw) && raw > 0 ? Math.min(config.maxListLimit, raw) : 100; +} + +async function queueSummary(tasks?: QueueTask[], queueRecords?: QueueRecord[]): Promise { + const [taskRows, queues] = await Promise.all([ + tasks === undefined ? loadAllTasksLite() : Promise.resolve(tasks), + queueRecords === undefined ? loadQueues() : Promise.resolve(queueRecords), + ]); + const summaries = new Map(); + for (const queue of queues) { + summaries.set(queue.id, { + id: queue.id, + name: queue.name, + total: 0, + counts: {}, + unreadTerminal: 0, + activeTaskId: null, + runnableTaskId: null, + processing: false, + createdAt: queue.createdAt, + updatedAt: queue.updatedAt, + }); + } + for (const task of taskRows) { + const queueId = queueIdOf(task); + if (!summaries.has(queueId)) { + summaries.set(queueId, { + id: queueId, + name: queueId, + total: 0, + counts: {}, + unreadTerminal: 0, + activeTaskId: null, + runnableTaskId: null, + processing: false, + createdAt: null, + updatedAt: null, + }); + } + const summary = summaries.get(queueId) as JsonRecord; + summary.total = Number(summary.total) + 1; + const counts = asJsonRecord(summary.counts); + counts[task.status] = Number(counts[task.status] ?? 
0) + 1; + summary.counts = counts; + if (terminalTaskUnread(task)) summary.unreadTerminal = Number(summary.unreadTerminal) + 1; + if ((task.status === "running" || task.status === "judging") && summary.activeTaskId === null) summary.activeTaskId = task.id; + if ((task.status === "queued" || task.status === "retry_wait") && summary.runnableTaskId === null) summary.runnableTaskId = task.id; + } + const counts = taskRows.reduce<Record<string, number>>((memo, task) => { + memo[task.status] = (memo[task.status] ?? 0) + 1; + return memo; + }, {}); + const activeTaskIds = taskRows.filter((task) => task.status === "running" || task.status === "judging").map((task) => task.id).sort(); + const queueList = Array.from(summaries.values()).sort((left, right) => String(left.id).localeCompare(String(right.id))); + return { + total: taskRows.length, + defaultQueueId: config.defaultQueueId, + queueCount: queueList.length, + queues: queueList, + activeQueueIds: queueList.filter((queue) => typeof queue.activeTaskId === "string").map((queue) => String(queue.id)), + processingQueueIds: [], + activeRunSlotCount: activeTaskIds.length, + activeRunSlotWaiters: [], + activeTaskIds, + activeTaskId: activeTaskIds[0] ?? null, + processing: false, + counts, + unreadTerminal: taskRows.reduce((total, task) => total + (terminalTaskUnread(task) ? 
1 : 0), 0), + judgeConfigured: null, + judgeFailRetryLimit: 3, + defaultModel: config.defaultModel, + codeModels: config.codeModels, + codexModels: config.codeModels.filter((model) => codeAgentPortForModel(model) === "codex"), + opencodeModels: config.codeModels.filter((model) => codeAgentPortForModel(model) === "opencode"), + modelPorts: Object.fromEntries(config.codeModels.map((model) => [model, codeAgentPortForModel(model)])), + executionModes: codeExecutionModes.map(executionModeInfo), + executionModeInfo: Object.fromEntries(codeExecutionModes.map((mode) => [mode, executionModeInfo(mode)])), + agentPorts: { codex: codeAgentPortInfo("codex"), opencode: codeAgentPortInfo("opencode") }, + defaultReasoningEffort: config.defaultReasoningEffort, + modelReasoningEfforts: config.modelReasoningEfforts, + defaultProviderId: config.defaultProviderId, + mainProviderId: config.defaultProviderId, + defaultWorkdir: config.defaultWorkdir, + remoteDefaultWorkdir: config.remoteDefaultWorkdir, + maxActiveQueues: null, + storage: { + primary: "postgres", + postgresConfigured: true, + postgresReady: schemaReady, + lastError: lastDatabaseError, + role: "master-control-plane", + mgrPoolMax: config.mgrDatabasePoolMax, + tracePoolMax: config.traceDatabasePoolMax, + }, + controlPlane: "master-code-queue-mgr", + }; +} + +async function loadAllTasksLite(): Promise { + const rows = await traceSql>>` + SELECT task_json + FROM unidesk_code_queue_tasks + ORDER BY created_at ASC, id ASC + LIMIT 2000 + `; + return rows.map(rowToTask); +} + +function taskTiming(task: QueueTask): JsonRecord { + const created = timestampMs(task.createdAt); + const started = timestampMs(task.startedAt); + const finished = timestampMs(task.finishedAt); + const updated = timestampMs(task.updatedAt); + const now = Date.now(); + return { + queuedMs: created === null ? null : (started ?? finished ?? updated ?? now) - created, + runMs: started === null ? null : (finished ?? 
now) - started, + totalElapsedMs: created === null ? null : (finished ?? now) - created, + durationMs: started === null ? null : (finished ?? now) - started, + }; +} + +function durationMsBetween(startAt: string | null | undefined, endAt: string | null | undefined): number | null { + const start = typeof startAt === "string" ? Date.parse(startAt) : NaN; + const end = typeof endAt === "string" ? Date.parse(endAt) : NaN; + if (!Number.isFinite(start) || !Number.isFinite(end)) return null; + return Math.max(0, end - start); +} + +function codexStatsDateKey(value: string | Date | null | undefined): string | null { + const date = value instanceof Date ? value : typeof value === "string" && value.length > 0 ? new Date(value) : null; + if (date === null || Number.isNaN(date.getTime())) return null; + const parts = codexStatsDateFormatter.formatToParts(date).reduce>((memo, part) => { + if (part.type !== "literal") memo[part.type] = part.value; + return memo; + }, {}); + const year = parts.year; + const month = parts.month; + const day = parts.day; + return year !== undefined && month !== undefined && day !== undefined ? `${year}-${month}-${day}` : null; +} + +function shiftDateKey(dateKey: string, offsetDays: number): string { + const [year = "1970", month = "01", day = "01"] = dateKey.split("-"); + const shifted = new Date(Date.UTC(Number(year), Number(month) - 1, Number(day) + offsetDays)); + return shifted.toISOString().slice(0, 10); +} + +function statsDaysFromUrl(url: URL): number { + const value = Number(url.searchParams.get("days") ?? 14); + return Number.isInteger(value) && value > 0 ? Math.min(90, value) : 14; +} + +function retryAttemptDates(task: QueueTask): Array { + const retryAttempts = task.attempts.filter((attempt, index) => attempt.mode === "retry" || Number(attempt.index || 0) > 1 || index > 0); + const fallbackRetryCount = Math.max(0, Math.floor(Number(task.currentAttempt || 0)) - 1); + const fallbackAt = timestampToIso(task.updatedAt) ?? 
timestampToIso(task.startedAt) ?? timestampToIso(task.createdAt); + return [ + ...retryAttempts.map((attempt) => timestampToIso(attempt.startedAt) ?? timestampToIso(attempt.finishedAt) ?? fallbackAt), + ...Array.from({ length: Math.max(0, fallbackRetryCount - retryAttempts.length) }, () => fallbackAt), + ]; +} + +function completedTaskDurationMs(task: QueueTask, finishedAt: string): number | null { + const startedAt = timestampToIso(task.startedAt) ?? timestampToIso(task.attempts[0]?.startedAt ?? null) ?? timestampToIso(task.createdAt); + return durationMsBetween(startedAt, finishedAt); +} + +function taskStatisticsSummary(tasks: QueueTask[], days = 14): JsonRecord { + const generatedAt = nowIso(); + const endDate = codexStatsDateKey(new Date()) ?? generatedAt.slice(0, 10); + const safeDays = Math.max(1, Math.min(90, Math.floor(days))); + const startDate = shiftDateKey(endDate, 1 - safeDays); + const buckets = new Map(); + for (let offset = 0; offset < safeDays; offset += 1) { + const date = shiftDateKey(startDate, offset); + buckets.set(date, { + date, + executedTasks: 0, + completedTasks: 0, + retryAttempts: 0, + succeededTasks: 0, + failedTasks: 0, + canceledTasks: 0, + totalDurationMs: 0, + durationSamples: 0, + }); + } + const bucketFor = (value: string | null): DailyTaskStatsBucket | null => { + const date = codexStatsDateKey(value); + return date === null ? null : buckets.get(date) ?? null; + }; + for (const task of tasks) { + const executedBucket = bucketFor(timestampToIso(task.startedAt) ?? timestampToIso(task.attempts[0]?.startedAt ?? null)); + if (executedBucket !== null) executedBucket.executedTasks += 1; + for (const retryAt of retryAttemptDates(task)) { + const retryBucket = bucketFor(retryAt); + if (retryBucket !== null) retryBucket.retryAttempts += 1; + } + if (!terminalTask(task)) continue; + const finishedAt = timestampToIso(task.finishedAt) ?? 
timestampToIso(task.updatedAt); + const completedBucket = bucketFor(finishedAt); + if (finishedAt === null || completedBucket === null) continue; + completedBucket.completedTasks += 1; + if (task.status === "succeeded") completedBucket.succeededTasks += 1; + if (task.status === "failed") completedBucket.failedTasks += 1; + if (task.status === "canceled") completedBucket.canceledTasks += 1; + const durationMs = completedTaskDurationMs(task, finishedAt); + if (durationMs !== null) { + completedBucket.totalDurationMs += durationMs; + completedBucket.durationSamples += 1; + } + } + const daily = Array.from(buckets.values()).map((bucket) => ({ + ...bucket, + avgDurationMs: bucket.durationSamples > 0 ? Math.round(bucket.totalDurationMs / bucket.durationSamples) : null, + })); + const totals = daily.reduce((memo, day) => { + memo.executedTasks += day.executedTasks; + memo.completedTasks += day.completedTasks; + memo.retryAttempts += day.retryAttempts; + memo.succeededTasks += day.succeededTasks; + memo.failedTasks += day.failedTasks; + memo.canceledTasks += day.canceledTasks; + memo.totalDurationMs += day.totalDurationMs; + memo.durationSamples += day.durationSamples; + return memo; + }, { + executedTasks: 0, + completedTasks: 0, + retryAttempts: 0, + succeededTasks: 0, + failedTasks: 0, + canceledTasks: 0, + totalDurationMs: 0, + durationSamples: 0, + }); + return { + generatedAt, + timezone: codexStatsTimeZone, + days: safeDays, + range: { startDate, endDate }, + totals: { + ...totals, + avgDurationMs: totals.durationSamples > 0 ? Math.round(totals.totalDurationMs / totals.durationSamples) : null, + }, + daily: toJsonValue(daily), + }; +} + +function taskListResponse(task: QueueTask, lite = true): JsonRecord { + const displayPrompt = task.basePrompt || userPromptForDisplay(task.prompt); + return { + id: task.id, + queueId: queueIdOf(task), + queueEnteredAt: task.queueEnteredAt, + prompt: lite ? 
prefixPreview(displayPrompt, 360) : safePreview(displayPrompt, 2000), + basePrompt: lite ? prefixPreview(task.basePrompt, 360) : safePreview(task.basePrompt, 2000), + displayPrompt: lite ? prefixPreview(displayPrompt, 360) : safePreview(displayPrompt, 2000), + promptChars: task.prompt.length, + basePromptChars: task.basePrompt.length, + displayPromptChars: displayPrompt.length, + promptEditable: queuedTaskPromptEditable(task), + finalResponseChars: task.finalResponse.length, + stepCount: numberField(task.stepCount ?? task.llmStepCount, 0), + llmStepCount: numberField(task.llmStepCount ?? task.stepCount, 0), + traceStats: null, + statsSource: "code-queue-mgr", + summaryOnly: true, + referenceTaskIds: task.referenceTaskIds, + referenceInjection: task.referenceInjection, + providerId: task.providerId, + executionMode: task.executionMode, + executionModeInfo: executionModeInfo(task.executionMode), + cwd: task.cwd, + model: task.model, + agentPort: codeAgentPortForModel(task.model), + agentPortInfo: codeAgentPortInfo(codeAgentPortForModel(task.model)), + reasoningEffort: task.reasoningEffort, + maxAttempts: task.maxAttempts, + status: task.status, + queuedReason: task.status === "queued" ? { code: "mgr-visible", label: "QUEUED", message: "Task is visible from master code-queue-mgr; D601 scheduler will pick it up from PostgreSQL." } : null, + queuedReasonLabel: task.status === "queued" ? 
"QUEUED" : null, + createdAt: task.createdAt, + updatedAt: task.updatedAt, + startedAt: task.startedAt, + finishedAt: task.finishedAt, + readAt: task.readAt, + currentAttempt: task.currentAttempt, + currentMode: task.currentMode, + judgeFailCount: task.judgeFailCount, + judgeFailRetryLimit: 3, + codexThreadId: task.codexThreadId, + activeTurnId: task.activeTurnId, + lastError: task.lastError, + lastJudge: task.lastJudge, + cancelRequested: task.cancelRequested, + terminalUnread: terminalTaskUnread(task), + outputCount: task.output.length, + eventCount: task.events.length, + attemptCount: task.attempts.length, + attempts: toJsonValue(lite ? task.attempts.slice(-3) : task.attempts), + timing: taskTiming(task), + }; +} + +function taskMetaResponse(task: QueueTask): JsonRecord { + return { + ...taskListResponse(task, false), + prompt: task.prompt, + basePrompt: task.basePrompt, + finalResponse: task.finalResponse, + promptHistory: toJsonValue(task.promptHistory), + attempts: toJsonValue(task.attempts), + nextMode: task.nextMode, + outputCount: task.output.length, + retainedOutputCount: task.output.length, + eventCount: task.events.length, + transcriptCount: task.output.length + task.promptHistory.length, + transcriptMaxSeq: outputMaxSeq(task), + transcript: [], + output: [], + events: [], + }; +} + +function lastAssistantMessage(task: QueueTask): JsonRecord { + const assistantOutput = task.output.slice().reverse().find((item) => item.channel === "assistant"); + const text = task.finalResponse.trim().length > 0 ? task.finalResponse.trim() : String(assistantOutput?.text ?? "").trim(); + return { + text, + at: assistantOutput?.at ?? task.finishedAt ?? task.updatedAt, + seq: assistantOutput?.seq ?? null, + source: task.finalResponse.trim().length > 0 ? "finalResponse" : assistantOutput !== undefined ? 
"output" : "none", + }; +} + +function taskSummary(task: QueueTask, url: URL): JsonRecord { + const toolLimit = Math.max(1, Math.min(500, numberField(url.searchParams.get("toolLimit"), 160))); + const toolOutputs = task.output.filter((item) => item.channel === "command" || item.channel === "tool" || item.channel === "diff" || item.channel === "error"); + const start = Math.max(0, toolOutputs.length - toolLimit); + return { + id: task.id, + queueId: queueIdOf(task), + status: task.status, + providerId: task.providerId, + executionMode: task.executionMode, + executionModeInfo: executionModeInfo(task.executionMode), + model: task.model, + agentPort: codeAgentPortForModel(task.model), + agentPortInfo: codeAgentPortInfo(codeAgentPortForModel(task.model)), + cwd: task.cwd, + reasoningEffort: task.reasoningEffort, + maxAttempts: task.maxAttempts, + currentAttempt: task.currentAttempt, + currentMode: task.currentMode, + judgeFailCount: task.judgeFailCount, + judgeFailRetryLimit: 3, + codexThreadId: task.codexThreadId, + activeTurnId: task.activeTurnId, + createdAt: task.createdAt, + startedAt: task.startedAt, + finishedAt: task.finishedAt, + updatedAt: task.updatedAt, + timing: taskTiming(task), + initialPrompt: task.prompt, + basePrompt: task.basePrompt, + prompt: task.prompt, + promptEditable: queuedTaskPromptEditable(task), + referenceTaskIds: task.referenceTaskIds, + referenceInjection: task.referenceInjection, + lastAssistantMessage: lastAssistantMessage(task), + toolSummary: { + count: toolOutputs.length, + returned: toolOutputs.length - start, + limit: toolLimit, + truncated: start > 0, + items: toolOutputs.slice(start).map((item) => ({ + seq: item.seq, + at: item.at, + kind: item.channel === "diff" ? "edited" : item.channel === "command" ? "ran" : item.channel === "tool" ? "explored" : "error", + title: item.method ?? item.channel, + status: item.method ?? null, + commandPreview: item.channel === "command" ? 
safePreview(item.text, 700) : "", + commandOmittedLines: 0, + outputPreview: item.channel === "command" ? "" : safePreview(item.text, 1200), + outputOmittedLines: 0, + rawSeqs: [item.seq], + })), + }, + attempts: toJsonValue(task.attempts), + lastJudge: task.lastJudge, + lastError: task.lastError, + cancelRequested: task.cancelRequested, + outputCount: task.output.length, + eventCount: task.events.length, + transcriptCount: task.output.length + task.promptHistory.length, + transcriptMaxSeq: outputMaxSeq(task), + }; +} + +function outputPage(task: QueueTask, url: URL): JsonRecord { + const limit = parseLimit(url); + const afterSeq = numberField(url.searchParams.get("afterSeq"), 0); + const beforeSeqRaw = url.searchParams.get("beforeSeq"); + const tail = url.searchParams.get("tail") === "1"; + const ordered = task.output.slice().sort((left, right) => left.seq - right.seq); + let rows = ordered; + if (beforeSeqRaw !== null) rows = ordered.filter((item) => item.seq < numberField(beforeSeqRaw, Number.MAX_SAFE_INTEGER)).slice(-limit); + else if (tail) rows = ordered.slice(-limit); + else rows = ordered.filter((item) => item.seq > afterSeq).slice(0, limit); + const firstSeq = rows[0]?.seq ?? null; + const lastSeq = rows.at(-1)?.seq ?? null; + const fullText = url.searchParams.get("fullText") === "1"; + const maxTextChars = Math.max(200, Math.min(100_000, numberField(url.searchParams.get("maxTextChars"), 6000))); + return { + ok: true, + taskId: task.id, + queueId: queueIdOf(task), + status: task.status, + updatedAt: task.updatedAt, + mode: beforeSeqRaw !== null ? "before" : tail ? "tail" : "after", + limit, + total: ordered.length, + maxSeq: outputMaxSeq(task), + afterSeq, + nextAfterSeq: lastSeq, + beforeSeq: beforeSeqRaw === null ? 
null : numberField(beforeSeqRaw), + previousBeforeSeq: firstSeq, + hasMore: lastSeq !== null && ordered.some((item) => item.seq > lastSeq), + hasBefore: firstSeq !== null && ordered.some((item) => item.seq < firstSeq), + output: rows.map((item) => ({ ...item, text: fullText ? item.text : safePreview(item.text, maxTextChars) })), + }; +} + +function transcriptKind(item: LiveOutput): string { + if (item.channel === "command") return "ran"; + if (item.channel === "diff") return "edited"; + if (item.channel === "tool") return "explored"; + if (item.channel === "error") return "error"; + if (item.channel === "system") return "system"; + return "message"; +} + +function outputToTranscriptLine(item: LiveOutput, fullText: boolean): JsonRecord { + const body = fullText ? item.text : safePreview(item.text, 1600); + return { + seq: item.seq, + at: item.at, + kind: transcriptKind(item), + title: item.method ?? item.channel, + status: item.method ?? null, + commandPreview: item.channel === "command" ? body : null, + bodyPreview: item.channel === "command" ? 
"" : body, + commandOmittedLines: 0, + bodyOmittedLines: 0, + rawSeqs: [item.seq], + }; +} + +function transcriptPage(task: QueueTask, url: URL): JsonRecord { + const limit = parseLimit(url); + const afterSeq = numberField(url.searchParams.get("afterSeq"), 0); + const beforeSeqRaw = url.searchParams.get("beforeSeq"); + const tail = url.searchParams.get("tail") === "1"; + const fullText = url.searchParams.get("fullText") === "1" || url.searchParams.get("raw") === "1"; + const transcript = task.output + .filter((item) => !(item.channel === "system" && item.method === "enqueue")) + .sort((left, right) => left.seq - right.seq) + .map((item) => outputToTranscriptLine(item, fullText)); + let rows = transcript; + if (beforeSeqRaw !== null) rows = transcript.filter((item) => Number(item.seq) < numberField(beforeSeqRaw, Number.MAX_SAFE_INTEGER)).slice(-limit); + else if (tail) rows = transcript.slice(-limit); + else rows = transcript.filter((item) => Number(item.seq) > afterSeq).slice(0, limit); + const firstSeq = Number(rows[0]?.seq ?? NaN); + const lastSeq = Number(rows.at(-1)?.seq ?? NaN); + return { + ok: true, + taskId: task.id, + queueId: queueIdOf(task), + status: task.status, + updatedAt: task.updatedAt, + agentPort: codeAgentPortForModel(task.model), + agentPortInfo: codeAgentPortInfo(codeAgentPortForModel(task.model)), + mode: beforeSeqRaw !== null ? "before" : tail ? "tail" : "after", + transcript: rows, + afterSeq, + nextAfterSeq: Number.isFinite(lastSeq) ? lastSeq : afterSeq, + beforeSeq: beforeSeqRaw === null ? null : numberField(beforeSeqRaw), + previousBeforeSeq: Number.isFinite(firstSeq) ? firstSeq : null, + hasMore: Number.isFinite(lastSeq) && transcript.some((item) => Number(item.seq) > lastSeq), + hasBefore: Number.isFinite(firstSeq) && transcript.some((item) => Number(item.seq) < firstSeq), + total: transcript.length, + maxSeq: transcript.at(-1)?.seq ?? 
0, + fullText, + }; +} + +function traceSteps(task: QueueTask, url: URL): JsonRecord { + const limit = parseLimit(url); + const afterSeq = numberField(url.searchParams.get("afterSeq"), 0); + const beforeSeqRaw = url.searchParams.get("beforeSeq"); + const tail = url.searchParams.get("tail") === "1"; + const visible = task.output + .filter((item) => item.channel !== "system" || item.method !== "enqueue") + .sort((left, right) => left.seq - right.seq); + let rows = visible; + if (beforeSeqRaw !== null) rows = visible.filter((item) => item.seq < numberField(beforeSeqRaw, Number.MAX_SAFE_INTEGER)).slice(-limit); + else if (tail) rows = visible.slice(-limit); + else rows = visible.filter((item) => item.seq > afterSeq).slice(0, limit); + const firstSeq = rows[0]?.seq ?? null; + const lastSeq = rows.at(-1)?.seq ?? null; + return { + ok: true, + taskId: task.id, + source: "code-queue-mgr-postgres", + total: visible.length, + returned: rows.length, + limit, + afterSeq, + nextAfterSeq: lastSeq, + beforeSeq: beforeSeqRaw === null ? null : numberField(beforeSeqRaw), + previousBeforeSeq: firstSeq, + hasMore: lastSeq !== null && visible.some((item) => item.seq > lastSeq), + hasBefore: firstSeq !== null && visible.some((item) => item.seq < firstSeq), + steps: rows.map((item) => ({ + seq: item.seq, + at: item.at, + kind: item.channel === "command" ? "ran" : item.channel === "diff" ? "edited" : item.channel === "tool" ? "explored" : item.channel === "error" ? "error" : "message", + label: item.channel, + title: item.method ?? item.channel, + status: item.method ?? 
null, + bodyPreview: safePreview(item.text, 1600), + bodyOmittedLines: 0, + rawSeqs: [item.seq], + })), + }; +} + +function traceSummary(task: QueueTask): JsonRecord { + const steps = task.output.filter((item) => item.channel !== "system" || item.method !== "enqueue"); + return { + taskId: task.id, + queueId: queueIdOf(task), + status: task.status, + providerId: task.providerId, + executionMode: task.executionMode, + model: task.model, + stepCount: numberField(task.stepCount ?? task.llmStepCount, steps.length), + retainedStepCount: steps.length, + outputMaxSeq: outputMaxSeq(task), + statsSource: "code-queue-mgr-postgres", + attempts: task.attempts.map((attempt) => ({ + index: attempt.index, + mode: attempt.mode, + startedAt: attempt.startedAt, + finishedAt: attempt.finishedAt, + terminalStatus: attempt.terminalStatus, + finalResponsePreview: attempt.finalResponsePreview, + judge: attempt.judge ?? null, + feedbackPromptPreview: attempt.feedbackPromptPreview ?? null, + feedbackPromptChars: attempt.feedbackPromptChars ?? null, + })), + prompt: { + initialPreview: safePreview(task.prompt, 1600), + basePreview: safePreview(task.basePrompt, 1600), + chars: task.prompt.length, + lines: task.prompt.split(/\r?\n/u).length, + }, + lastAssistantMessage: lastAssistantMessage(task), + }; +} + +async function createTasks(req: Request): Promise { + if (!schemaReady) return jsonResponse({ ok: false, error: "code-queue-mgr database schema is not ready", schemaLastError }, 503); + const body = await readJson(req); + const record = asRecord(body) ?? {}; + const batchQueueId = typeof record.queueId === "string" && record.queueId.trim().length > 0 ? normalizeQueueId(record.queueId) : undefined; + const rawTasks = Array.isArray(record.tasks) ? record.tasks : [body]; + const tasks = await Promise.all(rawTasks.map(async (item) => { + const next = asRecord(item) === null ? 
item : { ...(asRecord(item) as Record) }; + if (batchQueueId !== undefined && asRecord(next)?.queueId === undefined) (next as Record).queueId = batchQueueId; + return await createTaskFromRequest(next); + })); + await mgrSql.begin(async (client) => { + const queues = new Map(); + for (const task of tasks) queues.set(queueIdOf(task), { id: queueIdOf(task), name: queueIdOf(task), createdAt: task.createdAt, updatedAt: task.updatedAt }); + for (const queue of queues.values()) await client` + INSERT INTO unidesk_code_queue_queues (id, name, created_at, updated_at) + VALUES (${queue.id}, ${queue.name}, ${queue.createdAt}, ${queue.updatedAt}) + ON CONFLICT (id) DO UPDATE SET updated_at = GREATEST(unidesk_code_queue_queues.updated_at, EXCLUDED.updated_at) + `; + for (const task of tasks) await upsertTask(client, task); + }); + log("info", "tasks_enqueued", { ids: tasks.map((task) => task.id), queueIds: Array.from(new Set(tasks.map(queueIdOf))) }); + return jsonResponse({ ok: true, tasks: tasks.map((task) => taskListResponse(task, false)), queue: await queueSummary() }, 202); +} + +async function createQueue(req: Request): Promise { + const body = asRecord(await readJson(req)) ?? {}; + const now = nowIso(); + const queueId = normalizeQueueId(body.queueId ?? body.id); + const queue = { id: queueId, name: normalizeQueueName(body.name, queueId), createdAt: now, updatedAt: now }; + await upsertQueue(queue); + return jsonResponse({ ok: true, queue, queues: (await queueSummary()).queues }, 201); +} + +async function updateQueue(queueIdValue: string, req: Request): Promise { + const queueId = normalizeQueueId(queueIdValue); + const body = asRecord(await readJson(req)) ?? 
{}; + const name = normalizeQueueName(body.name, queueId); + const rows = await mgrSql` + UPDATE unidesk_code_queue_queues + SET name = ${name}, updated_at = ${nowIso()} + WHERE id = ${queueId} + RETURNING id, name, created_at, updated_at + `; + if (rows[0] === undefined) return jsonResponse({ ok: false, error: "queue not found" }, 404); + return jsonResponse({ ok: true, queue: queueRowToRecord(rows[0]), queues: (await queueSummary()).queues }); +} + +async function mergeQueues(targetQueueIdValue: string | null, req: Request): Promise { + const body = asRecord(await readJson(req)) ?? {}; + const targetQueueId = normalizeQueueId(targetQueueIdValue ?? body.targetQueueId ?? body.intoQueueId ?? body.into); + const rawSource = body.sourceQueueIds ?? body.sources ?? body.sourceQueueId ?? body.fromQueueId ?? body.from ?? body.queueId ?? body.id; + const sourceQueueIds = (Array.isArray(rawSource) ? rawSource : typeof rawSource === "string" ? rawSource.split(/[,\s]+/u) : []) + .map((item) => normalizeQueueId(item)) + .filter((id) => id !== targetQueueId); + if (sourceQueueIds.length === 0) return jsonResponse({ ok: false, error: "sourceQueueId is required" }, 400); + const updatedAt = nowIso(); + await mgrSql.begin(async (client) => { + await client` + INSERT INTO unidesk_code_queue_queues (id, name, created_at, updated_at) + VALUES (${targetQueueId}, ${targetQueueId}, ${updatedAt}, ${updatedAt}) + ON CONFLICT (id) DO UPDATE SET updated_at = EXCLUDED.updated_at + `; + await client` + UPDATE unidesk_code_queue_tasks + SET + queue_id = ${targetQueueId}, + updated_at = ${updatedAt}, + task_json = jsonb_set(jsonb_set(task_json, '{queueId}', to_jsonb(${targetQueueId}::text), true), '{updatedAt}', to_jsonb(${updatedAt}::text), true) + WHERE queue_id IN ${client(sourceQueueIds)} + AND status NOT IN ('running', 'judging') + `; + await client`DELETE FROM unidesk_code_queue_queues WHERE id IN ${client(sourceQueueIds)}`; + }); + return jsonResponse({ ok: true, targetQueueId, 
sourceQueueIds, queue: await queueSummary() }, 202); +} + +async function markTaskRead(taskId: string): Promise { + const readAt = nowIso(); + const rows = await mgrSql>` + UPDATE unidesk_code_queue_tasks + SET + read_at = ${readAt}, + task_json = jsonb_set(task_json, '{readAt}', to_jsonb(${readAt}::text), true) + WHERE id = ${taskId} + AND status IN ('succeeded', 'failed', 'canceled') + AND read_at IS NULL + RETURNING id, status + `; + if (rows[0] === undefined) { + const task = await loadTask(taskId); + if (task === null) return jsonResponse({ ok: false, error: "task not found" }, 404); + if (!terminalTask(task)) return jsonResponse({ ok: false, error: `task is not terminal: ${task.status}` }, 409); + return jsonResponse({ ok: true, task: taskListResponse(task), queue: await queueSummary() }); + } + return jsonResponse({ ok: true, task: { id: taskId, readAt, terminalUnread: false }, queue: await queueSummary() }); +} + +async function markAllRead(url: URL): Promise { + const readAt = nowIso(); + const queueIdRaw = url.searchParams.get("queueId"); + const queueId = queueIdRaw === null || queueIdRaw.length === 0 ? null : safeQueueId(queueIdRaw); + const rows = await mgrSql>` + UPDATE unidesk_code_queue_tasks + SET + read_at = ${readAt}, + task_json = jsonb_set(task_json, '{readAt}', to_jsonb(${readAt}::text), true) + WHERE (${queueId === null} OR queue_id = ${queueId}) + AND status IN ('succeeded', 'failed', 'canceled') + AND read_at IS NULL + RETURNING id + `; + return jsonResponse({ ok: true, count: rows.length, readAt, queue: await queueSummary() }); +} + +async function moveTask(taskId: string, req: Request): Promise { + const body = asRecord(await readJson(req)) ?? {}; + const queueId = normalizeQueueId(body.queueId ?? 
body.id); + const movedAt = nowIso(); + const task = await mgrSql.begin(async (client): Promise => { + await client` + INSERT INTO unidesk_code_queue_queues (id, name, created_at, updated_at) + VALUES (${queueId}, ${queueId}, ${movedAt}, ${movedAt}) + ON CONFLICT (id) DO UPDATE SET updated_at = EXCLUDED.updated_at + `; + const rows = await client>>` + SELECT task_json + FROM unidesk_code_queue_tasks + WHERE id = ${taskId} + FOR UPDATE + `; + if (rows[0] === undefined) return null; + const nextTask = rowToTask(rows[0]); + if (nextTask.status === "running" || nextTask.status === "judging") return nextTask; + nextTask.queueId = queueId; + nextTask.queueEnteredAt = movedAt; + nextTask.updatedAt = movedAt; + nextTask.output.push({ seq: outputMaxSeq(nextTask) + 1, at: movedAt, channel: "system", text: `moved to queue=${queueId}\n`, method: "queue/move" }); + nextTask.outputMaxSeq = outputMaxSeq(nextTask); + await upsertTask(client, nextTask); + return nextTask; + }); + if (task === null) return jsonResponse({ ok: false, error: "task not found" }, 404); + if (task.status === "running" || task.status === "judging") return jsonResponse({ ok: false, error: `cannot move active task while status=${task.status}`, task: taskListResponse(task) }, 409); + return jsonResponse({ ok: true, task: taskListResponse(task, false), queue: await queueSummary() }, 202); +} + +async function retryTask(taskId: string, req: Request): Promise { + const body = asRecord(await readJson(req)) ?? {}; + const explicitPrompt = typeof body.prompt === "string" ? body.prompt.trim() : typeof body.continuePrompt === "string" ? 
body.continuePrompt.trim() : ""; + const queuedAt = nowIso(); + const result = await mgrSql.begin(async (client): Promise<{ task: QueueTask | null; changed: boolean }> => { + const rows = await client>>` + SELECT task_json + FROM unidesk_code_queue_tasks + WHERE id = ${taskId} + FOR UPDATE + `; + if (rows[0] === undefined) return { task: null, changed: false }; + const nextTask = rowToTask(rows[0]); + if (!terminalTask(nextTask)) return { task: nextTask, changed: false }; + nextTask.status = "queued"; + nextTask.finishedAt = null; + nextTask.readAt = null; + nextTask.cancelRequested = false; + nextTask.lastError = null; + nextTask.maxAttempts = Math.max(nextTask.maxAttempts, nextTask.attempts.length + 1); + nextTask.nextMode = "retry"; + nextTask.nextPrompt = explicitPrompt.length > 0 ? explicitPrompt : "Manual retry requested from master code-queue-mgr. Continue the original task and produce a visible final response."; + nextTask.updatedAt = queuedAt; + nextTask.queueEnteredAt = queuedAt; + nextTask.output.push({ seq: outputMaxSeq(nextTask) + 1, at: queuedAt, channel: "system", text: "manual retry queued by master code-queue-mgr\n", method: "manual-retry" }); + nextTask.outputMaxSeq = outputMaxSeq(nextTask); + await upsertTask(client, nextTask); + return { task: nextTask, changed: true }; + }); + const task = result.task; + if (task === null) return jsonResponse({ ok: false, error: "task not found" }, 404); + if (!result.changed) return jsonResponse({ ok: false, error: `task is not terminal: ${task.status}`, task: taskListResponse(task) }, 409); + return jsonResponse({ ok: true, task: taskListResponse(task, false), queue: await queueSummary() }, 202); +} + +async function editTask(taskId: string, req: Request): Promise { + const body = asRecord(await readJson(req)) ?? 
{}; + const editedAt = nowIso(); + const result = await mgrSql.begin(async (client): Promise<{ task: QueueTask | null; changed: boolean }> => { + const rows = await client>>` + SELECT task_json + FROM unidesk_code_queue_tasks + WHERE id = ${taskId} + FOR UPDATE + `; + if (rows[0] === undefined) return { task: null, changed: false }; + const nextTask = rowToTask(rows[0]); + if (!queuedTaskPromptEditable(nextTask)) return { task: nextTask, changed: false }; + const prepared = await preparePromptFromRecord(body, nextTask.referenceTaskIds); + if (prepared.referenceTaskIds.includes(nextTask.id)) throw new Error("a task cannot reference itself while editing prompt"); + nextTask.prompt = prepared.prompt; + nextTask.basePrompt = prepared.basePrompt; + nextTask.referenceTaskIds = prepared.referenceTaskIds; + nextTask.referenceInjection = prepared.referenceInjection; + nextTask.updatedAt = editedAt; + const enqueue = nextTask.output.find((item) => item.method === "enqueue"); + if (enqueue !== undefined) { + enqueue.at = editedAt; + enqueue.text = `${prepared.prompt}\n`; + } + nextTask.output.push({ seq: outputMaxSeq(nextTask) + 1, at: editedAt, channel: "system", text: "queued prompt edited by master code-queue-mgr\n", method: "prompt/edit" }); + nextTask.outputMaxSeq = outputMaxSeq(nextTask); + await upsertTask(client, nextTask); + return { task: nextTask, changed: true }; + }); + const task = result.task; + if (task === null) return jsonResponse({ ok: false, error: "task not found" }, 404); + if (!result.changed) return jsonResponse({ ok: false, error: `task prompt can only be edited before first run while status=queued; current status=${task.status}`, task: taskListResponse(task) }, 409); + return jsonResponse({ ok: true, changed: true, editable: true, task: taskListResponse(task, false), queue: await queueSummary() }); +} + +async function tasksOverview(url: URL): Promise { + const { tasks, total, limit, offset } = await loadTasksForList(url); + const queues = await 
loadQueues(); + const summary = await queueSummary(undefined, queues); + const selectedId = url.searchParams.get("preferId") || tasks[0]?.id || ""; + const selectedTask = selectedId.length > 0 ? await loadTask(selectedId) : null; + const transcriptLimit = Math.max(1, Math.min(500, numberField(url.searchParams.get("transcriptLimit"), 80))); + const afterSeq = numberField(url.searchParams.get("afterSeq"), 0); + const selected = selectedTask === null ? null : { + task: taskMetaResponse(selectedTask), + transcript: selectedTask.output.filter((item) => item.seq > afterSeq).slice(0, transcriptLimit).map((item) => ({ + seq: item.seq, + at: item.at, + kind: item.channel === "command" ? "ran" : item.channel === "diff" ? "edited" : item.channel === "tool" ? "explored" : item.channel === "error" ? "error" : "message", + title: item.method ?? item.channel, + status: item.method ?? null, + bodyPreview: safePreview(item.text, 1200), + rawSeqs: [item.seq], + })), + afterSeq, + nextAfterSeq: selectedTask.output.at(-1)?.seq ?? 
afterSeq, + hasMore: false, + preview: true, + total: selectedTask.output.length, + maxSeq: outputMaxSeq(selectedTask), + }; + return jsonResponse({ + ok: true, + queue: summary, + statistics: { days: 14, source: "code-queue-mgr", skipped: true }, + tasks: tasks.map((task) => taskListResponse(task, true)), + selected, + pagination: { + limit, + returned: tasks.length, + total, + hasMore: offset + tasks.length < total, + nextBeforeId: null, + beforeId: null, + includeActive: false, + }, + }); +} + +function httpError(status: number, error: string, detail: JsonRecord = {}): HttpErrorDetail { + return { status, body: { ok: false, error, ...detail } }; +} + +function routeActiveControl(pathname: string, method: string): HttpErrorDetail | null { + if (/^\/api\/tasks\/[^/]+\/(?:steer|interrupt)$/u.test(pathname)) { + return httpError(409, "active run control remains on D601 scheduler", { method, schedulerPath: pathname }); + } + if (/^\/api\/tasks\/[^/]+$/u.test(pathname) && method === "DELETE") { + return httpError(409, "active run interrupt remains on D601 scheduler", { method, schedulerPath: pathname }); + } + return null; +} + +async function route(req: Request): Promise { + const url = new URL(req.url); + if (req.method === "OPTIONS") return new Response(null, { status: 204 }); + try { + if (url.pathname === "/" || url.pathname === "/health" || url.pathname === "/live") { + const taskCount = schemaReady ? Number((await traceSql>`SELECT COUNT(*) AS count FROM unidesk_code_queue_tasks`)[0]?.count ?? 
0) : null; + return jsonResponse({ + ok: schemaReady, + service: "code-queue-mgr", + role: "master-control-plane", + startedAt: serviceStartedAt, + schemaReady, + schemaLastError, + taskCount, + resourceBudget: { + targetMemoryMb: 100, + mgrPoolMax: config.mgrDatabasePoolMax, + tracePoolMax: config.traceDatabasePoolMax, + noRunnerDependencies: true, + noPlaywright: true, + noCodexRuntime: true, + noDockerSocket: true, + }, + endpoints: { + control: ["/api/queues", "/api/tasks", "/api/tasks/:id/(move|retry|read|edit)"], + traceRead: ["/api/tasks/overview", "/api/tasks/:id/summary", "/api/tasks/:id/trace-summary", "/api/tasks/:id/trace-steps", "/api/tasks/:id/output"], + }, + }, schemaReady ? 200 : 503); + } + if (url.pathname === "/logs" && req.method === "GET") return jsonResponse({ ok: true, logs: recentLogs.slice(-100) }); + const activeControl = routeActiveControl(url.pathname, req.method); + if (activeControl !== null) return jsonResponse(activeControl.body, activeControl.status); + if (url.pathname === "/api/queues" && req.method === "GET") { + const tasks = await loadAllTasksLite(); + const queues = await loadQueues(); + return jsonResponse({ ok: true, queues: (await queueSummary(tasks, queues)).queues, queue: await queueSummary(tasks, queues) }); + } + if (url.pathname === "/api/queues" && req.method === "POST") return await createQueue(req); + if (url.pathname === "/api/queues/merge" && req.method === "POST") return await mergeQueues(null, req); + const queueMergeMatch = url.pathname.match(/^\/api\/queues\/([^/]+)\/merge$/u); + if (queueMergeMatch !== null && req.method === "POST") return await mergeQueues(decodeURIComponent(queueMergeMatch[1] ?? ""), req); + const queueMatch = url.pathname.match(/^\/api\/queues\/([^/]+)$/u); + if (queueMatch !== null && (req.method === "PATCH" || req.method === "PUT" || req.method === "POST")) return await updateQueue(decodeURIComponent(queueMatch[1] ?? 
""), req); + if (url.pathname === "/api/tasks/overview" && req.method === "GET") return await tasksOverview(url); + if (url.pathname === "/api/tasks/stats" && req.method === "GET") { + const queueIdRaw = url.searchParams.get("queueId"); + const queueId = queueIdRaw === null || queueIdRaw.length === 0 ? null : safeQueueId(queueIdRaw); + const allTasks = await loadAllTasksLite(); + const statsTasks = queueId === null ? allTasks : allTasks.filter((task) => queueIdOf(task) === queueId); + return jsonResponse({ ok: true, statistics: taskStatisticsSummary(statsTasks, statsDaysFromUrl(url)), queue: await queueSummary(allTasks) }); + } + if (url.pathname === "/api/tasks" && req.method === "GET") { + const page = await loadTasksForList(url); + return jsonResponse({ ok: true, tasks: page.tasks.map((task) => taskListResponse(task, url.searchParams.get("lite") === "1")), queue: await queueSummary(), pagination: { limit: page.limit, offset: page.offset, total: page.total, returned: page.tasks.length, hasMore: page.offset + page.tasks.length < page.total } }); + } + if ((url.pathname === "/api/tasks" || url.pathname === "/api/tasks/batch") && req.method === "POST") return await createTasks(req); + if (url.pathname === "/api/tasks/read-all" && req.method === "POST") return await markAllRead(url); + const outputMatch = url.pathname.match(/^\/api\/tasks\/([^/]+)\/output$/u); + if (outputMatch !== null && req.method === "GET") { + const task = await loadTask(decodeURIComponent(outputMatch[1] ?? "")); + return task === null ? jsonResponse({ ok: false, error: "task not found" }, 404) : jsonResponse(outputPage(task, url)); + } + const transcriptMatch = url.pathname.match(/^\/api\/tasks\/([^/]+)\/transcript$/u); + if (transcriptMatch !== null && req.method === "GET") { + const task = await loadTask(decodeURIComponent(transcriptMatch[1] ?? "")); + return task === null ? 
jsonResponse({ ok: false, error: "task not found" }, 404) : jsonResponse(transcriptPage(task, url)); + } + const traceSummaryMatch = url.pathname.match(/^\/api\/tasks\/([^/]+)\/trace-summary$/u); + if (traceSummaryMatch !== null && req.method === "GET") { + const task = await loadTask(decodeURIComponent(traceSummaryMatch[1] ?? "")); + return task === null ? jsonResponse({ ok: false, error: "task not found" }, 404) : jsonResponse({ ok: true, summary: traceSummary(task) }); + } + const traceStepsMatch = url.pathname.match(/^\/api\/tasks\/([^/]+)\/trace-steps$/u); + if (traceStepsMatch !== null && req.method === "GET") { + const task = await loadTask(decodeURIComponent(traceStepsMatch[1] ?? "")); + return task === null ? jsonResponse({ ok: false, error: "task not found" }, 404) : jsonResponse(traceSteps(task, url)); + } + const traceStepMatch = url.pathname.match(/^\/api\/tasks\/([^/]+)\/trace-step$/u); + if (traceStepMatch !== null && req.method === "GET") { + const task = await loadTask(decodeURIComponent(traceStepMatch[1] ?? "")); + const seq = numberField(url.searchParams.get("seq"), 0); + const item = task?.output.find((output) => output.seq === seq) ?? null; + return task === null || item === null ? jsonResponse({ ok: false, error: "trace step not found" }, 404) : jsonResponse({ ok: true, taskId: task.id, step: item, rawOutput: item }); + } + const summaryMatch = url.pathname.match(/^\/api\/tasks\/([^/]+)\/summary$/u); + if (summaryMatch !== null && req.method === "GET") { + const task = await loadTask(decodeURIComponent(summaryMatch[1] ?? "")); + return task === null ? jsonResponse({ ok: false, error: "task not found" }, 404) : jsonResponse({ ok: true, summary: taskSummary(task, url) }); + } + const promptMatch = url.pathname.match(/^\/api\/tasks\/([^/]+)\/prompt$/u); + if (promptMatch !== null && req.method === "GET") { + const task = await loadTask(decodeURIComponent(promptMatch[1] ?? 
"")); + if (task === null) return jsonResponse({ ok: false, error: "task not found" }, 404); + const part = url.searchParams.get("part") ?? "initial"; + const text = part === "base" ? task.basePrompt : task.prompt; + return jsonResponse({ ok: true, taskId: task.id, part, text, chars: text.length, lines: text.split(/\r?\n/u).length }); + } + if (promptMatch !== null && req.method === "PATCH") return await editTask(decodeURIComponent(promptMatch[1] ?? ""), req); + const taskActionMatch = url.pathname.match(/^\/api\/tasks\/([^/]+)(?:\/(retry|move|read|edit))?$/u); + if (taskActionMatch !== null) { + const taskId = decodeURIComponent(taskActionMatch[1] ?? ""); + const action = taskActionMatch[2] ?? ""; + if (action === "retry" && req.method === "POST") return await retryTask(taskId, req); + if (action === "move" && req.method === "POST") return await moveTask(taskId, req); + if (action === "read" && req.method === "POST") return await markTaskRead(taskId); + if (action === "edit" && (req.method === "POST" || req.method === "PATCH")) return await editTask(taskId, req); + if (action.length === 0 && req.method === "GET") { + const task = await loadTask(taskId); + if (task === null) return jsonResponse({ ok: false, error: "task not found" }, 404); + if (url.searchParams.get("meta") === "1") return jsonResponse({ ok: true, task: taskMetaResponse(task) }); + return jsonResponse({ ok: true, task: { ...task, ...taskMetaResponse(task), output: task.output, events: task.events } }); + } + } + return jsonResponse({ ok: false, error: "not found", path: url.pathname }, 404); + } catch (error) { + lastDatabaseError = error instanceof Error ? error.message : String(error); + log("error", "request_failed", { path: url.pathname, method: req.method, error: errorToJson(error) }); + if (error instanceof SyntaxError) return jsonResponse({ ok: false, error: "invalid JSON request body", detail: error.message }, 400); + return jsonResponse({ ok: false, error: error instanceof Error ? 
error.message : String(error) }, 500); + } +} + +void ensureSchemaWithRetry(); +Bun.serve({ hostname: config.host, port: config.port, fetch: route, idleTimeout: 60 }); +log("info", "service_listening", { port: config.port, role: "master-control-plane", mgrPoolMax: config.mgrDatabasePoolMax, tracePoolMax: config.traceDatabasePoolMax }); diff --git a/src/components/microservices/code-queue-mgr/tsconfig.json b/src/components/microservices/code-queue-mgr/tsconfig.json new file mode 100644 index 00000000..5d5f23f2 --- /dev/null +++ b/src/components/microservices/code-queue-mgr/tsconfig.json @@ -0,0 +1,18 @@ +{ + "compilerOptions": { + "composite": true, + "target": "ES2022", + "module": "ESNext", + "moduleResolution": "Bundler", + "types": ["bun", "node"], + "strict": true, + "noImplicitReturns": true, + "noFallthroughCasesInSwitch": true, + "declaration": true, + "emitDeclarationOnly": true, + "outDir": "dist", + "skipLibCheck": true + }, + "include": ["src/**/*.ts"], + "references": [{ "path": "../../shared" }] +}
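Reviewer note: the routeActiveControl guard added in code-queue-mgr/src/index.ts can be exercised in isolation. The sketch below copies its two path patterns and error strings from the patch into a standalone function. The name classifyActiveControl and the ControlRejection type are hypothetical and exist only for this illustration; the real function returns an HttpErrorDetail built by httpError.

```typescript
// Illustrative stand-in for routeActiveControl; not part of the patch.
// ControlRejection is a hypothetical, flattened version of HttpErrorDetail.
type ControlRejection = { status: number; error: string; schedulerPath: string };

// Patterns copied from routeActiveControl in the patch.
const ACTIVE_ACTION = /^\/api\/tasks\/[^/]+\/(?:steer|interrupt)$/u;
const TASK_ROOT = /^\/api\/tasks\/[^/]+$/u;

function classifyActiveControl(pathname: string, method: string): ControlRejection | null {
  // steer/interrupt are rejected for every HTTP method, not only POST.
  if (ACTIVE_ACTION.test(pathname)) {
    return { status: 409, error: "active run control remains on D601 scheduler", schedulerPath: pathname };
  }
  // DELETE on a single task is treated as an interrupt of an active run.
  if (TASK_ROOT.test(pathname) && method === "DELETE") {
    return { status: 409, error: "active run interrupt remains on D601 scheduler", schedulerPath: pathname };
  }
  // Anything else falls through to the normal route table.
  return null;
}

// Example calls with made-up task ids.
console.log(classifyActiveControl("/api/tasks/t1/steer", "POST")?.status);
console.log(classifyActiveControl("/api/tasks/t1", "DELETE")?.status);
console.log(classifyActiveControl("/api/tasks/t1/retry", "POST"));
```

Because route() runs this guard before the task routes, a client that receives 409 with schedulerPath set is expected to resend the same request to the D601 scheduler rather than retry against code-queue-mgr.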