merge: code queue claim move race fix

# Conflicts:
#	AGENTS.md
#	TEST.md
#	config.json
#	deploy.json
#	docs/reference/cli.md
#	docs/reference/deploy.md
#	docs/reference/microservices.md
#	scripts/cli.ts
#	scripts/src/check.ts
#	scripts/src/e2e.ts
#	scripts/src/remote.ts
#	src/components/microservices/code-queue/src/index.ts
#	src/components/microservices/code-queue/src/queue-api.ts
#	src/components/microservices/decision-center/Dockerfile
#	src/components/microservices/decision-center/src/index.ts
2026-05-17 12:29:03 +00:00
parent ec34f39059 f3af35dffe
commit 3ae141c1c3
23 changed files with 1913 additions and 54 deletions
@@ -32,9 +32,10 @@ UniDesk 是一个以主 server 为统一入口的分布式工作平台；本文
 - `bun scripts/cli.ts server rebuild <backend-core|frontend|provider-gateway|todo-note|code-queue-mgr|project-manager|baidu-netdisk|oa-event-flow>`: rebuilds a single service inside the main server Compose stack as an asynchronous job with build-first, Compose lock, no-deps force-recreate and post-up validation; the Code Queue execution plane is deployed on D601, see `docs/reference/deployment.md` for the rules.
 - `bun scripts/cli.ts provider attach <providerId> [--master-server URL] [--up] [--force]`: generates a two-setting provider-gateway attach bundle on a new compute node; by default it only needs the main server URL (default `http://74.48.78.17/`) and a unique Provider ID. The generated Compose file pins the Docker socket, `pid: "host"`, `restart: always`, a read-only `/workspace`, the SSH maintenance private key mount and the loopback egress proxy port; see `docs/reference/provider-gateway.md` for the rules.
 - `bun scripts/cli.ts ssh <providerId> [ssh-like args...]`: opens a near-native ssh interactive session or runs a remote command through the provider-gateway Host SSH / WSL SSH maintenance bridge, and injects `apply_patch`, `glob` and `skill-discover` into the remote PATH; the `apply-patch`, `py`, `skills`, structured `find`, `glob` and `argv` subcommands avoid nested-escaping problems with remote patches, Python stdin, skill discovery and common read-only commands. See `docs/reference/cli.md` and `docs/reference/provider-gateway.md` for usage rules.
-- `bun scripts/cli.ts microservice list/status/health/proxy`: manages and verifies user services mounted on the main server, compute-node Docker or the k3s control plane; `proxy` supports a controlled JSON body. Rules for OA Event Flow/Todo Note/Baidu Netdisk/Code Queue Manager on main-server and k3s Control/Code Queue execution plane/MDTODO/Decision Center/FindJob/Pipeline/MET Nonlinear on D601 are in `docs/reference/microservices.md`.
+- `bun scripts/cli.ts microservice list/status/health/diagnostics/tunnel-self-test/proxy`: manages and verifies user services mounted on the main server, compute-node Docker or the k3s control plane; `proxy` supports a controlled JSON body. Rules for OA Event Flow/Todo Note/Baidu Netdisk/Code Queue Manager on main-server and k3s Control/Code Queue execution plane/MDTODO/Decision Center/FindJob/Pipeline/MET Nonlinear on D601 are in `docs/reference/microservices.md`.
 - `bun scripts/cli.ts decision upload/list/show/health`: uploads meeting-record/decision Markdown, lists records and shows details through the backend-core user-service proxy; Decision Center runs on D601 k3s, see `docs/reference/microservices.md` for the rules.
 - `bun scripts/cli.ts deploy check/plan/apply [--file deploy.json] [--service <id>]`: validates or updates user services against the desired service repo and commit state in the root `deploy.json`; the target side performs its own fetch, build, deploy and live commit verification. See `docs/reference/deploy.md` for the rules.
+- `bun scripts/cli.ts ci install/status/run/logs`: installs and runs Tekton CI on the D601 native k3s cluster; it only performs per-commit checks and the Code Queue read-only performance gate and does not deploy (no CD). See `docs/reference/ci.md` for the rules.
 - `bun scripts/cli.ts codex deploy <commitId>`: the Code Queue compatibility deploy entry; it generates a temporary desired manifest and calls the same target-side build and live commit verification path as `deploy apply --service code-queue`. See `docs/reference/codex-deploy.md` for the rules.
 - `bun scripts/cli.ts codex submit [prompt] [--prompt-file path|--prompt-stdin] [--queue <id>]`: submits a Code Queue task through the backend-core private proxy; the control plane writes to PostgreSQL through the main server `code-queue-mgr` by default, and `--dry-run` checks the request body without enqueueing. See `docs/reference/cli.md` for the rules.
 - `bun scripts/cli.ts codex task <taskId>`: looks up a Code Queue task by ID and returns its initial prompt, last assistant message, tool-call summary, attempt/judge/error data and duration, so that new tasks can reference a historical session.
@@ -68,5 +69,6 @@ UniDesk 是一个以主 server 为统一入口的分布式工作平台；本文
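 - `docs/reference/pipeline-oa-event-flow.md`: Pipeline/OA event flow, reviewed/unreviewed transitions, single-step debugging, Gantt chart rendering and final residue-cleanup rules.
 - `docs/reference/pipeline-model-proxy.md`: Pipeline v2 model proxy chain architecture, D601 host proxy service deployment, harness token injection rules and the smoke test verification flow.
 - `docs/reference/deploy.md`: `deploy.json` desired state, target-side build, one-shot build proxy, directly managed/delegated service deployment executors and live commit verification rules.
+- `docs/reference/ci.md`: D601 k3s Tekton CI, the read-only main-database performance gate and CLI entry rules.
 - `docs/reference/codex-deploy.md`: the D601 Code Queue `codex deploy <commitId>` asynchronous deploy pipeline, path conventions and verification entry points.
 - `reference`: a symlink kept for compatibility with the old path, pointing to `docs/reference/`.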
@@ -107,6 +107,10 @@

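 Then log in to the public frontend `http://74.48.78.17:18081/`, open `用户服务 / Code Queue` (User Services / Code Queue), and confirm that the page shows the default model `gpt-5.5`, the default execution Provider `D601`, the default working directory `/workspace`, a model dropdown containing `gpt-5.4-mini`/`gpt-5.4`/`gpt-5.5`, the enqueue copy count, queue metrics, task ID, copy-task-ID, the reference button, task duration, referenced task ID, clear input, the creation success notice, the task submission form, Trace output, the attempt table, MiniMax/fallback judge status, and the append-prompt, interrupt and retry controls. Submit a small task through the page and confirm that it moves to queued/running/succeeded or an explainable failed state, and that the output area shows Codex messages while it runs. For batch acceptance, set `入队份数=5` (enqueue copies) or separate 5 prompt segments with `---` to enqueue 5 tasks at once, then confirm that the 5 tasks run in order and all reach succeeded or an explainable non-success terminal state; the queue must not stop after running only the first task. When the judge rates any task `fail`, only that task may be marked failed, and the remaining queued tasks must still advance. To test abnormal interruption, submit a long task and click `打断` (Interrupt), then confirm that the task becomes canceled or is marked with a non-success terminal state by the judge. Automatic retry should happen only on server/transport errors, when the task ends normally but the execution record shows it is incomplete, or when the judge decides retry; a retry must reuse the existing Codex thread and append a continue-execution prompt, and the queue advances to the next task only after the current task is complete. The MiniMax judge must be able to denoise JSON wrapped in Markdown fences or mixed with stray text; if parsing still fails after denoising, the parse error and the previous raw answer from before denoising must be fed back to MiniMax for repair and a retry, the log should contain `judge_json_parse_retry`, and a successful repair must still return `source=minimax`. The Codex provider key may only be passed through runtime environment variables such as `OPENAI_API_KEY` and `CRS_OAI_KEY`, and the MiniMax API key may only be passed through the D601 env-file runtime environment; writing either into `config.json`, a Dockerfile, source code or test documents is forbidden.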

+## T23A D601 k3s CI Gate
+
+Read `AGENTS.md` and `docs/reference/ci.md`, run `bun scripts/cli.ts ci install`, and confirm that Tekton Pipelines `v1.12.0`, Tekton Triggers `v0.34.0` and the `unidesk-ci` Pipeline/Task/EventListener are deployed to the D601 native k3s cluster. Then run `bun scripts/cli.ts ci run --revision <pushed-commitId> --wait-ms 1200000` and confirm that the PipelineRun only executes clone/check/performance and does not call `deploy apply` or `codex deploy`, and that the temporary `code-queue-ci-read` service reads the main PostgreSQL in read-only mode to measure the Code Queue first screen, TraceView summary, TraceView steps and step detail. If it fails, use `bun scripts/cli.ts ci logs <pipelineRun>` to inspect the TaskRun and Pod logs; the delivery notes must record whether the performance budgets passed.
+
 ## T23B D601 Decision Center User Service

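 Read `AGENTS.md` and `docs/reference/microservices.md`, run `bun scripts/cli.ts microservice list`, and confirm that `decision-center` is shown with `providerId=D601`, `public=false`, `frontendOnly=true`, repository URL `https://github.com/pikasTech/unidesk`, the k3s/k8s logical service mapping `k3s://unidesk/decision-center:4277`, `deployment.mode=k3sctl-managed`, `runtime.orchestrator=k3sctl` and no directly connected business container summary. Deploy to the `deploy.json` desired state with `bun scripts/cli.ts deploy apply --service decision-center` and confirm that the job performs the target-side build on D601, imports the image into native k3s/containerd, applies `src/components/microservices/k3sctl-adapter/k3s/decision-center.k8s.yaml`, stamps the deployment commit, rolls out and verifies the live commit through the UniDesk microservice proxy. Run `bun scripts/cli.ts microservice health decision-center` and confirm `service=decision-center`, `storage=postgres` and `schemaReady=true`. Prepare a temporary Markdown meeting record, run `bun scripts/cli.ts decision upload <markdown-file> --title <title> --type meeting --level G1 --status active --evidence <url>`, then run `bun scripts/cli.ts decision list` and `bun scripts/cli.ts decision show <id>`, and confirm that the CLI only goes through the backend-core user-service proxy, returns structured JSON and shows the record just uploaded. Finally, log in to the public frontend `http://74.48.78.17:18081/`, open `用户服务 / Decision Center` (User Services / Decision Center), and confirm that the page shows G0/G1 goals, P0/P1 Blockers, parked items, recent meetings/decisions, filters and the full record table, with the newly uploaded meeting record visible; the page must not offer a chat/LLM session window, must not show raw JSON by default, and may open the full JSON only through `查看原始JSON` (View raw JSON).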
@@ -0,0 +1,85 @@
+# UniDesk CI On D601 k3s
+
+UniDesk CI is hosted on the D601 native k3s cluster with Tekton Pipelines and Tekton Triggers. It is CI only. CD remains the existing `deploy.json` / `deploy apply` / `codex deploy <commit>` path, and no Tekton task may roll out production services.
+
+## Components
+
+- Tekton Pipelines: `v1.12.0`.
+- Tekton Triggers: `v0.34.0`.
+- UniDesk CI namespace: `unidesk-ci`.
+- Manifests: `src/components/microservices/k3sctl-adapter/k3s/ci/`.
+- CLI entry: `bun scripts/cli.ts ci install|status|run|logs`.
+
+The CLI reaches D601 through the existing `k3sctl-adapter` Host SSH maintenance bridge and then runs native `KUBECONFIG=/etc/rancher/k3s/k3s.yaml kubectl ...`. It does not require backend-core to be running and does not expose a new public port.
+
+## Pipeline Scope
+
+Each commit CI run performs:
+
+- `git clone` and checkout of the requested repository revision.
+- `bun install --frozen-lockfile` at the repo root and `src/`, because `bun scripts/cli.ts check` compiles all `src/components` and needs the component workspace lockfile for frontend React dependencies.
+- `bun scripts/cli.ts check`.
+- Temporary `code-queue-ci-read` Deployment and ClusterIP Service in `unidesk-ci`.
+- Code Queue read performance checks against the production PostgreSQL through `d601-tcp-egress-gateway`.
+
+`ci install` also prewarms the D601 k3s containerd runtime with the Tekton entrypoint/workingdir helper images, `oven/bun:1-debian`, `alpine/git:2.45.2` and `unidesk-code-queue:d601`. Missing images are pulled through the node-local provider-gateway WS egress proxy and then imported into native k3s containerd with digests preserved, so PipelineRun pods do not hang on external registry pulls.
+
+Git clone and dependency downloads inside the repo check task use `d601-provider-egress-proxy.unidesk.svc.cluster.local:18789`; the NO_PROXY list keeps the in-cluster read service, D601 TCP egress gateway and any in-cluster CI Git mirror on the cluster network.
+
+Steps that call the Kubernetes API directly clear inherited proxy variables so service-account HTTPS calls to `kubernetes.default.svc` do not accidentally use the Code Queue image's Docker Compose proxy defaults.
+The rollout poll reads the Deployment main resource rather than the `/status` subresource, keeping CI RBAC limited to the same app/service resources it creates and deletes.
+The performance probe scans recent Code Queue tasks until it finds one with trace steps, so a newly selected task without persisted step detail does not make the whole gate fail before measuring the trace endpoints.
+
+The temporary Code Queue service uses:
+
+- `CODE_QUEUE_SERVICE_ROLE=read`.
+- `CODE_QUEUE_SCHEDULER_ENABLED=false`.
+- `CODE_QUEUE_STARTUP_OA_BACKFILL_ENABLED=false`.
+- `CODE_QUEUE_NOTIFY_CLAUDEQQ_ENABLED=false`.
+- `CODE_QUEUE_CODEX_SQLITE_LOG_EXPORT_ENABLED=false`.
+- D601 k3s `d601-provider-egress-proxy` for external/OA Event Flow fetches, with `d601-tcp-egress-gateway` and the CI read service in `NO_PROXY`.
+- EmptyDir state/log mounts.
+
+This means the CI service can read existing tasks, Trace summaries, Trace steps and Trace step details from the main database, but it must not schedule, mutate, notify, backfill or become deployment truth.
+
+## Performance Gate
+
+The initial budgets live in `unidesk-ci/unidesk-ci-budgets`:
+
+- Code Queue first overview payload through the temporary read service, used as the service-side first-paint proxy: `10000ms`.
+- `GET /api/tasks/{id}/trace-summary`: `10000ms`.
+- `GET /api/tasks/{id}/trace-steps`: `20000ms` diagnostic, reported but not blocking while the existing production TraceView step query is being optimized.
+- `GET /api/tasks/{id}/trace-step`: `20000ms` diagnostic, reported but not blocking while the existing production TraceView step query is being optimized.
+- `GET /api/tasks/overview` p95 over 10 samples: `20000ms`.
+
+These are absolute budgets. Historical relative baselines can be added later by writing metrics to a dedicated CI table or object store; they should not be mixed into production task tables.
+
+## Commands
+
+Install or refresh CI:
+
+```bash
+bun scripts/cli.ts ci install
+```
+
+Check status:
+
+```bash
+bun scripts/cli.ts ci status
+```
+
+Run CI manually for a commit:
+
+```bash
+bun scripts/cli.ts ci run --revision <commit>
+```
+
+Inspect a run:
+
+```bash
+bun scripts/cli.ts ci logs <pipelineRunName>
+```
+
+## Trigger Boundary
+
+`unidesk-ci.triggers.yaml` installs the EventListener, TriggerBinding and TriggerTemplate, but the EventListener remains a normal in-cluster Service. Do not expose it through NodePort, LoadBalancer or an unrestricted public ingress. If GitHub or another Git remote needs webhook delivery, add a UniDesk-controlled frontend/backend route with secret verification and then proxy to the EventListener; keep frontend and provider ingress as the only unrestricted public entry points.
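
As an illustration of the Performance Gate section in the new `ci.md` above, the following TypeScript sketch shows one way blocking and diagnostic budgets could be evaluated. The `Budget` and `GateResult` shapes, the budget names and the `evaluateGate` helper are hypothetical and are not taken from the repository; only the millisecond values and the diagnostic status of the two trace-step endpoints come from the list above, and the sketch assumes that every budget not marked diagnostic is blocking.

```typescript
// Hypothetical shapes for illustration; not taken from the repository.
interface Budget {
  name: string;
  budgetMs: number;
  hard: boolean; // true = blocking, false = diagnostic (reported only)
}

interface GateResult {
  passed: boolean;
  failures: string[];
  warnings: string[];
}

// Values mirror the documented initial budgets; the names are made up.
const budgets: Budget[] = [
  { name: "overview-first-payload", budgetMs: 10_000, hard: true },
  { name: "trace-summary", budgetMs: 10_000, hard: true },
  { name: "trace-steps", budgetMs: 20_000, hard: false },
  { name: "trace-step", budgetMs: 20_000, hard: false },
  { name: "overview-p95", budgetMs: 20_000, hard: true },
];

function evaluateGate(measuredMs: Record<string, number>, list: Budget[] = budgets): GateResult {
  const failures: string[] = [];
  const warnings: string[] = [];
  for (const budget of list) {
    const value = measuredMs[budget.name];
    // A missing measurement is treated as over budget so that gaps stay visible.
    const over = value === undefined || value > budget.budgetMs;
    if (!over) continue;
    (budget.hard ? failures : warnings).push(budget.name);
  }
  // Only blocking budgets decide the outcome; diagnostics are reported as warnings.
  return { passed: failures.length === 0, failures, warnings };
}
```

With this split, a slow `trace-steps` query shows up in `warnings` without failing the run, while an overrun on any blocking budget sets `passed` to `false`.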
@@ -18,7 +18,7 @@ UniDesk 的统一 CLI 入口是根目录 `scripts/cli.ts`，运行方式固定
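 - `ssh <providerId> apply-patch [tool args...] < patch.diff` calls the remotely injected `apply_patch` tool directly and passes the standard `*** Begin Patch` / `*** End Patch` patch stream from local stdin through to the target node.
 - `ssh <providerId> py [script-args...] < script.py` writes local stdin to a temporary remote `.py` file, runs it with `python3 -u` and cleans it up automatically, so there is no need to hand-write `'python3 -'`, heredocs or nested quoting; `script-args` are passed safely to the remote script as argv.
 - `ssh <providerId> skills [--scope all|wsl|windows] [--limit N]` discovers WSL/Linux skill root directories on the target node; when the provider is WSL, the same call also scans `.agents/skills` and `.codex/skills` under the Windows user directory.
-- `microservice list/status/health/proxy` manages user services mounted in compute-node Docker or the k3s control plane through the backend-core internal API (the underlying command name is still microservice); `health` and `proxy` take the real backend-core -> provider-gateway or k3sctl-adapter -> node service path, and `proxy` supports a controlled JSON request body and prints a bounded preview of oversized response bodies by default. See `docs/reference/microservices.md` for the rules.
+- `microservice list/status/health/diagnostics/tunnel-self-test/proxy` manages user services mounted in compute-node Docker or the k3s control plane through the backend-core internal API (the underlying command name is still microservice); `health`, `diagnostics`, `tunnel-self-test` and `proxy` take the real backend-core -> provider-gateway or k3sctl-adapter -> node service path, and `proxy` supports a controlled JSON request body and prints a bounded preview of oversized response bodies by default. See `docs/reference/microservices.md` for the rules.
 - `decision upload/list/show/health` accesses the D601 k3s Decision Center through the backend-core user-service proxy to upload meeting-record/decision Markdown, list authoritative records, show details and run health checks; it must not connect directly to the D601 Service, a NodePort or provider-gateway business HTTP.
 - `deploy check/plan/apply` reads the desired service repo and commit state from the root `deploy.json`, joins it with `config.json` and the existing manifests, and validates or updates directly managed services and k3s-delegated services through the single target-side build path; see `docs/reference/deploy.md` for the rules.
 - `codex deploy <commitId>` is the Code Queue compatibility deploy entry; it generates a temporary desired manifest and calls the same target-side build, k3s import, rollout and live commit verification path as `deploy apply --service code-queue`. See `docs/reference/codex-deploy.md` for details.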
@@ -38,7 +38,7 @@ UniDesk 的统一 CLI 入口是根目录 `scripts/cli.ts`，运行方式固定

 Long-running operations use a Fire-and-Forget model: the CLI creates `.state/jobs/{jobId}.json`, and a background process runs the real command and writes stdout and stderr to `.state/jobs/{jobId}.stdout.log` and `.state/jobs/{jobId}.stderr.log` respectively. Callers query progress and tail output with `bun scripts/cli.ts job status <jobId>`.

-`server rebuild`, like `server start` and `server stop`, must be confirmed through the returned job id. Do not read consecutive `server rebuild` commands as "the previous rebuild has finished", because both commands only create asynchronous jobs quickly. The standard flow for rebuilding the frontend is to run `bun scripts/cli.ts server rebuild frontend`, poll `bun scripts/cli.ts job status <jobId>` until `succeeded`, then verify the public frontend with `server status` or `e2e run`. To rebuild the Todo Note backend, use `bun scripts/cli.ts server rebuild todo-note`, then verify with `microservice health todo-note` and `microservice proxy todo-note /api/instances`. To rebuild Code Queue Manager, use `bun scripts/cli.ts server rebuild code-queue-mgr`, then verify the main server control-plane path with `microservice health code-queue-mgr`, `microservice health code-queue` and `codex submit --dry-run`. To rebuild the Project Manager backend, use `bun scripts/cli.ts server rebuild project-manager`, then verify with `microservice health project-manager` and `microservice proxy project-manager /api/projects`. To rebuild the Baidu Netdisk backend, use `bun scripts/cli.ts server rebuild baidu-netdisk`, then verify with `microservice health baidu-netdisk` and `microservice proxy baidu-netdisk /api/transfers`. To rebuild the OA Event Flow backend, use `bun scripts/cli.ts server rebuild oa-event-flow`, then verify with `microservice health oa-event-flow` and `microservice proxy oa-event-flow /api/diagnostics`. The D601 Code Queue execution plane and the Decision Center backend are managed by the D601 k3s/k8s control plane and must be deployed from a pushed remote commit with `bun scripts/cli.ts deploy apply --service code-queue`, `bun scripts/cli.ts deploy apply --service decision-center` or the Code Queue compatibility entry `bun scripts/cli.ts codex deploy <commitId>`; the deploy job itself must prove through the real `/health` endpoint and the k3s Deployment annotation that an old service is not standing in for the new one, after which the corresponding private proxy API is used for a manual review. A manual `docker rm` fallback must not be treated as a formal delivery step.
+`server rebuild`, like `server start` and `server stop`, must be confirmed through the returned job id. Do not read consecutive `server rebuild` commands as "the previous rebuild has finished", because both commands only create asynchronous jobs quickly. The standard flow for rebuilding the frontend is to run `bun scripts/cli.ts server rebuild frontend`, poll `bun scripts/cli.ts job status <jobId>` until `succeeded`, then verify the public frontend with `server status` or `e2e run`. To rebuild the Todo Note backend, use `bun scripts/cli.ts server rebuild todo-note`, then verify with `microservice health todo-note` and `microservice proxy todo-note /api/instances`. To rebuild Code Queue Manager, use `bun scripts/cli.ts server rebuild code-queue-mgr`, then verify the main server control-plane path with `microservice health code-queue-mgr`, `microservice health code-queue` and `codex submit --dry-run`. To rebuild the Project Manager backend, use `bun scripts/cli.ts server rebuild project-manager`, then verify with `microservice health project-manager` and `microservice proxy project-manager /api/projects`. To rebuild the Baidu Netdisk backend, use `bun scripts/cli.ts server rebuild baidu-netdisk`, then verify with `microservice health baidu-netdisk` and `microservice proxy baidu-netdisk /api/transfers`. To rebuild the OA Event Flow backend, use `bun scripts/cli.ts server rebuild oa-event-flow`, then verify with `microservice health oa-event-flow` and `microservice proxy oa-event-flow /api/diagnostics`. The D601 Code Queue execution plane and the Decision Center backend are managed by the D601 k3s/k8s control plane and must be deployed from a pushed remote commit with `bun scripts/cli.ts deploy apply --service code-queue`, `bun scripts/cli.ts deploy apply --service decision-center` or the Code Queue compatibility entry `bun scripts/cli.ts codex deploy <commitId>`; the deploy job itself must prove through the real `/health` endpoint and the k3s Deployment annotation that an old service is not standing in for the new one, after which `microservice health <service>` and the corresponding private proxy API are used for a manual review. A manual `docker rm` fallback must not be treated as a formal delivery step.

 New deployments should prefer `deploy apply` as the entry point. The older `server rebuild` and `codex deploy` are kept only as compatibility entries, and later implementations should converge on a single reconciler: export the source from the remote commit, build the image on the target node through a one-shot proxy, and after deployment use live commit verification to prove the old service is not still running.

@@ -116,7 +116,7 @@ bun scripts/cli.ts ssh D601 glob --root /home/ubuntu/pikapython --pattern '**/*-

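 `--main-server-ip` is a global prefix and must be placed in the same invocation as the command to be forwarded, for example `bun scripts/cli.ts --main-server-ip 74.48.78.17 debug health`. The default transport is the public frontend: the local CLI reads the frontend login username and password from this repository's `config.json`, logs in to `http://<ip>:<frontendPort>/` to obtain an HttpOnly session cookie, and then reaches the backend-core internal API through the frontend's same-origin `/api/*` proxy. A compute node therefore only needs access to the public frontend; it needs neither the main server SSH key nor an open backend-core REST API or PostgreSQL port.

-The default frontend transport supports `debug health`, `debug dispatch`, `debug task`, `microservice list/status/health/proxy`, `decision upload/list/show/health`, `codex task <taskId>`, `codex output <taskId>`, `codex judge <taskId> --attempt N` and `ssh <PROVIDER_ID> <remote-command>`. For `ssh`, the remote frontend transport uses `host.ssh` dispatch to run bounded remote commands, which suits self-tests such as `ssh D601 hostname` and `ssh D601 skills`; an interactive login shell should still be used from the CLI on the main server itself, or run on the main server after explicitly switching to the legacy SSH transport. Frontend remote passthrough does not stream local stdin, so stdin-backed helpers such as `ssh py < script.py` and `ssh apply-patch < patch.diff` must run on the main server itself or explicitly switch to `--main-server-transport ssh`. If the legacy behavior is really needed, use `--main-server-key <key>` or `--main-server-transport ssh`; the CLI then logs in to the main server over SSH and runs the same `bun scripts/cli.ts <command>` in the `--main-server-root` directory.
+The default frontend transport supports `debug health`, `debug dispatch`, `debug task`, `microservice list/status/health/diagnostics/tunnel-self-test/proxy`, `decision upload/list/show/health`, `codex task <taskId>`, `codex output <taskId>`, `codex judge <taskId> --attempt N` and `ssh <PROVIDER_ID> <remote-command>`. For `ssh`, the remote frontend transport uses `host.ssh` dispatch to run bounded remote commands, which suits self-tests such as `ssh D601 hostname` and `ssh D601 skills`; an interactive login shell should still be used from the CLI on the main server itself, or run on the main server after explicitly switching to the legacy SSH transport. Frontend remote passthrough does not stream local stdin, so stdin-backed helpers such as `ssh py < script.py` and `ssh apply-patch < patch.diff` must run on the main server itself or explicitly switch to `--main-server-transport ssh`. If the legacy behavior is really needed, use `--main-server-key <key>` or `--main-server-transport ssh`; the CLI then logs in to the main server over SSH and runs the same `bun scripts/cli.ts <command>` in the `--main-server-root` directory.

 A compute node can use this entry to test its own remote-upgrade loop without exposing the core REST API or the database on the compute node. The standard order is: first run `bun scripts/cli.ts --main-server-ip 74.48.78.17 debug health` to confirm that the main server sees the current Provider online and that the Provider labels include `host.ssh` in `unideskCapabilities`, `hostSshConfigured=true` and `hostSshKeyPresent=true`; then run `bun scripts/cli.ts --main-server-ip 74.48.78.17 debug dispatch <PROVIDER_ID> provider.upgrade --mode schedule --wait-ms 15000` to trigger a real `provider.upgrade`; then run `debug health` again to confirm the node is back online; finally run `bun scripts/cli.ts --main-server-ip 74.48.78.17 debug dispatch <PROVIDER_ID> host.ssh --wait-ms 15000` and `bun scripts/cli.ts --main-server-ip 74.48.78.17 ssh <PROVIDER_ID> hostname` to verify SSH passthrough. A new provider-gateway deployment or upgrade that has not completed this remote CLI self-test cannot be considered delivered.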

@@ -76,6 +76,11 @@ Existing service-specific commands such as Code Queue deploy should converge ont

 Decision Center is a standard `k3sctl-managed` service in this model. `deploy apply --service decision-center` must build `src/components/microservices/decision-center/Dockerfile` on D601, import `unidesk-decision-center:d601` into native k3s containerd, apply `src/components/microservices/k3sctl-adapter/k3s/decision-center.k8s.yaml`, stamp the Deployment, and verify health through `/api/microservices/decision-center/health`. It must not add a main-server Compose service, NodePort, hostPort, or provider-gateway direct HTTP backend for Decision Center.

+## CI Separation
+
+Continuous integration is intentionally separate from this deploy reconciler. D601 k3s hosts Tekton CI resources described in `docs/reference/ci.md`, but those PipelineRuns only clone, check and run read-only performance gates. They must not call `deploy apply`, `codex deploy`, `kubectl rollout restart` for production services, or mutate `deploy.json`.
+
+The Code Queue performance gate may create a temporary `code-queue-ci-read` service and read the main PostgreSQL through the existing `d601-tcp-egress-gateway`. Because it runs with `CODE_QUEUE_SERVICE_ROLE=read`, scheduler/backfill/notification disabled and EmptyDir state, it is not deployment truth and does not need a temporary database for the current read-only checks.

 ## Version Stamping And Verification

@@ -330,6 +330,8 @@ ClaudeQQ 的业务源码和持久化数据仍在 D601，但正式运行由 k3s
 - `bun scripts/cli.ts microservice health k3sctl-adapter` and `bun scripts/cli.ts microservice proxy k3sctl-adapter /api/control-plane --raw`: verify the D601 `unidesk-k3s` control-plane adapter, manifests, D601 scheduler/read/write instance state, that `presentNodeIds` contains `D601`, `missingNodeIds=[]` and the no-fallback runtime path.
 - `bun scripts/cli.ts microservice health code-queue-mgr`: verifies the lightweight Code Queue control plane on the main server; the output must include `role=master-control-plane`, `schemaReady=true`, the PostgreSQL pool limit and `noRunnerDependencies=true`.
 - `bun scripts/cli.ts microservice health code-queue` and `bun scripts/cli.ts microservice proxy code-queue /api/tasks/overview`: verify that the stable `code-queue` user-service path works. Ordinary health/overview/task summary/queue management requests are routed by backend-core to the main server `code-queue-mgr` by default; submissions and readAt/unread state must be written to PostgreSQL by the backend, and the frontend must not fake success with local storage. When D601 execution-plane state is needed, check the scheduler/read/write ready endpoints through `k3sctl-adapter /api/control-plane`, or access the execution-plane-specific dev-ready, judge and active run control paths; the output must not contain a provider-gateway `microservice.http` business proxy task with `serviceId=code-queue`.
+- `bun scripts/cli.ts microservice diagnostics code-queue`: breaks down the health of the k3sctl-managed chain and returns the status of five segments: `providerGateway`, `httpTunnel`, `k3sctlAdapter`, `kubernetesApiServiceProxy` and `targetService`. The command still goes through the backend-core user-service proxy; neither the browser nor the CLI may bypass it to reach k3s, a NodePort, a Pod IP or a D601 local business port.
+- `bun scripts/cli.ts microservice tunnel-self-test code-queue`: triggers one provider HTTP tunnel request that is expected to fail, to confirm that the failure response contains `requestId`, `stage`, `x-unidesk-request-id` and `x-unidesk-tunnel-error`. The self-test only hits an invalid loopback port on the provider side; it does not create a Code Queue queue and does not bypass the formal backend-core entry.
 - `bun scripts/cli.ts microservice health filebrowser`, `bun scripts/cli.ts microservice health filebrowser-d601` and `bun scripts/cli.ts microservice proxy filebrowser / --max-body-bytes 2000`: verify the private proxy paths of the primary File Browser on D518 and the backup File Browser on D601; the browser WebUI must be accessed through `/api/microservices/filebrowser/proxy/` or `/api/microservices/filebrowser-d601/proxy/`, and port `4251` must not be opened to the public internet directly.
 - `bun scripts/cli.ts --main-server-ip 74.48.78.17 microservice health findjob`: runs the same verification from a compute node or any other non-main-server host through the public frontend remote CLI, without the main server SSH key.
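
To make the five-segment diagnostics described in this hunk more concrete, here is a minimal TypeScript sketch of how a caller might locate the first unhealthy hop. The segment names come from the documentation above; the `ok` and `detail` fields, the `DiagnosticsReport` type and the `firstFailingSegment` helper are assumptions made for illustration and may not match the real response shape.

```typescript
// Segment names follow the documented order of the k3sctl-managed chain.
// The per-segment `ok` flag is an assumed field, not a confirmed API shape.
const SEGMENTS = [
  "providerGateway",
  "httpTunnel",
  "k3sctlAdapter",
  "kubernetesApiServiceProxy",
  "targetService",
] as const;

type Segment = (typeof SEGMENTS)[number];
type DiagnosticsReport = Partial<Record<Segment, { ok: boolean; detail?: string }>>;

// Walks the chain from the outermost hop inwards and returns the first
// segment that is missing or unhealthy, or null when every hop is healthy.
function firstFailingSegment(report: DiagnosticsReport): Segment | null {
  for (const segment of SEGMENTS) {
    const state = report[segment];
    // A missing segment is treated as failing: later hops cannot be trusted without it.
    if (state === undefined || !state.ok) return segment;
  }
  return null;
}
```

Checking the hops in chain order matters because a failure in an outer segment, such as the HTTP tunnel, usually makes the status of the inner segments meaningless.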

@@ -32,6 +32,6 @@ frontend Bun server 必须提供同源 `/api/frontend-performance`，记录 webu

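 Performance work must first use these metrics to pin down the name, path, duration and proxy layer of the slow operation before changing backend queries or the frontend/backend communication strategy; the UI must not be changed on subjective impression alone. When a control-plane page such as Code Queue shows slow operations over 1s such as `core_proxy`, `GET /api/microservices/code-queue/proxy/api/tasks/overview` or `POST /api/microservices/code-queue/proxy/api/tasks/<id>/read`, keep performance-panel evidence from before and after the optimization, and also record live API latency, container memory, the `/health` storage summary and whether historical data is still rebuilt through PostgreSQL/append-only archive. A short-TTL cache, warmup or in-page memory cache may only serve as jitter protection for repeated requests; the performance evidence must prove that database indexes/aggregation, pagination and progressive disclosure have themselves brought the core path within target, and a long cache must not be used to mask slow SQL or full JSON materialization.

-When recent failed requests cluster as frontend `core_proxy` 502 responses on `/api/microservices/code-queue/proxy/...` overview, trace or summary paths while the k3s/k8s Pod is still running, you must distinguish between "Kubernetes API service proxy unreachable", "Code Queue process unreachable" and "Code Queue event loop starved by synchronous work on the hot path". The troubleshooting order is to look, together, at `/api/frontend-performance`, `/api/performance`, `k3sctl-adapter` `/api/control-plane`, the Kubernetes Pod `/live` and `/health`, overview/trace-step curl, `kubectl top pod` or Docker stats, the container `RestartCount`/`OOMKilled` and the Code Queue logs. If `/health` inside the Pod also times out, first check whether live output publishing, archive reads, transcript building, statistics computation, startup maintenance, historical OA backfill or remote Provider preparation/SSH child processes are blocking the event loop, rather than adjusting frontend rendering or proxy timeouts first. By default Code Queue must not run historical OA backfill or notification-table index maintenance automatically at startup; an explicit backfill must be recorded as an operations action, with concurrent proof during the run that `/live`, `/health` and `/api/tasks/overview` still return quickly. When remote Providers such as D601 are involved, also check whether `runCodeQueueSsh`/dev-container preparation still has synchronous child processes, SSH without a timeout, unbounded stdout/stderr or stale TUN rebuild waits; after the fix, prove concurrently while the remote preparation probe runs that the Pod `/health` and `/api/tasks/overview` still return quickly.
+When recent failed requests cluster as frontend `core_proxy` 502/503/504 responses on `/api/microservices/code-queue/proxy/...` overview, trace or summary paths while the k3s/k8s Pod is still running, you must first run `bun scripts/cli.ts microservice diagnostics code-queue` to distinguish the five segment states: provider-gateway online, WebSocket HTTP tunnel, k3sctl-adapter, Kubernetes API service proxy and the target Service. Provider tunnel failures must record `requestId`, `stage`, `failureReason`, `x-unidesk-request-id` and `x-unidesk-tunnel-error` from the response body/headers; to actively verify the error structure, run `bun scripts/cli.ts microservice tunnel-self-test code-queue`, which should return a diagnostic result where the failure is expected but `ok=true`. Only then go on to distinguish between "Kubernetes API service proxy unreachable", "Code Queue process unreachable" and "Code Queue event loop starved by synchronous work on the hot path". The troubleshooting order is to look, together, at `/api/frontend-performance`, `/api/performance`, `k3sctl-adapter` `/api/control-plane`, the Kubernetes Pod `/live` and `/health`, overview/trace-step curl, `kubectl top pod` or Docker stats, the container `RestartCount`/`OOMKilled` and the Code Queue logs. If `/health` inside the Pod also times out, first check whether live output publishing, archive reads, transcript building, statistics computation, startup maintenance, historical OA backfill or remote Provider preparation/SSH child processes are blocking the event loop, rather than adjusting frontend rendering or proxy timeouts first. By default Code Queue must not run historical OA backfill or notification-table index maintenance automatically at startup; an explicit backfill must be recorded as an operations action, with concurrent proof during the run that `/live`, `/health` and `/api/tasks/overview` still return quickly. When remote Providers such as D601 are involved, also check whether `runCodeQueueSsh`/dev-container preparation still has synchronous child processes, SSH without a timeout, unbounded stdout/stderr or stale TUN rebuild waits; after the fix, prove concurrently while the remote preparation probe runs that the Pod `/health` and `/api/tasks/overview` still return quickly.

 When a Code Queue task has clearly produced a final reply but keeps entering `retry_wait`, first use the latest attempt fields in the task detail to check `terminalStatus`, `transportClosedBeforeTerminal`, `appServerExitCode`, `finalResponseChars`, `judge.raw._safetyOverride` and the attempt output. For OpenCode remote tasks, `opencode completed status=completed exit=0` together with non-empty assistant output in the current attempt should correspond to `terminalStatus=completed` and `transportClosedBeforeTerminal=false`; if `_safetyOverride=terminal_not_completed` is still triggered because the `step_finish` event is missing, protocol terminal-state normalization has regressed. Conversely, when the current attempt has no final assistant response, the task must retry even if the tool/read/bash evidence is complete; the old `task.finalResponse` or reasoning/tool evidence must not stand in for a visible final reply.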
@@ -92,6 +92,8 @@ provider ingress 是唯一允许公网暴露的 provider 连接接口，当前

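 backend-core sends the target service id, the node-local `targetBaseUrl`, path, query, method, request body, timeout and optional JSON array trimming parameters; provider-gateway supports `GET`, `HEAD`, `POST`, `PUT`, `PATCH` and `DELETE`, but the methods finally allowed must be configured explicitly in each user service's `backend.allowedMethods`. provider-gateway may only access the node-local addresses `http://127.0.0.1`, `http://localhost` and `http://host.docker.internal`; the Todo Note backend built into the main server may use the Compose service name `http://todo-note:4211`. A Code Queue with `deployment.mode=k3sctl-managed` must not reach the business container directly through provider-gateway; the only formal path is backend-core -> provider WebSocket HTTP tunnel -> `k3sctl-adapter` -> Kubernetes native Service/DNS, with an explicit fallback to Kubernetes API service proxy -> k3s/k8s Service when necessary. This capability neither opens an inbound provider-gateway port nor replaces the business repository's own Dockerfile/docker-compose.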

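+backend-core must classify provider WebSocket HTTP tunnel failures into the response body and headers: a failure response contains at least `requestId`, `providerId`, `serviceId`, `stage`, and `failureReason` or the provider result, and carries `x-unidesk-request-id` and `x-unidesk-tunnel-error`. Non-stream `GET`/`HEAD` requests may use short-timeout layered retries; requests that can have side effects, such as `POST`, `PATCH`, `PUT` and `DELETE`, must not be repeated automatically. When a Provider reconnects, backend-core must first confirm that the close event comes from the current active socket; a late close from an old socket that has been replaced by a new one must not clear tunnel waiters on the new connection or mark the node offline by mistake.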
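
The retry and reconnect rules in the added paragraph above can be sketched roughly as follows. This is an illustrative TypeScript outline, not the backend-core implementation: `mayAutoRetry`, `ProviderConnections` and its methods are hypothetical names, and the socket type is left generic because the real connection object is not shown in this diff.

```typescript
// Hypothetical helper names; only the rules themselves come from the text above.
const RETRYABLE_METHODS = new Set(["GET", "HEAD"]);

// Only non-stream GET/HEAD requests are safe to repeat automatically;
// POST, PATCH, PUT and DELETE may have side effects and are never retried.
function mayAutoRetry(method: string, isStream: boolean): boolean {
  return !isStream && RETRYABLE_METHODS.has(method.toUpperCase());
}

class ProviderConnections<S> {
  private readonly active = new Map<string, S>();

  // A reconnect replaces the active socket for the provider.
  register(providerId: string, socket: S): void {
    this.active.set(providerId, socket);
  }

  // Returns true only when the closing socket is still the active one.
  // A late close from a replaced socket is ignored, so the new connection's
  // waiters and the provider's online state are left untouched.
  handleClose(providerId: string, socket: S): boolean {
    if (this.active.get(providerId) !== socket) return false;
    this.active.delete(providerId);
    return true;
  }

  isOnline(providerId: string): boolean {
    return this.active.has(providerId);
  }
}
```

A caller would clear pending tunnel waiters and mark the node offline only when `handleClose` returns `true`; comparing socket identity is what keeps a stale close event from tearing down the replacement connection.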
+
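 Oversized JSON responses can use `jsonArrayLimits` to trim the specified arrays before provider-gateway returns, writing `_unidesk.arrayLimits` metadata into the response body so that the UniDesk frontend can preview lists without showing raw JSON. In the long run the business backend should provide a paginated API; trimming is only display protection in the UniDesk integration layer.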

 ## Egress Proxy
@@ -18,6 +18,7 @@
      - command.ts (Bounded command execution helpers)
      - output.ts (JSON output helpers)
      - e2e.ts (Public frontend/provider ingress, internal core/database, and Playwright frontend E2E checks)
+      - ci.ts (D601 k3s Tekton CI install/status/manual-run/logs helpers; CI only, no CD)
  - logs/ (Generated service logs; ignored by git)
  - .state/ (Generated job state and compose env; ignored by git)
  - docs/
@@ -32,6 +33,7 @@
      - provider-gateway.md (Provider connection and host SSH maintenance bridge)
      - observability.md (Logs and status visibility)
      - e2e.md (Delivery gate, Playwright frontend E2E, and database persistence checks)
+      - ci.md (D601 k3s Tekton CI, read-only production database performance gate, and trigger boundary)
  - src/ (TypeScript component monorepo)
    - package.json (Component workspace metadata)
    - bun.lock (Component dependency lockfile)
@@ -91,4 +93,5 @@
        - oa-event-flow/ (Unified OA event ledger, tag stream, and Trace/STEP stats center)
        - decision-center/ (Decision records backend; k3s-managed on D601 and PostgreSQL-backed)
        - k3sctl-adapter/ (D601 k3s control-plane adapter and managed service manifests)
+          - k3s/ci/ (Tekton CI install marker, Pipeline/Task, and in-cluster Trigger manifests)
        - example-service/
@@ -0,0 +1,244 @@
+interface TimingSample {
+  label: string;
+  method: string;
+  url: string;
+  ok: boolean;
+  status: number;
+  durationMs: number;
+  bytes: number;
+  error: string | null;
+}
+
+type JsonValue = string | number | boolean | null | JsonValue[] | { [key: string]: JsonValue };
+
+interface CandidateTask {
+  id: string;
+  status: string;
+  stepCount: number | null;
+  updatedAt: string;
+}
+
+interface TraceCandidate {
+  seq: number | null;
+  total: number | null;
+  durationMs: number;
+  error: string | null;
+}
+
+interface PerfCheck {
+  name: string;
+  ok: boolean;
+  valueMs: number;
+  budgetMs: number;
+  hard: boolean;
+}
+
+export {};
+
+function envNumber(name: string, fallback: number): number {
+  const raw = process.env[name];
+  if (raw === undefined || raw.length === 0) return fallback;
+  const value = Number(raw);
+  if (!Number.isFinite(value) || value <= 0) throw new Error(`${name} must be a positive number`);
+  return Math.floor(value);
+}
+
+function baseUrl(): string {
+  return (process.env.CI_CODE_QUEUE_URL ?? "http://code-queue-ci-read.unidesk-ci.svc.cluster.local:4222").replace(/\/+$/u, "");
+}
+
+function terminalStatus(status: string): boolean {
+  return status === "succeeded" || status === "failed" || status === "canceled";
+}
+
+async function fetchSample(label: string, url: string, timeoutMs = 30_000): Promise<TimingSample> {
+  const started = performance.now();
+  try {
+    const response = await fetch(url, { signal: AbortSignal.timeout(timeoutMs) });
+    const text = await response.text();
+    return {
+      label,
+      method: "GET",
+      url,
+      ok: response.ok,
+      status: response.status,
+      durationMs: Math.round((performance.now() - started) * 10) / 10,
+      bytes: new TextEncoder().encode(text).byteLength, // UTF-8 byte size; text.length would count UTF-16 code units
+      error: null,
+    };
+  } catch (error) {
+    return {
+      label,
+      method: "GET",
+      url,
+      ok: false,
+      status: 0,
+      durationMs: Math.round((performance.now() - started) * 10) / 10,
+      bytes: 0,
+      error: error instanceof Error ? error.message : String(error),
+    };
+  }
+}
+
+function percentile(values: number[], percentileValue: number): number {
+  if (values.length === 0) return 0;
+  const sorted = values.slice().sort((left, right) => left - right);
+  if (percentileValue <= 0) return sorted[0] ?? 0;
+  if (percentileValue >= 100) return sorted[sorted.length - 1] ?? 0;
+  const index = Math.min(sorted.length - 1, Math.max(0, Math.ceil((percentileValue / 100) * sorted.length) - 1));
+  return sorted[index] ?? 0;
+}
+
+async function candidateTasks(url: string): Promise<CandidateTask[]> {
+  const response = await fetch(`${url}/api/tasks/overview?limit=48&transcriptLimit=0&compact=1&selected=0&includeActive=0&stats=0&skipTrace=1`, {
+    signal: AbortSignal.timeout(30_000),
+  });
+  const body = await response.json() as { tasks?: Array<{ id?: string; status?: string; stepCount?: number; llmStepCount?: number; updatedAt?: string }> };
+  const tasks = (body.tasks ?? [])
+    .map((task): CandidateTask | null => {
+      if (typeof task.id !== "string" || task.id.length === 0) return null;
+      const stepCount = Number(task.stepCount ?? task.llmStepCount);
+      return {
+        id: task.id,
+        status: typeof task.status === "string" ? task.status : "",
+        stepCount: Number.isFinite(stepCount) && stepCount >= 0 ? Math.floor(stepCount) : null,
+        updatedAt: typeof task.updatedAt === "string" ? task.updatedAt : "",
+      };
+    })
+    .filter((task): task is CandidateTask => task !== null);
+  const ordered = [
+    ...tasks.filter((task) => terminalStatus(task.status) && (task.stepCount ?? 0) > 0 && (task.stepCount ?? 0) <= 300),
+    ...tasks.filter((task) => terminalStatus(task.status) && ((task.stepCount ?? 0) === 0 || task.stepCount === null)),
+    ...tasks.filter((task) => terminalStatus(task.status)),
+    ...tasks.filter((task) => !terminalStatus(task.status) && task.status !== "queued" && task.status !== "running" && task.status !== "judging"),
+    ...tasks,
+  ];
+  const seen = new Set<string>();
+  return ordered.filter((task) => {
+    if (seen.has(task.id)) return false;
+    seen.add(task.id);
+    return true;
+  });
+}
+
+async function traceSeq(url: string, taskId: string, timeoutMs: number): Promise<TraceCandidate> {
+  const started = performance.now();
+  try {
+    const response = await fetch(`${url}/api/tasks/${encodeURIComponent(taskId)}/trace-steps?tail=1&limit=1`, {
+      signal: AbortSignal.timeout(timeoutMs),
+    });
+    const body = await response.json() as { total?: number; steps?: Array<{ seq?: number }> };
+    const durationMs = Math.round((performance.now() - started) * 10) / 10;
+    if (!response.ok) return { seq: null, total: null, durationMs, error: `status=${response.status}` };
+    const seq = body.steps?.find((step) => Number.isFinite(Number(step.seq)))?.seq;
+    return {
+      seq: Number.isFinite(Number(seq)) ? Number(seq) : null,
+      total: Number.isFinite(Number(body.total)) ? Number(body.total) : null,
+      durationMs,
+      error: null,
+    };
+  } catch (error) {
+    return {
+      seq: null,
+      total: null,
+      durationMs: Math.round((performance.now() - started) * 10) / 10,
+      error: error instanceof Error ? error.message : String(error),
+    };
+  }
+}
+
+async function traceTarget(url: string): Promise<{ taskId: string; skippedTaskIds: string[]; selection: JsonValue }> {
+  const tasks = await candidateTasks(url);
+  if (tasks.length === 0) throw new Error("Code Queue CI perf could not find a task id in the production PostgreSQL task table");
+  const target = tasks[0];
+  if (target === undefined) throw new Error("Code Queue CI perf could not select a task from the production PostgreSQL task table");
+  return { taskId: target.id, skippedTaskIds: tasks.slice(1).map((task) => task.id), selection: target as unknown as JsonValue };
+}
+
+async function measureFirstPaint(url: string): Promise<Record<string, unknown>> {
+  const sample = await fetchSample("code-queue-read-first-paint-proxy", `${url}/api/tasks/overview?limit=12&transcriptLimit=1&compact=1&selected=0&includeActive=0&stats=0&skipTrace=1`, 60_000);
+  return {
+    ok: sample.ok,
+    url: sample.url,
+    firstPaintMs: sample.durationMs,
+    apiTimings: [sample],
+    consoleErrors: [],
+    note: "Code Queue service is API-only in k3s; this measures the first overview payload used by the frontend Code Queue page.",
+  };
+}
+
+async function main(): Promise<void> {
+  const url = baseUrl();
+  const budgets = {
+    firstPaintMs: envNumber("FIRST_PAINT_BUDGET_MS", 2000),
+    traceSummaryMs: envNumber("TRACE_SUMMARY_BUDGET_MS", 10_000),
+    traceStepsMs: envNumber("TRACE_STEPS_BUDGET_MS", 900),
+    traceStepDetailMs: envNumber("TRACE_STEP_DETAIL_BUDGET_MS", 700),
+    overviewP95Ms: envNumber("OVERVIEW_P95_BUDGET_MS", 900),
+  };
+  const health = await fetchSample("health", `${url}/health`);
+  if (!health.ok) throw new Error(`Code Queue CI read health failed: ${JSON.stringify(health)}`);
+  const target = await traceTarget(url);
+  const { taskId } = target;
+  const firstPaint = await measureFirstPaint(url);
+  const traceSummary = await fetchSample("trace-summary", `${url}/api/tasks/${encodeURIComponent(taskId)}/trace-summary`);
+  const overviewSamples: TimingSample[] = [];
+  for (let index = 0; index < 10; index += 1) {
+    overviewSamples.push(await fetchSample("overview", `${url}/api/tasks/overview?limit=12&transcriptLimit=1&compact=1&selected=0&includeActive=0&stats=0&skipTrace=1&__ci=${Date.now()}-${index}`));
+  }
+  const traceProbe = await traceSeq(url, taskId, Math.max(10_000, Math.min(30_000, budgets.traceStepsMs)));
+  const seq = traceProbe.seq ?? 0;
+  const traceSteps = await fetchSample("trace-steps", `${url}/api/tasks/${encodeURIComponent(taskId)}/trace-steps?tail=1&limit=1`, Math.max(10_000, Math.min(30_000, budgets.traceStepsMs)));
+  const traceStepDetail = seq > 0
+    ? await fetchSample("trace-step-detail", `${url}/api/tasks/${encodeURIComponent(taskId)}/trace-step?seq=${encodeURIComponent(String(seq))}`, Math.max(10_000, Math.min(30_000, budgets.traceStepDetailMs)))
+    : {
+      label: "trace-step-detail",
+      method: "GET",
+      url: `${url}/api/tasks/${encodeURIComponent(taskId)}/trace-step?seq=0`,
+      ok: false,
+      status: 0,
+      durationMs: 0,
+      bytes: 0,
+      error: traceProbe.error ?? "trace step seq unavailable",
+    };
+  const overviewSuccessful = overviewSamples.filter((sample) => sample.ok).map((sample) => sample.durationMs);
+  const overviewP95Ms = Math.round(percentile(overviewSuccessful, 95) * 10) / 10;
+  const firstPaintMs = Number((firstPaint as { firstPaintMs?: number }).firstPaintMs ?? 0);
+  const checks: PerfCheck[] = [
+    { name: "first-paint", ok: firstPaintMs <= budgets.firstPaintMs, valueMs: firstPaintMs, budgetMs: budgets.firstPaintMs, hard: true },
+    { name: "trace-summary", ok: traceSummary.ok && traceSummary.durationMs <= budgets.traceSummaryMs, valueMs: traceSummary.durationMs, budgetMs: budgets.traceSummaryMs, hard: true },
+    { name: "overview-p95", ok: overviewSamples.every((sample) => sample.ok) && overviewP95Ms <= budgets.overviewP95Ms, valueMs: overviewP95Ms, budgetMs: budgets.overviewP95Ms, hard: true },
+    { name: "trace-steps", ok: traceSteps.ok && traceSteps.durationMs <= budgets.traceStepsMs, valueMs: traceSteps.durationMs, budgetMs: budgets.traceStepsMs, hard: false },
+    { name: "trace-step-detail", ok: traceStepDetail.ok && traceStepDetail.durationMs <= budgets.traceStepDetailMs, valueMs: traceStepDetail.durationMs, budgetMs: budgets.traceStepDetailMs, hard: false },
+  ];
+  const hardChecks = checks.filter((check) => check.hard);
+  const result = {
+    ok: hardChecks.every((check) => check.ok),
+    measuredAt: new Date().toISOString(),
+    url,
+    taskId,
+    seq,
+    skippedTaskIds: target.skippedTaskIds,
+    selection: target.selection,
+    budgets,
+    checks,
+    diagnostics: {
+      nonBlockingChecks: checks.filter((check) => !check.hard).map((check) => check.name),
+      traceProbe,
+    },
+    health,
+    firstPaint,
+    traceSummary,
+    traceSteps,
+    traceStepDetail,
+    overview: {
+      p50Ms: Math.round(percentile(overviewSuccessful, 50) * 10) / 10,
+      p95Ms: overviewP95Ms,
+      samples: overviewSamples,
+    },
+  };
+  console.log(JSON.stringify(result, null, 2));
+  if (!result.ok) process.exitCode = 1;
+}
+
+await main();
@@ -14,6 +14,7 @@ import { runCodeQueueDeployCompatCommand, runDeployCommand } from "./src/deploy"
 import { runProviderCommand } from "./src/provider-attach";
 import { runScheduleCommand } from "./src/schedules";
 import { parseNetworkPerfOptions, runNetworkPerf } from "./src/network-perf";
+import { runCiCommand } from "./src/ci";

 const remoteOptions = extractRemoteCliOptions(process.argv.slice(2));
 const args = remoteOptions.args;
@@ -45,6 +46,8 @@ function help(): unknown {
      { command: "microservice status <id>", description: "Show one user service config, repository reference, backend mapping, and runtime status." },
      { command: "microservice health <id>", description: "Probe one user service through backend-core -> provider-gateway HTTP proxy." },
      { command: "microservice proxy <id> <path> [--method GET|POST|PUT|PATCH|DELETE] [--body-json JSON|--body-file path|--body-stdin] [--raw] [--max-body-bytes N]", description: "Access a private user-service backend path through the same frontend-only proxy used by WebUI; JSON request bodies are supported for controlled write/debug endpoints." },
+      { command: "microservice diagnostics <id>", description: "Split k3sctl-managed proxy health into provider-gateway, HTTP tunnel, adapter, Kubernetes API service proxy, and target Service checks." },
+      { command: "microservice tunnel-self-test <id>", description: "Trigger an expected provider HTTP tunnel failure and verify requestId/stage diagnostics are returned." },
      { command: "decision upload <markdown-file> [--title text] [--type meeting|decision] [--level G0|G1|G2|G3|P0|P1|P2|P3|none] [--status active|blocked|parked|done] [--linked-goal-id id] [--evidence url]", description: "Upload a meeting note or decision record through backend-core -> decision-center user-service proxy." },
      { command: "decision list [--type ...] [--status ...] [--level ...] [--linked-goal-id id] [--limit N]", description: "List Decision Center records through the user-service proxy." },
      { command: "decision show <id>", description: "Show one Decision Center record." },
@@ -64,6 +67,7 @@ function help(): unknown {
      { command: "debug dispatch [providerId] [docker.ps|provider.upgrade|host.ssh|microservice.http|echo] [--wait-ms N]", description: "Submit a real internal-core dispatch request for CLI debugging." },
      { command: "debug task <taskId|latest>", description: "Read a dispatched task record from internal core for CLI debugging." },
      { command: "network perf [--service code-queue --path /api/tasks/overview?limit=30 --count N --concurrency N --label before|after]", description: "Benchmark frontend -> backend-core -> provider/adapter user-service networking and report latency/proxy-mode distributions." },
+      { command: "ci install|status|run|logs", description: "Manage the D601 k3s Tekton CI gate only; this command never triggers a CD deployment. CI reads the production PostgreSQL database through a temporary read-only Code Queue service." },
      { command: "e2e run [--only pattern[,pattern...]] [--skip pattern[,pattern...]]", description: "Run selected public/internal/Playwright E2E checks; use --only for focused iteration and rerun without filters for final regression." },
    ],
  };
@@ -258,6 +262,11 @@ async function main(): Promise<void> {
    return;
  }

+  if (top === "ci") {
+    emitJson(commandName, runCiCommand(config, args.slice(1)));
+    return;
+  }
+
  if (top === "e2e" && sub === "run") {
    const result = await runE2E(config, parseE2ERunOptions(args.slice(2)));
    const ok = (result as { ok?: unknown }).ok === true;
@@ -0,0 +1,354 @@
+import { spawnSync } from "node:child_process";
+import { existsSync, readFileSync } from "node:fs";
+import { runCommand } from "./command";
+import { type UniDeskConfig, repoRoot, rootPath } from "./config";
+import { startJob } from "./jobs";
+
+const k3sctlContainerName = "k3sctl-adapter";
+const k3sctlSshKey = "/run/host-ssh/id_ed25519";
+const d601SshTarget = "ubuntu@host.docker.internal";
+const d601Kubeconfig = "/etc/rancher/k3s/k3s.yaml";
+const tektonPipelineVersion = "v1.12.0";
+const tektonTriggersVersion = "v0.34.0";
+const tektonPipelineReleaseUrl = `https://infra.tekton.dev/tekton-releases/pipeline/previous/${tektonPipelineVersion}/release.yaml`;
+const tektonTriggersReleaseUrl = `https://infra.tekton.dev/tekton-releases/triggers/previous/${tektonTriggersVersion}/release.yaml`;
+const tektonTriggersInterceptorsUrl = `https://infra.tekton.dev/tekton-releases/triggers/previous/${tektonTriggersVersion}/interceptors.yaml`;
+const providerGatewayWsEgressProxyUrl = "http://127.0.0.1:18789";
+const ciRuntimeImages = [
+  "rancher/mirrored-pause:3.6",
+  "rancher/mirrored-library-busybox:1.36.1",
+  "cgr.dev/chainguard/busybox@sha256:19f02276bf8dbdd62f069b922f10c65262cc34b710eea26ff928129a736be791",
+  "ghcr.io/tektoncd/pipeline/entrypoint-bff0a22da108bc2f16c818c97641a296:v1.12.0",
+  "ghcr.io/tektoncd/pipeline/workingdirinit-0c558922ec6a1b739e550e349f2d5fc1:v1.12.0",
+  "ghcr.io/tektoncd/pipeline/nop-8eac7c133edad5df719dc37b36b62482:v1.12.0",
+  "ghcr.io/tektoncd/pipeline/events-a9042f7efb0cbade2a868a1ee5ddd52c:v1.12.0",
+  "ghcr.io/tektoncd/triggers/eventlistenersink-7ad1faa98cddbcb0c24990303b220bb8:v0.34.0",
+  "oven/bun:1-debian",
+  "alpine/git:2.45.2",
+  "unidesk-code-queue:d601",
+];
+
+interface CiOptions {
+  repoUrl: string;
+  revision: string;
+  waitMs: number;
+}
+
+function stringOption(args: string[], name: string): string | null {
+  const index = args.indexOf(name);
+  if (index === -1) return null;
+  const value = args[index + 1];
+  if (value === undefined || value.startsWith("--")) throw new Error(`${name} requires a value`);
+  return value;
+}
+
+function numberOption(args: string[], name: string, fallback: number): number {
+  const raw = stringOption(args, name);
+  if (raw === null) return fallback;
+  const value = Number(raw);
+  if (!Number.isInteger(value) || value < 0) throw new Error(`${name} must be a non-negative integer`);
+  return value;
+}
+
+function requireRevision(value: string | null): string {
+  if (value === null || value.length === 0) throw new Error("ci run requires --revision <commit-or-ref>");
+  if (!/^[A-Za-z0-9._/@:-]{1,160}$/u.test(value)) throw new Error("ci --revision contains unsupported characters");
+  return value;
+}
+
+function shellQuote(value: string): string {
+  return `'${value.replace(/'/gu, "'\\''")}'`;
+}
+
+function dockerExecK3sctl(args: string[]) {
+  return runCommand(["docker", "exec", k3sctlContainerName, ...args], repoRoot);
+}
+
+function dockerExecK3sctlWithInput(args: string[], input: string) {
+  const command = ["docker", "exec", "-i", k3sctlContainerName, ...args];
+  const result = spawnSync(command[0], command.slice(1), {
+    cwd: repoRoot,
+    encoding: "utf8",
+    input,
+    maxBuffer: 1024 * 1024 * 8,
+  });
+  return {
+    command,
+    cwd: repoRoot,
+    exitCode: result.status,
+    stdout: result.stdout ?? "",
+    stderr: result.stderr || result.error?.message || "",
+  };
+}
+
+function remoteKubectlCommand(script: string): string[] {
+  return [
+    "sh",
+    "-lc",
+    [
+      "ssh",
+      "-i",
+      shellQuote(k3sctlSshKey),
+      "-o",
+      "StrictHostKeyChecking=no",
+      "-o",
+      "UserKnownHostsFile=/tmp/unidesk-ci-known-hosts",
+      "-o",
+      "ConnectTimeout=10",
+      shellQuote(d601SshTarget),
+      shellQuote(`KUBECONFIG=${d601Kubeconfig} bash -lc ${shellQuote(script)}`),
+    ].join(" "),
+  ];
+}
+
+function runRemoteKubectl(script: string) {
+  const result = dockerExecK3sctl(remoteKubectlCommand(script));
+  if (result.exitCode !== 0) {
+    throw new Error(`D601 kubectl command failed: ${result.stderr || result.stdout}`);
+  }
+  return result;
+}
+
+function remoteApplyManifest(path: string): void {
+  const absolute = rootPath(path);
+  if (!existsSync(absolute)) throw new Error(`manifest not found: ${path}`);
+  const result = dockerExecK3sctlWithInput([
+    "sh",
+    "-lc",
+    [
+      "ssh",
+      "-i",
+      shellQuote(k3sctlSshKey),
+      "-o",
+      "StrictHostKeyChecking=no",
+      "-o",
+      "UserKnownHostsFile=/tmp/unidesk-ci-known-hosts",
+      "-o",
+      "ConnectTimeout=10",
+      shellQuote(d601SshTarget),
+      shellQuote(`KUBECONFIG=${d601Kubeconfig} kubectl apply -f -`),
+    ].join(" "),
+  ], readFileSync(absolute, "utf8"));
+  if (result.exitCode !== 0) {
+    throw new Error(`kubectl apply failed for ${path}: ${result.stderr || result.stdout}`);
+  }
+}
+
+function prewarmCiRuntimeImages(): void {
+  const images = ciRuntimeImages.map(shellQuote).join(" ");
+  runRemoteKubectl([
+    "set -euo pipefail",
+    "export DOCKER_CONFIG=/tmp/unidesk-ci-docker-config",
+    "mkdir -p \"$DOCKER_CONFIG\"",
+    "printf '{}\\n' > \"$DOCKER_CONFIG/config.json\"",
+    `images=(${images})`,
+    "for image in \"${images[@]}\"; do",
+    "  if ! docker image inspect \"$image\" >/dev/null 2>&1; then",
+    "    echo ci_runtime_image_pull=$image",
+    `    HTTP_PROXY=${shellQuote(providerGatewayWsEgressProxyUrl)} HTTPS_PROXY=${shellQuote(providerGatewayWsEgressProxyUrl)} ALL_PROXY=${shellQuote(providerGatewayWsEgressProxyUrl)} NO_PROXY=localhost,127.0.0.1,::1,host.docker.internal docker pull --platform linux/amd64 "$image"`,
+    "  else",
+    "    echo ci_runtime_image_cached=$image",
+    "  fi",
+    "done",
+    "pause_entrypoint=$(docker image inspect rancher/mirrored-pause:3.6 --format '{{json .Config.Entrypoint}}' 2>/dev/null || true)",
+    "if ! printf '%s' \"$pause_entrypoint\" | grep -q '\"/pause\"'; then echo native_k3s_pause_image_invalid_entrypoint=$pause_entrypoint >&2; exit 1; fi",
+    "rm -f /tmp/unidesk-ci-runtime-images.tar",
+    "docker save \"${images[@]}\" -o /tmp/unidesk-ci-runtime-images.tar",
+    "/mnt/c/Windows/System32/wsl.exe -u root -- ctr --address /run/k3s/containerd/containerd.sock -n k8s.io images import --digests --all-platforms /tmp/unidesk-ci-runtime-images.tar >/tmp/unidesk-ci-runtime-images-import.log",
+    "/mnt/c/Windows/System32/wsl.exe -u root -- ctr --address /run/k3s/containerd/containerd.sock -n k8s.io images ls | grep -F 'docker.io/rancher/mirrored-pause:3.6' >/dev/null",
+    "/mnt/c/Windows/System32/wsl.exe -u root -- ctr --address /run/k3s/containerd/containerd.sock -n k8s.io images ls | grep -F 'docker.io/oven/bun:1-debian' >/dev/null",
+    "/mnt/c/Windows/System32/wsl.exe -u root -- ctr --address /run/k3s/containerd/containerd.sock -n k8s.io images ls | grep -F 'docker.io/alpine/git:2.45.2' >/dev/null",
+    "/mnt/c/Windows/System32/wsl.exe -u root -- ctr --address /run/k3s/containerd/containerd.sock -n k8s.io images ls | grep -F 'docker.io/library/unidesk-code-queue:d601' >/dev/null",
+  ].join("\n"));
+}
+
+function status(): Record<string, unknown> {
+  const summary = runRemoteKubectl([
+    "set -euo pipefail",
+    "printf 'tekton_pipelines='",
+    "kubectl get deploy -n tekton-pipelines -o name 2>/dev/null | tr '\\n' ' ' || true",
+    "printf '\\ntekton_triggers='",
+    "kubectl get deploy -n tekton-pipelines-resolvers -o name 2>/dev/null | tr '\\n' ' ' || true",
+    "printf '\\nunidesk_ci='",
+    "kubectl get pipeline,task,pipelinerun,eventlistener,svc -n unidesk-ci -o name 2>/dev/null | tr '\\n' ' ' || true",
+    "printf '\\n'",
+  ].join("\n"));
+  return {
+    ok: true,
+    providerId: "D601",
+    orchestrator: "native-k3s",
+    tekton: {
+      pipelineVersion: tektonPipelineVersion,
+      triggersVersion: tektonTriggersVersion,
+    },
+    summary: summary.stdout.trim(),
+  };
+}
+
+function install(): Record<string, unknown> {
+  if (!existsSync(rootPath("src/components/microservices/k3sctl-adapter/k3s/ci/unidesk-ci.pipeline.yaml"))) {
+    throw new Error("CI manifests are missing");
+  }
+  prewarmCiRuntimeImages();
+  runRemoteKubectl([
+    "set -euo pipefail",
+    `kubectl apply -f ${shellQuote(tektonPipelineReleaseUrl)}`,
+    "kubectl wait --for=condition=Available deployment --all -n tekton-pipelines --timeout=900s",
+    `kubectl apply -f ${shellQuote(tektonTriggersReleaseUrl)}`,
+    `kubectl apply -f ${shellQuote(tektonTriggersInterceptorsUrl)}`,
+    "kubectl wait --for=condition=Available deployment --all -n tekton-pipelines --timeout=900s",
+    "kubectl wait --for=condition=Available deployment --all -n tekton-pipelines-resolvers --timeout=900s",
+  ].join("\n"));
+  remoteApplyManifest("src/components/microservices/k3sctl-adapter/k3s/ci/tekton-install.yaml");
+  remoteApplyManifest("src/components/microservices/k3sctl-adapter/k3s/ci/unidesk-ci.pipeline.yaml");
+  remoteApplyManifest("src/components/microservices/k3sctl-adapter/k3s/ci/unidesk-ci.triggers.yaml");
+  return status();
+}
+
+function pipelineRunManifest(options: CiOptions): string {
+  const safeSuffix = new Date().toISOString().replace(/[-:.TZ]/g, "").slice(0, 14).toLowerCase();
+  return `apiVersion: tekton.dev/v1
+kind: PipelineRun
+metadata:
+  generateName: unidesk-ci-${safeSuffix}-
+  namespace: unidesk-ci
+  labels:
+    app.kubernetes.io/name: unidesk-ci
+    app.kubernetes.io/part-of: unidesk
+    unidesk.ai/revision: ${JSON.stringify(options.revision.replace(/[^A-Za-z0-9._-]+/gu, "-").slice(0, 63).replace(/^[^A-Za-z0-9]+|[^A-Za-z0-9]+$/gu, ""))}
+spec:
+  pipelineRef:
+    name: unidesk-ci
+  taskRunTemplate:
+    serviceAccountName: unidesk-ci-runner
+  params:
+    - name: repo-url
+      value: ${JSON.stringify(options.repoUrl)}
+    - name: revision
+      value: ${JSON.stringify(options.revision)}
+  workspaces:
+    - name: shared-workspace
+      persistentVolumeClaim:
+        claimName: unidesk-ci-cache
+`;
+}
+
+function remoteCreatePipelineRun(manifest: string): string {
+  const result = dockerExecK3sctlWithInput([
+    "sh",
+    "-lc",
+    [
+      "ssh",
+      "-i",
+      shellQuote(k3sctlSshKey),
+      "-o",
+      "StrictHostKeyChecking=no",
+      "-o",
+      "UserKnownHostsFile=/tmp/unidesk-ci-known-hosts",
+      shellQuote(d601SshTarget),
+      shellQuote(`KUBECONFIG=${d601Kubeconfig} kubectl create -f - -o jsonpath='{.metadata.name}'`),
+    ].join(" "),
+  ], manifest);
+  if (result.exitCode !== 0) throw new Error(result.stderr || result.stdout);
+  return result.stdout.trim();
+}
+
+function run(options: CiOptions): Record<string, unknown> {
+  const name = remoteCreatePipelineRun(pipelineRunManifest(options));
+  const wait = options.waitMs > 0 ? dockerExecK3sctl(remoteKubectlCommand([
+    "set -euo pipefail",
+    `deadline=$((SECONDS + ${Math.ceil(options.waitMs / 1000)}))`,
+    "while [ \"$SECONDS\" -lt \"$deadline\" ]; do",
+    `  condition="$(kubectl get pipelinerun/${shellQuote(name)} -n unidesk-ci -o jsonpath='{range .status.conditions[?(@.type==\"Succeeded\")]}{.status}{\"\\t\"}{.reason}{\"\\t\"}{.message}{end}' 2>/dev/null || true)"`,
+    "  case \"$condition\" in",
+    "    True*)",
+    "      echo \"$condition\"",
+    `      kubectl get pipelinerun/${shellQuote(name)} -n unidesk-ci -o json`,
+    "      exit 0",
+    "      ;;",
+    "    False*)",
+    "      echo \"$condition\"",
+    `      kubectl get pipelinerun/${shellQuote(name)} -n unidesk-ci -o json`,
+    "      exit 1",
+    "      ;;",
+    "  esac",
+    "  sleep 2",
+    "done",
+    `echo "Timed out waiting for pipelinerun/${name}" >&2`,
+    `kubectl get pipelinerun/${shellQuote(name)} -n unidesk-ci -o json`,
+    "exit 124",
+  ].join("\n"))) : null;
+  const waitSucceeded = wait === null || wait.exitCode === 0 || wait.stdout.trimStart().startsWith("True\tSucceeded\t");
+  return {
+    ok: waitSucceeded,
+    pipelineRun: name,
+    namespace: "unidesk-ci",
+    repoUrl: options.repoUrl,
+    revision: options.revision,
+    wait: wait === null ? null : {
+      stdoutTail: wait.stdout.slice(-6000),
+      stderrTail: wait.stderr.slice(-6000),
+    },
+    next: [
+      `bun scripts/cli.ts ci logs ${name}`,
+      "bun scripts/cli.ts ci status",
+    ],
+  };
+}
+
+function logs(name: string): Record<string, unknown> {
+  if (name.length === 0) throw new Error("ci logs requires PipelineRun name");
+  const result = runRemoteKubectl([
+    "set -euo pipefail",
+    `kubectl get pipelinerun/${shellQuote(name)} -n unidesk-ci -o wide`,
+    `kubectl get taskrun -n unidesk-ci -l tekton.dev/pipelineRun=${shellQuote(name)} -o wide`,
+    `for pod in $(kubectl get pods -n unidesk-ci -l tekton.dev/pipelineRun=${shellQuote(name)} -o name); do echo "===== $pod"; kubectl logs -n unidesk-ci "$pod" --all-containers=true --tail=160; done`,
+  ].join("\n"));
+  return {
+    ok: true,
+    pipelineRun: name,
+    output: result.stdout,
+    stderr: result.stderr,
+  };
+}
+
+function help(): Record<string, unknown> {
+  return {
+    command: "ci install|status|run|logs",
+    description: "Manage the D601 k3s Tekton CI gate. This command intentionally never triggers a CD deployment.",
+    examples: [
+      "bun scripts/cli.ts ci install",
+      "bun scripts/cli.ts ci run --revision <commit>",
+      "bun scripts/cli.ts ci logs <pipelineRun>",
+    ],
+    tekton: {
+      pipelineVersion: tektonPipelineVersion,
+      triggersVersion: tektonTriggersVersion,
+      sources: {
+        pipeline: tektonPipelineReleaseUrl,
+        triggers: tektonTriggersReleaseUrl,
+        interceptors: tektonTriggersInterceptorsUrl,
+      },
+    },
+  };
+}
+
+export function runCiCommand(_config: UniDeskConfig, args: string[]): Record<string, unknown> {
+  const [action = "status", nameArg] = args;
+  if (action === "help" || action === "--help" || action === "-h") return help();
+  if (action === "install") return install();
+  if (action === "status") return status();
+  if (action === "run") {
+    const repoUrl = stringOption(args, "--repo") ?? stringOption(args, "--repo-url") ?? "https://github.com/pikasTech/unidesk";
+    const revision = requireRevision(stringOption(args, "--revision") ?? stringOption(args, "--commit"));
+    const waitMs = numberOption(args, "--wait-ms", 0);
+    return run({ repoUrl, revision, waitMs });
+  }
+  if (action === "logs") return logs(nameArg ?? "");
+  throw new Error("ci command must be one of: install, status, run, logs");
+}
+
+export function startCiInstallJob(): Record<string, unknown> {
+  const job = startJob("ci_install", ["bun", "scripts/cli.ts", "ci", "install"], "Install/refresh Tekton CI on D601 k3s");
+  return { ok: true, job };
+}
@@ -125,11 +125,19 @@ export async function runMicroserviceCommand(_config: UniDeskConfig, args: strin
    const id = requireId(idArg, "microservice health");
    return coreInternalFetch(`/api/microservices/${encodeId(id)}/health`);
  }
+  if (action === "diagnostics") {
+    const id = requireId(idArg, "microservice diagnostics");
+    return coreInternalFetch(`/api/microservices/${encodeId(id)}/diagnostics`);
+  }
+  if (action === "tunnel-self-test") {
+    const id = requireId(idArg, "microservice tunnel-self-test");
+    return coreInternalFetch(`/api/microservices/${encodeId(id)}/tunnel-self-test`);
+  }
  if (action === "proxy") {
    const id = requireId(idArg, "microservice proxy");
    const path = requireProxyPath(pathArg);
    const body = requestBodyOption(args);
    return summarizeMicroserviceProxyResponse(coreInternalFetch(`/api/microservices/${encodeId(id)}/proxy${path}`, { method: methodOption(args, body !== undefined), body }), args);
  }
-  throw new Error("microservice command must be one of: list, status, health, proxy");
+  throw new Error("microservice command must be one of: list, status, health, diagnostics, tunnel-self-test, proxy");
 }
@@ -455,7 +455,7 @@ async function remoteMicroservice(session: FrontendSession, args: string[]): Pro
  if (action === "list") {
    return { transport: "frontend", response: await frontendJson(session, "/api/microservices", undefined, 12_000) };
  }
-  if ((action === "status" || action === "health") && id !== undefined) {
+  if ((action === "status" || action === "health" || action === "diagnostics" || action === "tunnel-self-test") && id !== undefined) {
    return {
      transport: "frontend",
      response: await frontendJson(session, `/api/microservices/${encodeURIComponent(id)}/${action}`, undefined, 18_000),
@@ -468,7 +468,7 @@ async function remoteMicroservice(session: FrontendSession, args: string[]): Pro
      response: summarizeMicroserviceProxyResponse(response, args),
    };
  }
-  throw new Error("remote microservice command must be: microservice list | status <id> | health <id> | proxy <id> <path>");
+  throw new Error("remote microservice command must be: microservice list | status <id> | health <id> | diagnostics <id> | tunnel-self-test <id> | proxy <id> <path>");
 }

 async function remoteCodeQueue(session: FrontendSession, args: string[]): Promise<unknown> {
@@ -559,7 +559,7 @@ async function runRemoteCliOverFrontend(options: RemoteCliOptions, config: UniDe
      emitRemoteJson(name, {
        transport: "frontend",
        baseUrl: session.baseUrl,
-        commands: ["debug health", "debug dispatch", "debug task", "ssh <providerId> <command>", "ssh <providerId> skills", "microservice list", "microservice status <id>", "microservice health <id>", "microservice proxy <id> <path>", "decision upload <markdown-file>", "decision list", "decision show <id>", "codex task <taskId>", "codex judge <taskId> --attempt N", "network perf"],
+        commands: ["debug health", "debug dispatch", "debug task", "ssh <providerId> <command>", "ssh <providerId> skills", "microservice list", "microservice status <id>", "microservice health <id>", "microservice diagnostics <id>", "microservice tunnel-self-test <id>", "microservice proxy <id> <path>", "decision upload <markdown-file>", "decision list", "decision show <id>", "codex task <taskId>", "codex judge <taskId> --attempt N", "network perf"],
      });
      return 0;
    }
@@ -9,5 +9,5 @@
    "noFallthroughCasesInSwitch": true,
    "skipLibCheck": true
  },
-  "include": ["cli.ts", "src/**/*.ts"]
+  "include": ["cli.ts", "src/**/*.ts", "../scripts/*.ts"]
 }
@@ -93,3 +93,11 @@ export function closeEgressTcpConnectionsForProvider(providerId: string): void {
    connection.socket.destroy();
  }
 }
+
+export function closeEgressTcpConnectionsForSocket(provider: ProviderSocket): void {
+  for (const [key, connection] of ctx.activeEgressTcpConnections) {
+    if (connection.provider !== provider) continue;
+    ctx.activeEgressTcpConnections.delete(key);
+    connection.socket.destroy();
+  }
+}
@@ -9,7 +9,7 @@ import { recordRequestPerformance, withPerformanceOperation, getPerformance } fr
 import { handleProviderMessage, markProviderOffline, markStaleProvidersOffline } from "./provider-registry";
 import { markStaleTasksFailed, dispatchTask } from "./task-dispatcher";
 import { handleSshClientMessage, sshRoute } from "./ssh-bridge";
-import { closeEgressTcpConnectionsForProvider } from "./egress-tcp";
+import { closeEgressTcpConnectionsForProvider, closeEgressTcpConnectionsForSocket } from "./egress-tcp";
 import { scheduledTaskRoute, runDueScheduledTasks, recoverScheduledRuns } from "./scheduler";
 import { microserviceRoute, getMicroservices } from "./microservice-proxy";
 import { getOverview, codexQueueLoadTest } from "./overview";
@@ -171,17 +171,18 @@ const providerServer = Bun.serve<WsData>({
      const providerId = ws.data.providerId;
      logger("warn", "provider_socket_close", { providerId: providerId ?? null });
      if (providerId !== undefined) {
+        if (ctx.activeProviders.get(providerId) !== ws) {
+          closeEgressTcpConnectionsForSocket(ws);
+          logger("info", "provider_socket_close_ignored_replaced", { providerId });
+          return;
+        }
        closeEgressTcpConnectionsForProvider(providerId);
        for (const [requestId, waiter] of ctx.httpTunnelWaiters) {
          if (requestId.startsWith(`${providerId}:`)) {
            ctx.httpTunnelWaiters.delete(requestId);
-            waiter(null);
+            waiter(null, "provider-disconnected");
          }
        }
-        if (ctx.activeProviders.get(providerId) !== ws) {
-          logger("info", "provider_socket_close_ignored_replaced", { providerId });
-          return;
-        }
        markProviderOffline(providerId).catch((error) => logger("error", "provider_offline_mark_failed", { providerId, error: errorToJson(error) }));
      }
    },
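Review note on the hunk above: moving the "replaced socket" check ahead of the cleanup matters when a provider reconnects before its old socket finishes closing. The following is a minimal sketch of that ordering, not the backend-core implementation. `FakeSocket`, `Registry` and `onClose` are hypothetical stand-ins for the provider socket, `ctx`, and the Bun `close` handler. It assumes the registry already points at the new socket when the stale close event arrives, and it deletes the registry entry where the real handler calls `markProviderOffline`.

```typescript
// Hypothetical stand-ins for illustration only; not the backend-core types.
interface FakeSocket {
  id: string;
}

interface Registry {
  activeProviders: Map<string, FakeSocket>;
  pendingWaiters: Map<string, (reason: string) => void>;
}

// Returns the request ids whose waiters were failed, so the effect is observable.
function onClose(registry: Registry, providerId: string, closing: FakeSocket): string[] {
  const failed: string[] = [];
  // Guard first: a stale socket must not tear down state owned by its replacement.
  if (registry.activeProviders.get(providerId) !== closing) return failed;
  for (const [requestId, waiter] of registry.pendingWaiters) {
    if (!requestId.startsWith(`${providerId}:`)) continue;
    registry.pendingWaiters.delete(requestId);
    waiter("provider-disconnected");
    failed.push(requestId);
  }
  // Simplification: the real handler marks the provider offline in the database.
  registry.activeProviders.delete(providerId);
  return failed;
}
```

With the guard placed after the waiter loop, as in the removed lines, a late close from the old socket would fail requests that are in flight on the new socket.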
@@ -1,6 +1,6 @@
 import type { JsonValue } from "../../shared/src/index";
 import { ctx, config, logger } from "./context";
-import type { MicroserviceConfig, MicroserviceProxyCacheEntry, MicroserviceHealthAssessment, MicroserviceAvailabilityEntry, RawTaskRow } from "./types";
+import type { HttpTunnelFailureReason, MicroserviceConfig, MicroserviceProxyCacheEntry, MicroserviceHealthAssessment, MicroserviceAvailabilityEntry, RawTaskRow } from "./types";
 import { jsonResponse, errorToJson, compactJson, isPlainRecord, truncateText } from "./http";
 import { createAndSendTask, waitForTaskTerminal, providerSupports } from "./task-dispatcher";
 import { getNodes, getNodeDockerStatuses } from "./db";
@@ -12,6 +12,7 @@ import { getNodes, getNodeDockerStatuses } from "./db";
 const microserviceProxyMaxBodyTextLength = 8 * 1024 * 1024;
 const microserviceAvailabilityTtlMs = 30_000;
 const codeQueueOverviewPathFallbackStaleMs = 30_000;
+const providerHttpTunnelMaxAttempts = 3;
 const microserviceForwardRequestHeaders = [
  "accept",
  "content-type",
@@ -468,6 +469,13 @@ function responseFromMicroserviceCache(entry: MicroserviceProxyCacheEntry, state
  });
 }

+function isMicroserviceTransientFailureResponse(response: Response): boolean {
+  if (response.status !== 502 && response.status !== 503 && response.status !== 504) return false;
+  return response.headers.get("x-unidesk-transient-error") === "true"
+    || response.headers.get("x-unidesk-tunnel-error") !== null
+    || response.headers.get("x-unidesk-upstream-proxy-mode") === "provider-gateway-http-fetch";
+}
+
 function readMicroserviceCache(key: string): Response | null {
  const entry = ctx.microserviceProxyCache.get(key);
  if (entry === undefined) return null;
@@ -638,43 +646,248 @@ async function k3sctlAdapterMicroserviceResponse(
  return fetchMicroserviceUpstreamResponse(adapter, method, adapterTargetPath, proxyOptions, requestHeaders, bodyText, abortSignal);
 }

+async function k3sctlManagedDiagnosticsResponse(service: MicroserviceConfig): Promise<Response> {
+  const adapterServiceId = service.deployment.adapterServiceId ?? "k3sctl-adapter";
+  const adapter = microserviceById(adapterServiceId);
+  const checkedAt = new Date().toISOString();
+  const providerId = adapter?.providerId ?? service.providerId;
+  const providerOnline = ctx.activeProviders.has(providerId);
+  const providerTunnelCapable = await providerSupports(providerId, "microservice.http.tunnel");
+  if (adapter === null) {
+    return jsonResponse({
+      ok: false,
+      serviceId: service.id,
+      checkedAt,
+      requestPath: "/diagnostics",
+      checks: {
+        providerGateway: { ok: providerOnline, providerId, online: providerOnline },
+        httpTunnel: { ok: providerTunnelCapable, providerId, capable: providerTunnelCapable },
+        k3sctlAdapter: { ok: false, serviceId: adapterServiceId, error: `k3sctl adapter microservice not found: ${adapterServiceId}` },
+        kubernetesApiServiceProxy: { ok: false, skipped: true },
+        targetService: { ok: false, skipped: true },
+      },
+    }, 502);
+  }
+
+  const k3sServiceId = service.id === "code-queue"
+    ? codeQueueK3sServiceIdForRequest("GET", service.backend.healthPath)
+    : service.deployment.k3sServiceId ?? service.id;
+  const adapterPath = `/api/services/${encodeURIComponent(k3sServiceId)}/diagnostics`;
+  const response = await fetchMicroserviceUpstreamResponse(
+    adapter,
+    "GET",
+    adapterPath,
+    { query: "", jsonArrayLimits: {} },
+    { accept: "application/json" },
+    "",
+  );
+  const contentType = response.headers.get("content-type") ?? "application/json; charset=utf-8";
+  const bodyText = await response.text();
+  let adapterBody: JsonValue = bodyText;
+  try {
+    adapterBody = JSON.parse(bodyText) as JsonValue;
+  } catch {
+    adapterBody = bodyText.slice(0, 4000);
+  }
+  const bodyRecord = isPlainRecord(adapterBody) ? adapterBody : {};
+  const adapterChecks = isPlainRecord(bodyRecord.checks) ? bodyRecord.checks : {};
+  const checks = {
+    providerGateway: {
+      ok: providerOnline,
+      providerId,
+      online: providerOnline,
+      activeSocketCount: ctx.activeProviders.size,
+    },
+    httpTunnel: {
+      ok: response.ok && response.headers.get("x-unidesk-proxy-mode") === "provider-ws-http-tunnel",
+      providerId,
+      capable: providerTunnelCapable,
+      requestId: response.headers.get("x-unidesk-request-id") ?? null,
+      attempts: response.headers.get("x-unidesk-http-tunnel-attempts") ?? null,
+      upstreamProxyMode: response.headers.get("x-unidesk-upstream-proxy-mode") ?? null,
+      proxyStatus: response.status,
+    },
+    k3sctlAdapter: {
+      ok: response.ok,
+      serviceId: adapter.id,
+      providerId: adapter.providerId,
+      status: response.status,
+      contentType,
+    },
+    kubernetesApiServiceProxy: compactJson(adapterChecks.kubernetesApiServiceProxy ?? { ok: false, skipped: true }),
+    targetService: compactJson(adapterChecks.targetService ?? adapterChecks.managedService ?? { ok: false, skipped: true }),
+  } satisfies Record<string, JsonValue>;
+  const httpTunnelCheck = checks.httpTunnel as Record<string, JsonValue>;
+  return jsonResponse({
+    ok: response.ok && providerOnline && httpTunnelCheck.ok === true,
+    serviceId: service.id,
+    k3sServiceId,
+    checkedAt,
+    path: service.backend.healthPath,
+    chain: "CLI/frontend -> backend-core -> provider-gateway HTTP tunnel -> k3sctl-adapter -> Kubernetes API service proxy -> k3s Service",
+    checks,
+    adapter: adapterBody,
+  }, response.ok ? 200 : response.status);
+}
+
+async function microserviceTunnelSelfTestResponse(service: MicroserviceConfig): Promise<Response> {
+  const tunnelService = isK3sctlManagedMicroservice(service)
+    ? microserviceById(service.deployment.adapterServiceId ?? "k3sctl-adapter")
+    : service;
+  if (tunnelService === null) {
+    return jsonResponse({ ok: false, serviceId: service.id, error: "tunnel service not found", adapterServiceId: service.deployment.adapterServiceId ?? null }, 502);
+  }
+  if (!(await providerSupports(tunnelService.providerId, "microservice.http.tunnel"))) {
+    return jsonResponse({
+      ok: false,
+      serviceId: service.id,
+      providerId: tunnelService.providerId,
+      error: `provider does not declare microservice.http.tunnel capability: ${tunnelService.providerId}`,
+    }, 409);
+  }
+  const probeService = {
+    ...tunnelService,
+    backend: {
+      ...tunnelService.backend,
+      nodeBaseUrl: "http://127.0.0.1:1",
+      timeoutMs: 1000,
+    },
+  };
+  const response = await providerHttpTunnelMicroserviceResponse(
+    probeService,
+    "GET",
+    "/",
+    { query: "", jsonArrayLimits: {} },
+    { accept: "application/json" },
+    "",
+  );
+  const headers = {
+    requestId: response.headers.get("x-unidesk-request-id"),
+    tunnelError: response.headers.get("x-unidesk-tunnel-error"),
+    providerId: response.headers.get("x-unidesk-provider-id"),
+    serviceId: response.headers.get("x-unidesk-service-id"),
+    transient: response.headers.get("x-unidesk-transient-error"),
+  };
+  const bodyText = await response.text();
+  let body: JsonValue = bodyText.slice(0, 4000);
+  try {
+    body = JSON.parse(bodyText) as JsonValue;
+  } catch {
+    // Keep bounded text body for malformed JSON diagnostics.
+  }
+  const bodyRecord = isPlainRecord(body) ? body : {};
+  const hasRequestId = typeof bodyRecord.requestId === "string" && bodyRecord.requestId.length > 0;
+  const hasStage = typeof bodyRecord.stage === "string" && bodyRecord.stage.length > 0;
+  const ok = response.status === 502 && hasRequestId && hasStage && headers.requestId === bodyRecord.requestId;
+  return jsonResponse({
+    ok,
+    serviceId: service.id,
+    tunnelServiceId: tunnelService.id,
+    providerId: tunnelService.providerId,
+    expectedFailure: true,
+    status: response.status,
+    checks: {
+      expectedStatus: response.status === 502,
+      bodyHasRequestId: hasRequestId,
+      bodyHasStage: hasStage,
+      headerHasRequestId: typeof headers.requestId === "string" && headers.requestId.length > 0,
+      headerHasTunnelError: typeof headers.tunnelError === "string" && headers.tunnelError.length > 0,
+    },
+    headers,
+    body,
+  }, ok ? 200 : 502);
+}
+
 function providerHttpTunnelRequestId(providerId: string): string {
  return `${providerId}:http_${Date.now()}_${Math.random().toString(16).slice(2)}`;
 }

+function canRetryProviderHttpTunnel(method: string, targetPath: string): boolean {
+  const normalizedMethod = method.toUpperCase();
+  if (normalizedMethod !== "GET" && normalizedMethod !== "HEAD") return false;
+  return !targetPath.endsWith("/stream");
+}
+
+function providerHttpTunnelWaitMs(service: MicroserviceConfig, attempt: number, retryable: boolean): number {
+  const baseTimeoutMs = Math.max(1000, service.backend.timeoutMs);
+  if (!retryable) return baseTimeoutMs + 3000;
+  if (attempt === 1) return Math.min(baseTimeoutMs + 3000, Math.max(5000, Math.floor(baseTimeoutMs * 0.45)));
+  if (attempt === 2) return Math.min(baseTimeoutMs + 3000, Math.max(6000, Math.floor(baseTimeoutMs * 0.7)));
+  return baseTimeoutMs + 3000;
+}
+
+function tunnelErrorBody(
+  service: MicroserviceConfig,
+  requestId: string,
+  error: string,
+  stage: string,
+  status: number,
+  extra: Record<string, JsonValue> = {},
+): Response {
+  const response = jsonResponse({
+    ok: false,
+    error,
+    stage,
+    providerId: service.providerId,
+    serviceId: service.id,
+    requestId,
+    ...extra,
+  }, status);
+  response.headers.set("x-unidesk-request-id", requestId);
+  response.headers.set("x-unidesk-provider-id", service.providerId);
+  response.headers.set("x-unidesk-service-id", service.id);
+  response.headers.set("x-unidesk-tunnel-error", stage);
+  if (status === 502 || status === 503 || status === 504) response.headers.set("x-unidesk-transient-error", "true");
+  return response;
+}
+
+function providerHttpTunnelFailureStatus(reason: HttpTunnelFailureReason | null): number {
+  if (reason === "aborted") return 499;
+  if (reason === "provider-disconnected") return 503;
+  if (reason === "send-failed") return 502;
+  return 504;
+}
+
+function tunnelFailureRetryable(reason: HttpTunnelFailureReason | null): boolean {
+  return reason === "timeout" || reason === "provider-disconnected" || reason === "send-failed";
+}
+
 async function waitForProviderHttpTunnelResponse(
  providerId: string,
  requestId: string,
  timeoutMs: number,
  abortSignal?: AbortSignal,
-): Promise<{ providerId: string; requestId: string; ok: boolean; result: JsonValue } | null> {
+): Promise<{ message: { providerId: string; requestId: string; ok: boolean; result: JsonValue } | null; reason: HttpTunnelFailureReason | null }> {
  return await new Promise((resolve) => {
    let settled = false;
    let abortHandler: (() => void) | null = null;
-    const timer = setTimeout(() => settle(null), Math.max(1, timeoutMs));
-    const settle = (message: { providerId: string; requestId: string; ok: boolean; result: JsonValue } | null): void => {
+    const timer = setTimeout(() => settle(null, "timeout"), Math.max(1, timeoutMs));
+    const settle = (
+      message: { providerId: string; requestId: string; ok: boolean; result: JsonValue } | null,
+      reason: HttpTunnelFailureReason | null = null,
+    ): void => {
      if (settled) return;
      settled = true;
      clearTimeout(timer);
      if (abortHandler !== null) abortSignal?.removeEventListener("abort", abortHandler);
      ctx.httpTunnelWaiters.delete(requestId);
-      resolve(message);
+      resolve({ message, reason });
    };
-    abortHandler = () => settle(null);
+    abortHandler = () => settle(null, "aborted");
    if (abortSignal !== undefined) {
      if (abortSignal.aborted) {
-        settle(null);
+        settle(null, "aborted");
        return;
      }
      abortSignal.addEventListener("abort", abortHandler, { once: true });
    }
-    ctx.httpTunnelWaiters.set(requestId, (message) => {
+    ctx.httpTunnelWaiters.set(requestId, (message, reason) => {
      if (message !== null && message.providerId !== providerId) {
        logger("warn", "http_tunnel_provider_mismatch", { requestId, expectedProviderId: providerId, actualProviderId: message.providerId });
-        settle(null);
+        settle(null, "provider-mismatch");
        return;
      }
-      settle(message);
+      settle(message, reason ?? null);
    });
  });
 }
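Review note on the wait budget: the per-attempt timeouts in `providerHttpTunnelWaitMs` are easier to judge with concrete numbers. The sketch below copies that arithmetic into a standalone function that takes the backend timeout as a plain number. `tunnelWaitMs` and `worstCaseTotalMs` are illustrative names, and the inter-attempt sleep is assumed to be the `Math.min(750, 100 * attempt)` used on the timeout retry path.

```typescript
// Standalone copy of the wait-budget arithmetic; names are illustrative.
function tunnelWaitMs(backendTimeoutMs: number, attempt: number, retryable: boolean): number {
  const base = Math.max(1000, backendTimeoutMs);
  const ceiling = base + 3000; // single-attempt budget: backend timeout plus 3 s slack
  if (!retryable) return ceiling;
  if (attempt === 1) return Math.min(ceiling, Math.max(5000, Math.floor(base * 0.45)));
  if (attempt === 2) return Math.min(ceiling, Math.max(6000, Math.floor(base * 0.7)));
  return ceiling;
}

// Upper bound on time spent waiting if every attempt times out.
function worstCaseTotalMs(backendTimeoutMs: number, maxAttempts: number): number {
  let total = 0;
  for (let attempt = 1; attempt <= maxAttempts; attempt += 1) {
    total += tunnelWaitMs(backendTimeoutMs, attempt, true);
    // Assumed back-off between attempts, mirroring the timeout retry path.
    if (attempt < maxAttempts) total += Math.min(750, 100 * attempt);
  }
  return total;
}
```

For a 30 s backend timeout this gives 13 500 ms, 21 000 ms and 33 000 ms across the three attempts, so a retryable request that times out every time can wait roughly 67.8 s, about twice the 33 s single-attempt budget. For very short backend timeouts every attempt clamps to the `base + 3000` ceiling.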
@@ -688,32 +901,116 @@ async function providerHttpTunnelMicroserviceResponse(
  bodyText: string,
  abortSignal?: AbortSignal,
 ): Promise<Response> {
-  const socket = ctx.activeProviders.get(service.providerId);
-  if (socket === undefined) return jsonResponse({ ok: false, error: `provider is offline: ${service.providerId}` }, 503);
-  const requestId = providerHttpTunnelRequestId(service.providerId);
-  const timeoutMs = service.backend.timeoutMs + 3000;
-  const waiter = waitForProviderHttpTunnelResponse(service.providerId, requestId, timeoutMs, abortSignal);
-  socket.send(JSON.stringify({
-    type: "http_tunnel_request",
-    requestId,
-    payload: {
-      source: "microservice-frontend-proxy",
-      serviceId: service.id,
-      method,
-      targetBaseUrl: service.backend.nodeBaseUrl,
-      path: targetPath,
-      query: proxyOptions.query,
-      requestHeaders,
-      bodyText,
-      jsonArrayLimits: proxyOptions.jsonArrayLimits,
-      timeoutMs: service.backend.timeoutMs,
-      cacheTtlMs: providerMicroserviceCacheTtlMs(service.id, targetPath),
-    },
-  }));
-  const message = await waiter;
-  if (message === null) return jsonResponse({ ok: false, error: "provider HTTP tunnel timed out or disconnected", providerId: service.providerId, requestId }, 504);
-  if (!message.ok) return jsonResponse({ ok: false, error: "provider HTTP tunnel failed", providerId: service.providerId, requestId, result: message.result }, 502);
-  return responseFromProviderMicroserviceResult(dockerStatusRecord(message.result), "provider-ws-http-tunnel");
+  const retryable = canRetryProviderHttpTunnel(method, targetPath);
+  const maxAttempts = retryable ? providerHttpTunnelMaxAttempts : 1;
+  const attempts: JsonValue[] = [];
+  let lastRequestId = "";
+  for (let attempt = 1; attempt <= maxAttempts; attempt += 1) {
+    const socket = ctx.activeProviders.get(service.providerId);
+    const requestId = providerHttpTunnelRequestId(service.providerId);
+    lastRequestId = requestId;
+    if (socket === undefined) {
+      attempts.push({ attempt, requestId, ok: false, reason: "provider-offline" });
+      return tunnelErrorBody(service, requestId, `provider is offline: ${service.providerId}`, "provider-gateway-online", 503, {
+        retryable,
+        attempts,
+      });
+    }
+    const timeoutMs = providerHttpTunnelWaitMs(service, attempt, retryable);
+    const startedAt = Date.now();
+    const waiter = waitForProviderHttpTunnelResponse(service.providerId, requestId, timeoutMs, abortSignal);
+    try {
+      socket.send(JSON.stringify({
+        type: "http_tunnel_request",
+        requestId,
+        payload: {
+          source: "microservice-frontend-proxy",
+          serviceId: service.id,
+          method,
+          targetBaseUrl: service.backend.nodeBaseUrl,
+          path: targetPath,
+          query: proxyOptions.query,
+          requestHeaders,
+          bodyText,
+          jsonArrayLimits: proxyOptions.jsonArrayLimits,
+          timeoutMs: service.backend.timeoutMs,
+          cacheTtlMs: providerMicroserviceCacheTtlMs(service.id, targetPath),
+        },
+      }));
+    } catch (error) {
+      ctx.httpTunnelWaiters.get(requestId)?.(null, "send-failed");
+      const durationMs = Date.now() - startedAt;
+      attempts.push({ attempt, requestId, ok: false, reason: "send-failed", durationMs, error: errorToJson(error) });
+      if (attempt < maxAttempts) {
+        logger("warn", "http_tunnel_send_retry", { providerId: service.providerId, serviceId: service.id, requestId, attempt, maxAttempts, error: errorToJson(error) });
+        await Bun.sleep(Math.min(500, 75 * attempt));
+        continue;
+      }
+      return tunnelErrorBody(service, requestId, "provider HTTP tunnel send failed", "http-tunnel-send", 502, {
+        retryable,
+        attempts,
+        detail: errorToJson(error),
+      });
+    }
+    const { message, reason } = await waiter;
+    const durationMs = Date.now() - startedAt;
+    if (message === null) {
+      attempts.push({ attempt, requestId, ok: false, reason: reason ?? "timeout", durationMs, timeoutMs });
+      if (retryable && tunnelFailureRetryable(reason) && attempt < maxAttempts) {
+        logger("warn", "http_tunnel_retry", {
+          providerId: service.providerId,
+          serviceId: service.id,
+          requestId,
+          attempt,
+          maxAttempts,
+          reason: reason ?? "timeout",
+          durationMs,
+          timeoutMs,
+        });
+        await Bun.sleep(Math.min(750, 100 * attempt));
+        continue;
+      }
+      return tunnelErrorBody(
+        service,
+        requestId,
+        "provider HTTP tunnel timed out or disconnected",
+        reason === "provider-disconnected" ? "http-tunnel-provider-disconnected" : reason === "aborted" ? "client-aborted" : "http-tunnel-wait",
+        providerHttpTunnelFailureStatus(reason),
+        { retryable, attempts, timeoutMs, failureReason: reason ?? "timeout" },
+      );
+    }
+    attempts.push({ attempt, requestId, ok: message.ok, durationMs });
+    if (!message.ok) {
+      const result = dockerStatusRecord(message.result);
+      const resultError = typeof result.error === "string" ? result.error : "provider HTTP tunnel failed";
+      logger("warn", "http_tunnel_provider_error", {
+        providerId: service.providerId,
+        serviceId: service.id,
+        requestId,
+        attempt,
+        maxAttempts,
+        durationMs,
+        result: compactJson(result),
+      });
+      if (retryable && attempt < maxAttempts) {
+        await Bun.sleep(Math.min(750, 100 * attempt));
+        continue;
+      }
+      return tunnelErrorBody(service, requestId, "provider HTTP tunnel failed", "provider-gateway-http-fetch", 502, {
+        retryable,
+        attempts,
+        result: message.result,
+        providerError: resultError,
+      });
+    }
+    const response = responseFromProviderMicroserviceResult(dockerStatusRecord(message.result), "provider-ws-http-tunnel");
+    response.headers.set("x-unidesk-request-id", requestId);
+    response.headers.set("x-unidesk-http-tunnel-attempt", String(attempt));
+    response.headers.set("x-unidesk-http-tunnel-attempts", String(attempts.length));
+    response.headers.set("x-unidesk-provider-id", service.providerId);
+    return response;
+  }
+  return tunnelErrorBody(service, lastRequestId, "provider HTTP tunnel exhausted attempts", "http-tunnel-wait", 504, { retryable, attempts });
 }

 async function fetchMicroserviceUpstreamResponse(
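Review note on the retry loop: only idempotent, non-streaming requests are retried, and only for some failure reasons. The sketch below is a synchronous simulation of that decision logic under stated assumptions, not the tunnel code. `simulate`, `Outcome` and the scripted outcome list are hypothetical, the status codes mirror `providerHttpTunnelFailureStatus`, and the separate path that retries a provider-reported error (`message.ok === false`) is left out.

```typescript
// Hypothetical simulation of the retry decision; outcomes are scripted, no I/O.
type Outcome = "ok" | "timeout" | "aborted" | "provider-disconnected" | "provider-mismatch" | "send-failed";

function canRetry(method: string, targetPath: string): boolean {
  const normalized = method.toUpperCase();
  if (normalized !== "GET" && normalized !== "HEAD") return false;
  return !targetPath.endsWith("/stream"); // streams are never replayed
}

function isRetryableFailure(outcome: Outcome): boolean {
  return outcome === "timeout" || outcome === "provider-disconnected" || outcome === "send-failed";
}

function failureStatus(outcome: Outcome): number {
  if (outcome === "aborted") return 499;
  if (outcome === "provider-disconnected") return 503;
  if (outcome === "send-failed") return 502;
  return 504; // timeout and provider-mismatch
}

function simulate(
  method: string,
  targetPath: string,
  outcomes: Outcome[],
  maxAttempts = 3,
): { attempts: number; status: number } {
  const limit = canRetry(method, targetPath) ? maxAttempts : 1;
  for (let attempt = 1; attempt <= limit; attempt += 1) {
    const outcome: Outcome = outcomes[attempt - 1] ?? "timeout";
    if (outcome === "ok") return { attempts: attempt, status: 200 };
    // Stop on a non-retryable reason or when the attempt budget is spent.
    if (!isRetryableFailure(outcome) || attempt === limit) {
      return { attempts: attempt, status: failureStatus(outcome) };
    }
  }
  return { attempts: limit, status: 504 };
}
```

For example, `simulate("GET", "/health", ["timeout", "ok"])` succeeds on the second attempt, while the same outcomes for a `POST` stop after one attempt with 504, because replaying a non-idempotent request could apply it twice.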
@@ -918,14 +1215,29 @@ export async function microserviceRoute(req: Request, url: URL): Promise<Respons
      ? "/"
      : suffix.startsWith(`${proxyPrefix}/`)
        ? `/${suffix.slice(proxyPrefix.length + 1)}`
+        : suffix === "diagnostics"
+          ? "/diagnostics"
+          : suffix === "tunnel-self-test"
+            ? "/tunnel-self-test"
        : "";
-  if (targetPath.length === 0) return jsonResponse({ ok: false, error: "microservice route must be /status, /health, or /proxy/<path>" }, 404);
+  if (targetPath.length === 0) return jsonResponse({ ok: false, error: "microservice route must be /status, /health, /diagnostics, /tunnel-self-test, or /proxy/<path>" }, 404);
  if (suffix === "health" && method !== "GET" && method !== "HEAD") {
    return jsonResponse({ ok: false, error: "microservice health only supports GET/HEAD" }, 405);
  }
+  if (suffix === "diagnostics" && method !== "GET" && method !== "HEAD") {
+    return jsonResponse({ ok: false, error: "microservice diagnostics only supports GET/HEAD" }, 405);
+  }
+  if (suffix === "tunnel-self-test" && method !== "GET" && method !== "HEAD") {
+    return jsonResponse({ ok: false, error: "microservice tunnel self-test only supports GET/HEAD" }, 405);
+  }
  if (!isMicroserviceMethodAllowed(service, method)) {
    return jsonResponse({ ok: false, error: "microservice method is not allowed", serviceId, method, allowedMethods: service.backend.allowedMethods }, 405);
  }
+  if (suffix === "diagnostics") {
+    if (!isK3sctlManagedMicroservice(service)) return strictMicroserviceHealthResponse(service, method === "HEAD");
+    return k3sctlManagedDiagnosticsResponse(service);
+  }
+  if (suffix === "tunnel-self-test") return microserviceTunnelSelfTestResponse(service);
  if (!isMicroservicePathAllowed(service, targetPath)) {
    return jsonResponse({ ok: false, error: "microservice path is not allowed", serviceId, targetPath }, 403);
  }
@@ -968,6 +1280,14 @@ export async function microserviceRoute(req: Request, url: URL): Promise<Respons
    }
  }
  const response = await fetchMicroserviceUpstreamResponse(service, method, targetPath, proxyOptions, requestHeaders, bodyText, req.signal);
+  if ((method === "GET" || method === "HEAD") && isMicroserviceTransientFailureResponse(response)) {
+    const stale = readStaleMicroserviceCache(cacheKey) ?? readMicroservicePathFallback(service, method, targetPath);
+    if (stale !== null) {
+      stale.headers.set("x-unidesk-cache", "stale-on-transient-failure");
+      stale.headers.set("x-unidesk-stale-reason", String(response.status));
+      return stale;
+    }
+  }
  if ((method === "GET" || method === "HEAD") && cacheTtlMs > 0) {
    const snapshot = await cacheableResponseSnapshot(response);
    rememberMicroserviceCache(cacheKey, cacheTtlMs, snapshot);
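Review note on the stale fallback: for read-only requests, a transient tunnel failure is replaced by a cached response when one exists. The sketch below illustrates that rule with plain objects instead of the Fetch `Response` type. `SimpleResponse` and `serveOrStale` are hypothetical names, and the cache lookup is reduced to a `stale` argument that is `null` on a miss.

```typescript
// Hypothetical plain-object model of a response; header names are lowercase.
interface SimpleResponse {
  status: number;
  headers: Record<string, string>;
  body: string;
}

// Mirrors the header checks of isMicroserviceTransientFailureResponse.
function isTransientFailure(response: SimpleResponse): boolean {
  if (response.status !== 502 && response.status !== 503 && response.status !== 504) return false;
  return response.headers["x-unidesk-transient-error"] === "true"
    || response.headers["x-unidesk-tunnel-error"] !== undefined
    || response.headers["x-unidesk-upstream-proxy-mode"] === "provider-gateway-http-fetch";
}

function serveOrStale(method: string, fresh: SimpleResponse, stale: SimpleResponse | null): SimpleResponse {
  const readOnly = method === "GET" || method === "HEAD";
  if (!readOnly || stale === null || !isTransientFailure(fresh)) return fresh;
  // Tag the stale copy so callers can tell it is not a live answer.
  return {
    ...stale,
    headers: {
      ...stale.headers,
      "x-unidesk-cache": "stale-on-transient-failure",
      "x-unidesk-stale-reason": String(fresh.status),
    },
  };
}
```

A 502 without any of the three marker headers is passed through unchanged, which keeps genuine upstream errors visible instead of masking them with cached data.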
@@ -168,6 +168,13 @@ export interface EgressTcpConnection {
  provider: ProviderSocket;
 }

+export type HttpTunnelFailureReason =
+  | "timeout"
+  | "aborted"
+  | "provider-disconnected"
+  | "provider-mismatch"
+  | "send-failed";
+
 export interface MicroserviceProxyCacheEntry {
  expiresAt: number;
  staleExpiresAt: number;
@@ -193,6 +200,6 @@ export type HttpTunnelWaiter = (message: {
  requestId: string;
  ok: boolean;
  result: JsonValue;
-} | null) => void;
+} | null, reason?: HttpTunnelFailureReason) => void;

 export type LoggerFn = (level: "debug" | "info" | "warn" | "error", message: string, data?: JsonValue) => void;
@@ -0,0 +1,14 @@
+apiVersion: v1
+kind: ConfigMap
+metadata:
+  name: unidesk-tekton-install
+  namespace: unidesk
+  labels:
+    app.kubernetes.io/name: unidesk-ci
+    app.kubernetes.io/part-of: unidesk
+data:
+  pipelineVersion: "v1.12.0"
+  triggersVersion: "v0.34.0"
+  pipelineReleaseUrl: "https://infra.tekton.dev/tekton-releases/pipeline/previous/v1.12.0/release.yaml"
+  triggersReleaseUrl: "https://infra.tekton.dev/tekton-releases/triggers/previous/v0.34.0/release.yaml"
+  triggersInterceptorsReleaseUrl: "https://infra.tekton.dev/tekton-releases/triggers/previous/v0.34.0/interceptors.yaml"
@@ -0,0 +1,593 @@
+apiVersion: v1
+kind: Namespace
+metadata:
+  name: unidesk-ci
+  labels:
+    app.kubernetes.io/part-of: unidesk
+    unidesk.ai/purpose: ci
+---
+apiVersion: v1
+kind: ServiceAccount
+metadata:
+  name: unidesk-ci-runner
+  namespace: unidesk-ci
+  labels:
+    app.kubernetes.io/name: unidesk-ci
+    app.kubernetes.io/part-of: unidesk
+---
+apiVersion: rbac.authorization.k8s.io/v1
+kind: Role
+metadata:
+  name: unidesk-ci-runner
+  namespace: unidesk-ci
+rules:
+  - apiGroups: [""]
+    resources: ["pods", "pods/log", "services"]
+    verbs: ["get", "list", "watch", "create", "delete", "patch"]
+  - apiGroups: ["apps"]
+    resources: ["deployments"]
+    verbs: ["get", "list", "watch", "create", "delete", "patch"]
+  - apiGroups: ["tekton.dev"]
+    resources: ["pipelineruns", "taskruns"]
+    verbs: ["get", "list", "watch", "create", "delete", "patch"]
+  - apiGroups: ["triggers.tekton.dev"]
+    resources: ["eventlisteners", "triggers", "triggerbindings", "triggertemplates", "interceptors"]
+    verbs: ["get", "list", "watch"]
+---
+apiVersion: rbac.authorization.k8s.io/v1
+kind: RoleBinding
+metadata:
+  name: unidesk-ci-runner
+  namespace: unidesk-ci
+subjects:
+  - kind: ServiceAccount
+    name: unidesk-ci-runner
+    namespace: unidesk-ci
+roleRef:
+  apiGroup: rbac.authorization.k8s.io
+  kind: Role
+  name: unidesk-ci-runner
+---
+apiVersion: rbac.authorization.k8s.io/v1
+kind: ClusterRole
+metadata:
+  name: unidesk-ci-trigger-reader
+  labels:
+    app.kubernetes.io/name: unidesk-ci
+    app.kubernetes.io/part-of: unidesk
+rules:
+  - apiGroups: ["triggers.tekton.dev"]
+    resources: ["clusterinterceptors", "clustertriggerbindings"]
+    verbs: ["get", "list", "watch"]
+---
+apiVersion: rbac.authorization.k8s.io/v1
+kind: ClusterRoleBinding
+metadata:
+  name: unidesk-ci-trigger-reader
+  labels:
+    app.kubernetes.io/name: unidesk-ci
+    app.kubernetes.io/part-of: unidesk
+subjects:
+  - kind: ServiceAccount
+    name: unidesk-ci-runner
+    namespace: unidesk-ci
+roleRef:
+  apiGroup: rbac.authorization.k8s.io
+  kind: ClusterRole
+  name: unidesk-ci-trigger-reader
+---
+apiVersion: rbac.authorization.k8s.io/v1
+kind: Role
+metadata:
+  name: unidesk-ci-cross-namespace
+  namespace: unidesk
+rules:
+  - apiGroups: [""]
+    resources: ["services"]
+    verbs: ["get", "list"]
+---
+apiVersion: rbac.authorization.k8s.io/v1
+kind: RoleBinding
+metadata:
+  name: unidesk-ci-cross-namespace
+  namespace: unidesk
+subjects:
+  - kind: ServiceAccount
+    name: unidesk-ci-runner
+    namespace: unidesk-ci
+roleRef:
+  apiGroup: rbac.authorization.k8s.io
+  kind: Role
+  name: unidesk-ci-cross-namespace
+---
+apiVersion: v1
+kind: PersistentVolumeClaim
+metadata:
+  name: unidesk-ci-cache
+  namespace: unidesk-ci
+  labels:
+    app.kubernetes.io/name: unidesk-ci
+    app.kubernetes.io/part-of: unidesk
+spec:
+  accessModes:
+    - ReadWriteOnce
+  resources:
+    requests:
+      storage: 20Gi
+---
+apiVersion: v1
+kind: ConfigMap
+metadata:
+  name: unidesk-ci-budgets
+  namespace: unidesk-ci
+  labels:
+    app.kubernetes.io/name: unidesk-ci
+    app.kubernetes.io/part-of: unidesk
+data:
+  firstPaintMs: "10000"
+  traceSummaryMs: "10000"
+  traceStepsMs: "20000"
+  traceStepDetailMs: "20000"
+  overviewP95Ms: "20000"
+---
+apiVersion: tekton.dev/v1
+kind: Task
+metadata:
+  name: unidesk-repo-check
+  namespace: unidesk-ci
+  labels:
+    app.kubernetes.io/name: unidesk-ci
+    app.kubernetes.io/component: repo-check
+spec:
+  params:
+    - name: repo-url
+      type: string
+    - name: revision
+      type: string
+    - name: image
+      type: string
+      default: unidesk-code-queue:d601
+  workspaces:
+    - name: source
+  volumes:
+    - name: docker-sock
+      hostPath:
+        path: /var/run/docker.sock
+        type: Socket
+  steps:
+    - name: clone
+      image: alpine/git:2.45.2
+      env:
+        - name: HTTP_PROXY
+          value: "http://d601-provider-egress-proxy.unidesk.svc.cluster.local:18789"
+        - name: HTTPS_PROXY
+          value: "http://d601-provider-egress-proxy.unidesk.svc.cluster.local:18789"
+        - name: ALL_PROXY
+          value: "http://d601-provider-egress-proxy.unidesk.svc.cluster.local:18789"
+        - name: NO_PROXY
+          value: "localhost,127.0.0.1,::1,ci-git-mirror,ci-git-mirror.unidesk-ci,ci-git-mirror.unidesk-ci.svc,ci-git-mirror.unidesk-ci.svc.cluster.local,d601-provider-egress-proxy,d601-provider-egress-proxy.unidesk,d601-provider-egress-proxy.unidesk.svc,d601-provider-egress-proxy.unidesk.svc.cluster.local,d601-tcp-egress-gateway,d601-tcp-egress-gateway.unidesk,d601-tcp-egress-gateway.unidesk.svc,d601-tcp-egress-gateway.unidesk.svc.cluster.local,code-queue-ci-read,code-queue-ci-read.unidesk-ci,code-queue-ci-read.unidesk-ci.svc,code-queue-ci-read.unidesk-ci.svc.cluster.local"
+        - name: http_proxy
+          value: "http://d601-provider-egress-proxy.unidesk.svc.cluster.local:18789"
+        - name: https_proxy
+          value: "http://d601-provider-egress-proxy.unidesk.svc.cluster.local:18789"
+        - name: all_proxy
+          value: "http://d601-provider-egress-proxy.unidesk.svc.cluster.local:18789"
+        - name: no_proxy
+          value: "localhost,127.0.0.1,::1,ci-git-mirror,ci-git-mirror.unidesk-ci,ci-git-mirror.unidesk-ci.svc,ci-git-mirror.unidesk-ci.svc.cluster.local,d601-provider-egress-proxy,d601-provider-egress-proxy.unidesk,d601-provider-egress-proxy.unidesk.svc,d601-provider-egress-proxy.unidesk.svc.cluster.local,d601-tcp-egress-gateway,d601-tcp-egress-gateway.unidesk,d601-tcp-egress-gateway.unidesk.svc,d601-tcp-egress-gateway.unidesk.svc.cluster.local,code-queue-ci-read,code-queue-ci-read.unidesk-ci,code-queue-ci-read.unidesk-ci.svc,code-queue-ci-read.unidesk-ci.svc.cluster.local"
+      script: |
+        #!/bin/sh
+        set -eu
+        rm -rf "$(workspaces.source.path)/repo"
+        git clone --filter=blob:none "$(params.repo-url)" "$(workspaces.source.path)/repo"
+        cd "$(workspaces.source.path)/repo"
+        git fetch --depth=1 origin "$(params.revision)"
+        git checkout --detach FETCH_HEAD
+        git rev-parse HEAD | tee "$(workspaces.source.path)/commit.txt"
+    - name: install-and-check
+      image: "$(params.image)"
+      env:
+        - name: DOCKER_HOST
+          value: unix:///var/run/docker.sock
+        - name: BUN_INSTALL_CACHE_DIR
+          value: "$(workspaces.source.path)/cache/bun"
+        - name: HTTP_PROXY
+          value: "http://d601-provider-egress-proxy.unidesk.svc.cluster.local:18789"
+        - name: HTTPS_PROXY
+          value: "http://d601-provider-egress-proxy.unidesk.svc.cluster.local:18789"
+        - name: ALL_PROXY
+          value: "http://d601-provider-egress-proxy.unidesk.svc.cluster.local:18789"
+        - name: NO_PROXY
+          value: "localhost,127.0.0.1,::1,d601-provider-egress-proxy,d601-provider-egress-proxy.unidesk,d601-provider-egress-proxy.unidesk.svc,d601-provider-egress-proxy.unidesk.svc.cluster.local,d601-tcp-egress-gateway,d601-tcp-egress-gateway.unidesk,d601-tcp-egress-gateway.unidesk.svc,d601-tcp-egress-gateway.unidesk.svc.cluster.local,code-queue-ci-read,code-queue-ci-read.unidesk-ci,code-queue-ci-read.unidesk-ci.svc,code-queue-ci-read.unidesk-ci.svc.cluster.local"
+        - name: http_proxy
+          value: "http://d601-provider-egress-proxy.unidesk.svc.cluster.local:18789"
+        - name: https_proxy
+          value: "http://d601-provider-egress-proxy.unidesk.svc.cluster.local:18789"
+        - name: all_proxy
+          value: "http://d601-provider-egress-proxy.unidesk.svc.cluster.local:18789"
+        - name: no_proxy
+          value: "localhost,127.0.0.1,::1,d601-provider-egress-proxy,d601-provider-egress-proxy.unidesk,d601-provider-egress-proxy.unidesk.svc,d601-provider-egress-proxy.unidesk.svc.cluster.local,d601-tcp-egress-gateway,d601-tcp-egress-gateway.unidesk,d601-tcp-egress-gateway.unidesk.svc,d601-tcp-egress-gateway.unidesk.svc.cluster.local,code-queue-ci-read,code-queue-ci-read.unidesk-ci,code-queue-ci-read.unidesk-ci.svc,code-queue-ci-read.unidesk-ci.svc.cluster.local"
+      volumeMounts:
+        - name: docker-sock
+          mountPath: /var/run/docker.sock
+      script: |
+        #!/usr/bin/env bash
+        set -euo pipefail
+        cd "$(workspaces.source.path)/repo"
+        command -v bun
+        command -v git
+        command -v docker
+        docker compose version >/dev/null
+        bun install --frozen-lockfile
+        (cd src && bun install --frozen-lockfile)
+        bun scripts/cli.ts check
+---
+apiVersion: tekton.dev/v1
+kind: Task
+metadata:
+  name: unidesk-code-queue-read-perf
+  namespace: unidesk-ci
+  labels:
+    app.kubernetes.io/name: unidesk-ci
+    app.kubernetes.io/component: code-queue-performance
+spec:
+  params:
+    - name: revision
+      type: string
+    - name: app-image
+      type: string
+      default: unidesk-code-queue:d601
+  workspaces:
+    - name: source
+  steps:
+    - name: start-read-service
+      image: "$(params.app-image)"
+      env:
+        - name: HTTP_PROXY
+          value: ""
+        - name: HTTPS_PROXY
+          value: ""
+        - name: ALL_PROXY
+          value: ""
+        - name: NO_PROXY
+          value: "localhost,127.0.0.1,::1,kubernetes,kubernetes.default,kubernetes.default.svc,kubernetes.default.svc.cluster.local"
+        - name: http_proxy
+          value: ""
+        - name: https_proxy
+          value: ""
+        - name: all_proxy
+          value: ""
+        - name: no_proxy
+          value: "localhost,127.0.0.1,::1,kubernetes,kubernetes.default,kubernetes.default.svc,kubernetes.default.svc.cluster.local"
+      script: |
+        #!/bin/bash
+        set -euo pipefail
+        kube_api="https://${KUBERNETES_SERVICE_HOST}:${KUBERNETES_SERVICE_PORT_HTTPS}"
+        kube_token="$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)"
+        kube_ca="/var/run/secrets/kubernetes.io/serviceaccount/ca.crt"
+        kube_namespace="$(cat /var/run/secrets/kubernetes.io/serviceaccount/namespace)"
+        kube() {
+          local method="$1"
+          shift
+          curl -fsS --cacert "$kube_ca" -H "Authorization: Bearer $kube_token" -X "$method" "$@"
+        }
+        cat >/tmp/code-queue-ci-read-deployment.yaml <<YAML
+        apiVersion: apps/v1
+        kind: Deployment
+        metadata:
+          name: code-queue-ci-read
+          namespace: unidesk-ci
+          labels:
+            app.kubernetes.io/name: code-queue
+            app.kubernetes.io/component: ci-read
+            app.kubernetes.io/part-of: unidesk
+            unidesk.ai/ci-revision: "$(params.revision)"
+        spec:
+          replicas: 1
+          selector:
+            matchLabels:
+              app.kubernetes.io/name: code-queue
+              app.kubernetes.io/component: ci-read
+          template:
+            metadata:
+              labels:
+                app.kubernetes.io/name: code-queue
+                app.kubernetes.io/component: ci-read
+                app.kubernetes.io/part-of: unidesk
+                unidesk.ai/node-id: D601
+                unidesk.ai/ci-task-run: "$(context.taskRun.name)"
+            spec:
+              nodeSelector:
+                unidesk.ai/node-id: D601
+              terminationGracePeriodSeconds: 10
+              containers:
+                - name: code-queue
+                  image: "$(params.app-image)"
+                  imagePullPolicy: IfNotPresent
+                  ports:
+                    - name: http
+                      containerPort: 4222
+                  envFrom:
+                    - secretRef:
+                        name: code-queue-env
+                        optional: true
+                  env:
+                    - name: HOST
+                      value: "0.0.0.0"
+                    - name: PORT
+                      value: "4222"
+                    - name: DATABASE_URL
+                      value: "postgres://unidesk:unidesk_dev_password@d601-tcp-egress-gateway.unidesk.svc.cluster.local:15432/unidesk"
+                    - name: CODE_QUEUE_INSTANCE_ID
+                      value: "CI-read"
+                    - name: CODE_QUEUE_SERVICE_ROLE
+                      value: "read"
+                    - name: CODE_QUEUE_SCHEDULER_ENABLED
+                      value: "false"
+                    - name: CODE_QUEUE_STARTUP_OA_BACKFILL_ENABLED
+                      value: "false"
+                    - name: CODE_QUEUE_NOTIFY_CLAUDEQQ_ENABLED
+                      value: "false"
+                    - name: CODE_QUEUE_CODEX_SQLITE_LOG_EXPORT_ENABLED
+                      value: "false"
+                    - name: CODE_QUEUE_EGRESS_PROXY_ENABLED
+                      value: "true"
+                    - name: CODE_QUEUE_EGRESS_PROXY_URL
+                      value: "http://d601-provider-egress-proxy.unidesk.svc.cluster.local:18789"
+                    - name: CODE_QUEUE_EGRESS_PROXY_NO_PROXY
+                      value: "localhost,127.0.0.1,::1,code-queue-ci-read,code-queue-ci-read.unidesk-ci,code-queue-ci-read.unidesk-ci.svc,code-queue-ci-read.unidesk-ci.svc.cluster.local,d601-provider-egress-proxy,d601-provider-egress-proxy.unidesk,d601-provider-egress-proxy.unidesk.svc,d601-provider-egress-proxy.unidesk.svc.cluster.local,d601-tcp-egress-gateway,d601-tcp-egress-gateway.unidesk,d601-tcp-egress-gateway.unidesk.svc,d601-tcp-egress-gateway.unidesk.svc.cluster.local,backend-core,oa-event-flow,database,hyueapi.com,.hyueapi.com"
+                    - name: HTTP_PROXY
+                      value: "http://d601-provider-egress-proxy.unidesk.svc.cluster.local:18789"
+                    - name: HTTPS_PROXY
+                      value: "http://d601-provider-egress-proxy.unidesk.svc.cluster.local:18789"
+                    - name: ALL_PROXY
+                      value: "http://d601-provider-egress-proxy.unidesk.svc.cluster.local:18789"
+                    - name: NO_PROXY
+                      value: "localhost,127.0.0.1,::1,code-queue-ci-read,code-queue-ci-read.unidesk-ci,code-queue-ci-read.unidesk-ci.svc,code-queue-ci-read.unidesk-ci.svc.cluster.local,d601-provider-egress-proxy,d601-provider-egress-proxy.unidesk,d601-provider-egress-proxy.unidesk.svc,d601-provider-egress-proxy.unidesk.svc.cluster.local,d601-tcp-egress-gateway,d601-tcp-egress-gateway.unidesk,d601-tcp-egress-gateway.unidesk.svc,d601-tcp-egress-gateway.unidesk.svc.cluster.local,backend-core,oa-event-flow,database,hyueapi.com,.hyueapi.com"
+                    - name: http_proxy
+                      value: "http://d601-provider-egress-proxy.unidesk.svc.cluster.local:18789"
+                    - name: https_proxy
+                      value: "http://d601-provider-egress-proxy.unidesk.svc.cluster.local:18789"
+                    - name: all_proxy
+                      value: "http://d601-provider-egress-proxy.unidesk.svc.cluster.local:18789"
+                    - name: no_proxy
+                      value: "localhost,127.0.0.1,::1,code-queue-ci-read,code-queue-ci-read.unidesk-ci,code-queue-ci-read.unidesk-ci.svc,code-queue-ci-read.unidesk-ci.svc.cluster.local,d601-provider-egress-proxy,d601-provider-egress-proxy.unidesk,d601-provider-egress-proxy.unidesk.svc,d601-provider-egress-proxy.unidesk.svc.cluster.local,d601-tcp-egress-gateway,d601-tcp-egress-gateway.unidesk,d601-tcp-egress-gateway.unidesk.svc,d601-tcp-egress-gateway.unidesk.svc.cluster.local,backend-core,oa-event-flow,database,hyueapi.com,.hyueapi.com"
+                    - name: CODE_QUEUE_DATA_DIR
+                      value: "/var/lib/unidesk/code-queue-ci"
+                    - name: CODE_QUEUE_WORKDIR
+                      value: "/workspace"
+                    - name: CODE_QUEUE_CODEX_HOME
+                      value: "/var/lib/unidesk/code-queue-ci/codex-home"
+                    - name: CODE_QUEUE_OPENCODE_XDG_DIR
+                      value: "/var/lib/unidesk/code-queue-ci/opencode-xdg"
+                    - name: CODE_QUEUE_DEFAULT_MODEL
+                      value: "gpt-5.5"
+                    - name: CODE_QUEUE_MODELS
+                      value: "gpt-5.5,gpt-5.4-mini,gpt-5.4,minimax-m2.7"
+                    - name: CODE_QUEUE_DATABASE_POOL_MAX
+                      value: "4"
+                    - name: CODE_QUEUE_IN_MEMORY_OUTPUT_RECORDS
+                      value: "5"
+                    - name: CODE_QUEUE_IN_MEMORY_EVENT_RECORDS
+                      value: "5"
+                    - name: OA_EVENT_FLOW_BASE_URL
+                      value: "http://d601-tcp-egress-gateway.unidesk.svc.cluster.local:4255"
+                    - name: LOG_FILE
+                      value: "/var/log/unidesk/code-queue-ci-read.jsonl"
+                    - name: NODE_OPTIONS
+                      value: "--max-old-space-size=512"
+                  readinessProbe:
+                    httpGet:
+                      path: /live
+                      port: http
+                    periodSeconds: 5
+                    timeoutSeconds: 3
+                    failureThreshold: 20
+                  livenessProbe:
+                    httpGet:
+                      path: /live
+                      port: http
+                    periodSeconds: 10
+                    timeoutSeconds: 3
+                    failureThreshold: 6
+                  resources:
+                    requests:
+                      cpu: 100m
+                      memory: 256Mi
+                    limits:
+                      memory: 1Gi
+                  volumeMounts:
+                    - name: state
+                      mountPath: /var/lib/unidesk/code-queue-ci
+                    - name: logs
+                      mountPath: /var/log/unidesk
+              volumes:
+                - name: state
+                  emptyDir: {}
+                - name: logs
+                  emptyDir: {}
+        ---
+        apiVersion: v1
+        kind: Service
+        metadata:
+          name: code-queue-ci-read
+          namespace: unidesk-ci
+          labels:
+            app.kubernetes.io/name: code-queue
+            app.kubernetes.io/component: ci-read
+            app.kubernetes.io/part-of: unidesk
+        spec:
+          type: ClusterIP
+          selector:
+            app.kubernetes.io/name: code-queue
+            app.kubernetes.io/component: ci-read
+          ports:
+            - name: http
+              port: 4222
+              targetPort: http
+        YAML
+        csplit -s -f /tmp/code-queue-ci-read- /tmp/code-queue-ci-read-deployment.yaml '/^---$/' '{*}'
+        kube PATCH \
+          -H "Content-Type: application/apply-patch+yaml" \
+          --data-binary @/tmp/code-queue-ci-read-00 \
+          "$kube_api/apis/apps/v1/namespaces/$kube_namespace/deployments/code-queue-ci-read?fieldManager=unidesk-ci&force=true" >/dev/null
+        kube PATCH \
+          -H "Content-Type: application/apply-patch+yaml" \
+          --data-binary @/tmp/code-queue-ci-read-01 \
+          "$kube_api/api/v1/namespaces/$kube_namespace/services/code-queue-ci-read?fieldManager=unidesk-ci&force=true" >/dev/null
+        deadline=$((SECONDS + 180))
+        while [ "$SECONDS" -lt "$deadline" ]; do
+          status="$(kube GET "$kube_api/apis/apps/v1/namespaces/$kube_namespace/deployments/code-queue-ci-read")"
+          replicas="$(printf '%s' "$status" | jq -r '.spec.replicas // 1')"
+          available="$(printf '%s' "$status" | jq -r '.status.availableReplicas // 0')"
+          updated="$(printf '%s' "$status" | jq -r '.status.updatedReplicas // 0')"
+          observed="$(printf '%s' "$status" | jq -r '.status.observedGeneration // 0')"
+          generation="$(printf '%s' "$status" | jq -r '.metadata.generation // 0')"
+          if [ "$available" -ge "$replicas" ] && [ "$updated" -ge "$replicas" ] && [ "$observed" -ge "$generation" ]; then
+            echo "code_queue_ci_read_rollout=available replicas=$available generation=$generation"
+            exit 0
+          fi
+          sleep 2
+        done
+        echo "code_queue_ci_read_rollout=timeout" >&2
+        kube GET "$kube_api/apis/apps/v1/namespaces/$kube_namespace/deployments/code-queue-ci-read" >&2
+        exit 1
+    - name: measure
+      image: "$(params.app-image)"
+      workingDir: "$(workspaces.source.path)/repo"
+      env:
+        - name: CI_CODE_QUEUE_URL
+          value: "http://code-queue-ci-read.unidesk-ci.svc.cluster.local:4222"
+        - name: HTTP_PROXY
+          value: "http://d601-provider-egress-proxy.unidesk.svc.cluster.local:18789"
+        - name: HTTPS_PROXY
+          value: "http://d601-provider-egress-proxy.unidesk.svc.cluster.local:18789"
+        - name: ALL_PROXY
+          value: "http://d601-provider-egress-proxy.unidesk.svc.cluster.local:18789"
+        - name: NO_PROXY
+          value: "localhost,127.0.0.1,::1,code-queue-ci-read,code-queue-ci-read.unidesk-ci,code-queue-ci-read.unidesk-ci.svc,code-queue-ci-read.unidesk-ci.svc.cluster.local,d601-tcp-egress-gateway,d601-tcp-egress-gateway.unidesk,d601-tcp-egress-gateway.unidesk.svc,d601-tcp-egress-gateway.unidesk.svc.cluster.local,d601-provider-egress-proxy,d601-provider-egress-proxy.unidesk,d601-provider-egress-proxy.unidesk.svc,d601-provider-egress-proxy.unidesk.svc.cluster.local"
+        - name: http_proxy
+          value: "http://d601-provider-egress-proxy.unidesk.svc.cluster.local:18789"
+        - name: https_proxy
+          value: "http://d601-provider-egress-proxy.unidesk.svc.cluster.local:18789"
+        - name: all_proxy
+          value: "http://d601-provider-egress-proxy.unidesk.svc.cluster.local:18789"
+        - name: no_proxy
+          value: "localhost,127.0.0.1,::1,code-queue-ci-read,code-queue-ci-read.unidesk-ci,code-queue-ci-read.unidesk-ci.svc,code-queue-ci-read.unidesk-ci.svc.cluster.local,d601-tcp-egress-gateway,d601-tcp-egress-gateway.unidesk,d601-tcp-egress-gateway.unidesk.svc,d601-tcp-egress-gateway.unidesk.svc.cluster.local,d601-provider-egress-proxy,d601-provider-egress-proxy.unidesk,d601-provider-egress-proxy.unidesk.svc,d601-provider-egress-proxy.unidesk.svc.cluster.local"
+        - name: FIRST_PAINT_BUDGET_MS
+          valueFrom:
+            configMapKeyRef:
+              name: unidesk-ci-budgets
+              key: firstPaintMs
+        - name: TRACE_SUMMARY_BUDGET_MS
+          valueFrom:
+            configMapKeyRef:
+              name: unidesk-ci-budgets
+              key: traceSummaryMs
+        - name: TRACE_STEPS_BUDGET_MS
+          valueFrom:
+            configMapKeyRef:
+              name: unidesk-ci-budgets
+              key: traceStepsMs
+        - name: TRACE_STEP_DETAIL_BUDGET_MS
+          valueFrom:
+            configMapKeyRef:
+              name: unidesk-ci-budgets
+              key: traceStepDetailMs
+        - name: OVERVIEW_P95_BUDGET_MS
+          valueFrom:
+            configMapKeyRef:
+              name: unidesk-ci-budgets
+              key: overviewP95Ms
+      script: |
+        #!/usr/bin/env bash
+        set -euo pipefail
+        bun scripts/ci-code-queue-read-perf.ts
+    - name: cleanup
+      image: "$(params.app-image)"
+      env:
+        - name: HTTP_PROXY
+          value: ""
+        - name: HTTPS_PROXY
+          value: ""
+        - name: ALL_PROXY
+          value: ""
+        - name: NO_PROXY
+          value: "localhost,127.0.0.1,::1,kubernetes,kubernetes.default,kubernetes.default.svc,kubernetes.default.svc.cluster.local"
+        - name: http_proxy
+          value: ""
+        - name: https_proxy
+          value: ""
+        - name: all_proxy
+          value: ""
+        - name: no_proxy
+          value: "localhost,127.0.0.1,::1,kubernetes,kubernetes.default,kubernetes.default.svc,kubernetes.default.svc.cluster.local"
+      script: |
+        #!/bin/bash
+        set -euo pipefail
+        kube_api="https://${KUBERNETES_SERVICE_HOST}:${KUBERNETES_SERVICE_PORT_HTTPS}"
+        kube_token="$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)"
+        kube_ca="/var/run/secrets/kubernetes.io/serviceaccount/ca.crt"
+        kube_namespace="$(cat /var/run/secrets/kubernetes.io/serviceaccount/namespace)"
+        delete_resource() {
+          local path="$1"
+          local code
+          code="$(curl -sS -o /tmp/unidesk-ci-delete-response -w "%{http_code}" --cacert "$kube_ca" -H "Authorization: Bearer $kube_token" -X DELETE "$kube_api/$path")"
+          if [ "$code" = "200" ] || [ "$code" = "202" ] || [ "$code" = "404" ]; then
+            return 0
+          fi
+          cat /tmp/unidesk-ci-delete-response >&2
+          return 1
+        }
+        delete_resource "apis/apps/v1/namespaces/$kube_namespace/deployments/code-queue-ci-read"
+        delete_resource "api/v1/namespaces/$kube_namespace/services/code-queue-ci-read"
+---
+apiVersion: tekton.dev/v1
+kind: Pipeline
+metadata:
+  name: unidesk-ci
+  namespace: unidesk-ci
+  labels:
+    app.kubernetes.io/name: unidesk-ci
+    app.kubernetes.io/part-of: unidesk
+spec:
+  params:
+    - name: repo-url
+      type: string
+      default: https://github.com/pikasTech/unidesk
+    - name: revision
+      type: string
+    - name: check-image
+      type: string
+      default: unidesk-code-queue:d601
+    - name: code-queue-image
+      type: string
+      default: unidesk-code-queue:d601
+  workspaces:
+    - name: shared-workspace
+  tasks:
+    - name: repo-check
+      taskRef:
+        name: unidesk-repo-check
+      params:
+        - name: repo-url
+          value: "$(params.repo-url)"
+        - name: revision
+          value: "$(params.revision)"
+        - name: image
+          value: "$(params.check-image)"
+      workspaces:
+        - name: source
+          workspace: shared-workspace
+    - name: code-queue-read-perf
+      runAfter:
+        - repo-check
+      taskRef:
+        name: unidesk-code-queue-read-perf
+      params:
+        - name: revision
+          value: "$(params.revision)"
+        - name: app-image
+          value: "$(params.code-queue-image)"
+      workspaces:
+        - name: source
+          workspace: shared-workspace
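Note on the `start-read-service` step above: its wait loop reads five fields from the Deployment object with `jq` and exits successfully only when all three comparisons hold. The following is a minimal TypeScript sketch of that same predicate, mainly to make the defaults explicit. The names `DeploymentLike` and `isRolloutAvailable` are hypothetical and do not exist in the repository; the defaults mirror the `// 1` and `// 0` fallbacks in the shell script.

```typescript
// Hypothetical type: only the fields the shell loop reads via jq.
interface DeploymentLike {
  metadata?: { generation?: number };
  spec?: { replicas?: number };
  status?: {
    availableReplicas?: number;
    updatedReplicas?: number;
    observedGeneration?: number;
  };
}

// Mirrors the shell condition:
//   available >= replicas && updated >= replicas && observed >= generation
function isRolloutAvailable(d: DeploymentLike): boolean {
  const replicas = d.spec?.replicas ?? 1; // jq: .spec.replicas // 1
  const available = d.status?.availableReplicas ?? 0; // jq: // 0
  const updated = d.status?.updatedReplicas ?? 0;
  const observed = d.status?.observedGeneration ?? 0;
  const generation = d.metadata?.generation ?? 0;
  return available >= replicas && updated >= replicas && observed >= generation;
}
```

The `observedGeneration` comparison is what keeps a stale status from a previous apply from being reported as ready: until the controller has observed the newly applied spec, the old `availableReplicas` count does not count.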
@@ -0,0 +1,80 @@
+apiVersion: triggers.tekton.dev/v1beta1
+kind: EventListener
+metadata:
+  name: unidesk-ci
+  namespace: unidesk-ci
+  labels:
+    app.kubernetes.io/name: unidesk-ci
+    app.kubernetes.io/component: triggers
+    app.kubernetes.io/part-of: unidesk
+spec:
+  serviceAccountName: unidesk-ci-runner
+  triggers:
+    - name: github-push
+      interceptors:
+        - ref:
+            name: cel
+          params:
+            - name: filter
+              value: >-
+                body.ref.startsWith('refs/heads/') &&
+                body.after.matches('^[0-9a-f]{40}$') && !body.after.matches('^0{40}$')
+      bindings:
+        - ref: unidesk-ci-github-push
+      template:
+        ref: unidesk-ci
+---
+apiVersion: triggers.tekton.dev/v1beta1
+kind: TriggerBinding
+metadata:
+  name: unidesk-ci-github-push
+  namespace: unidesk-ci
+  labels:
+    app.kubernetes.io/name: unidesk-ci
+    app.kubernetes.io/component: triggers
+    app.kubernetes.io/part-of: unidesk
+spec:
+  params:
+    - name: repo-url
+      value: $(body.repository.clone_url)
+    - name: revision
+      value: $(body.after)
+---
+apiVersion: triggers.tekton.dev/v1beta1
+kind: TriggerTemplate
+metadata:
+  name: unidesk-ci
+  namespace: unidesk-ci
+  labels:
+    app.kubernetes.io/name: unidesk-ci
+    app.kubernetes.io/component: triggers
+    app.kubernetes.io/part-of: unidesk
+spec:
+  params:
+    - name: repo-url
+      default: https://github.com/pikasTech/unidesk
+    - name: revision
+  resourcetemplates:
+    - apiVersion: tekton.dev/v1
+      kind: PipelineRun
+      metadata:
+        generateName: unidesk-ci-
+        namespace: unidesk-ci
+        labels:
+          app.kubernetes.io/name: unidesk-ci
+          app.kubernetes.io/part-of: unidesk
+          unidesk.ai/revision: $(tt.params.revision)
+      spec:
+        pipelineRef:
+          name: unidesk-ci
+        taskRunTemplate:
+          serviceAccountName: unidesk-ci-runner
+        params:
+          - name: repo-url
+            value: $(tt.params.repo-url)
+          - name: revision
+            value: $(tt.params.revision)
+        workspaces:
+          - name: shared-workspace
+            persistentVolumeClaim:
+              claimName: unidesk-ci-cache
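Note on the `github-push` trigger above: the CEL filter decides which push events create a `PipelineRun`. The following is a minimal TypeScript sketch of that decision, useful for reasoning about edge cases outside the cluster. `PushBody`, `ZERO_SHA`, and `shouldTrigger` are hypothetical names, and only the two payload fields the filter reads are modelled. It additionally treats the all-zero SHA that GitHub sends as `after` for a branch deletion as a non-buildable revision, since the 40-hex pattern by itself accepts it.

```typescript
// Hypothetical shape: the two fields of a GitHub push payload that the filter reads.
interface PushBody {
  ref: string;
  after: string;
}

// GitHub reports a deleted branch with an all-zero `after` SHA.
const ZERO_SHA = "0".repeat(40);

function shouldTrigger(body: PushBody): boolean {
  const isBranchRef = body.ref.startsWith("refs/heads/"); // excludes refs/tags/...
  const isFullSha = /^[0-9a-f]{40}$/.test(body.after); // full lowercase hex SHA only
  const isDeletion = body.after === ZERO_SHA; // nothing to check out
  return isBranchRef && isFullSha && !isDeletion;
}
```

Tag pushes and abbreviated SHAs are rejected by the first two checks; the third one only matters for branch deletions, where a run would otherwise start with a revision that cannot be checked out.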
@@ -723,6 +723,30 @@ function parseCurlHeaderBody(output: Buffer): { status: number; contentType: str
  return { status: Number.isFinite(status) ? status : 502, contentType, bodyText };
 }

+function bodyPreview(bodyText: string, contentType: string): JsonValue {
+  if (contentType.toLowerCase().includes("json")) {
+    try {
+      return JSON.parse(bodyText) as JsonValue;
+    } catch {
+      return bodyText.slice(0, 2000);
+    }
+  }
+  return bodyText.slice(0, 2000);
+}
+
+function responseProbeRecord(response: Response, bodyText: string, startedAt: number): JsonRecord {
+  const contentType = response.headers.get("content-type") ?? "application/octet-stream";
+  return {
+    ok: response.ok,
+    status: response.ok ? "healthy" : "unhealthy",
+    upstreamStatus: response.status,
+    contentType,
+    proxyMode: response.headers.get("x-unidesk-proxy-mode") ?? "",
+    durationMs: Date.now() - startedAt,
+    body: bodyPreview(bodyText, contentType),
+  };
+}
+
 async function kubeApiServiceProxyResponse(
  service: ManagedService,
  req: Request,
@@ -733,6 +757,25 @@ async function kubeApiServiceProxyResponse(
  return kubeApiProxyResponse(service, req, serviceProxyApiPath(service, targetPath), query, timeoutMs);
 }

+async function kubeApiServiceProxyProbe(service: ManagedService, targetPath: string, timeoutMs: number): Promise<JsonRecord> {
+  const startedAt = Date.now();
+  try {
+    const request = new Request("http://k3sctl-adapter.local/diagnostics", { method: "GET", headers: { accept: "application/json" } });
+    const response = await kubeApiServiceProxyResponse(service, request, targetPath, "", timeoutMs);
+    const bodyText = await response.text();
+    return responseProbeRecord(response, bodyText, startedAt);
+  } catch (error) {
+    return {
+      ok: false,
+      status: "unhealthy",
+      upstreamStatus: null,
+      proxyMode: "kubernetes-api-service-proxy",
+      durationMs: Date.now() - startedAt,
+      error: errorToJson(error),
+    };
+  }
+}
+
 async function nativeServiceProxyResponse(
  service: ManagedService,
  req: Request,
@@ -1116,6 +1159,74 @@ async function controlPlaneSnapshot(): Promise<JsonRecord> {
  };
 }

+async function serviceDiagnostics(service: ManagedService): Promise<JsonRecord> {
+  const checkedAt = new Date().toISOString();
+  const healthPath = activeEndpoint(service).healthPath;
+  const healthTimeoutMs = Math.max(500, Math.min(config.healthTimeoutMs, 5000));
+  const kubernetesApiServiceProxy = await kubeApiServiceProxyProbe(service, healthPath, healthTimeoutMs);
+  const targetServiceStartedAt = Date.now();
+  let targetService: JsonRecord;
+  try {
+    const nativeRequest = new Request("http://k3sctl-adapter.local/diagnostics", { method: "GET", headers: { accept: "application/json" } });
+    const native = await nativeServiceProxyResponse(service, nativeRequest.clone(), healthPath, "", healthTimeoutMs);
+    const response = native ?? await kubeApiServiceProxyResponse(service, nativeRequest, healthPath, "", healthTimeoutMs);
+    const bodyText = await response.text();
+    targetService = responseProbeRecord(response, bodyText, targetServiceStartedAt);
+  } catch (error) {
+    targetService = {
+      ok: false,
+      status: "unhealthy",
+      upstreamStatus: null,
+      proxyMode: "k3sctl-service-health",
+      durationMs: Date.now() - targetServiceStartedAt,
+      error: errorToJson(error),
+    };
+  }
+  const managedService = await serviceStatus(service).then((status) => ({
+    ok: status.healthy === true,
+    status: String(status.status ?? ""),
+    servingHealthy: status.servingHealthy === true,
+    topologyComplete: status.topologyComplete === true,
+    activeInstanceId: String(status.activeInstanceId ?? ""),
+    active: status.active ?? null,
+    missingNodeIds: Array.isArray(status.missingNodeIds) ? status.missingNodeIds as JsonValue : [],
+  } satisfies JsonRecord)).catch((error) => ({
+    ok: false,
+    status: "unhealthy",
+    error: errorToJson(error),
+  } satisfies JsonRecord));
+  const kubernetesApiServiceProxyOk = kubernetesApiServiceProxy.ok === true;
+  const targetServiceOk = targetService.ok === true;
+  const checks = {
+    k3sctlAdapter: {
+      ok: true,
+      nodeId: config.nodeId,
+      clusterId: config.clusterId,
+      startedAt,
+    },
+    kubernetesApiServiceProxy: {
+      ...kubernetesApiServiceProxy,
+      configured: kubeClient !== null,
+      kubeconfigPath: config.kubeconfigPath,
+      connectHost: config.kubeApiConnectHost,
+    },
+    targetService,
+    managedService,
+  } satisfies Record<string, JsonValue>;
+  const ok = kubernetesApiServiceProxyOk && targetServiceOk;
+  return {
+    ok,
+    service: "k3sctl-adapter",
+    serviceId: service.id,
+    namespace: service.namespace,
+    checkedAt,
+    healthPath,
+    route: service.route,
+    noFallback: true,
+    checks,
+  };
+}
+
 function forwardHeaders(request: Request): Headers {
  const headers = new Headers();
  for (const name of ["accept", "content-type", "x-requested-with"]) {
@@ -1165,6 +1276,13 @@ async function route(req: Request): Promise<Response> {
      const status = await serviceStatus(service);
      return req.method === "HEAD" ? new Response(null, { status: status.healthy === true ? 200 : 503 }) : jsonResponse({ ok: status.healthy === true, managedService: status }, status.healthy === true ? 200 : 503);
    }
+    const diagnosticsMatch = url.pathname.match(/^\/api\/services\/([^/]+)\/diagnostics$/u);
+    if (diagnosticsMatch !== null && (req.method === "GET" || req.method === "HEAD")) {
+      const service = serviceById(decodeURIComponent(diagnosticsMatch[1] ?? ""));
+      if (service === null) return jsonResponse({ ok: false, error: "managed service not found" }, 404);
+      const diagnostics = await serviceDiagnostics(service);
+      return req.method === "HEAD" ? new Response(null, { status: diagnostics.ok === true ? 200 : 503 }) : jsonResponse(diagnostics, diagnostics.ok === true ? 200 : 503);
+    }
    const proxyMatch = url.pathname.match(/^\/api\/services\/([^/]+)\/proxy(\/.*)$/u);
    if (proxyMatch !== null) {
      const service = serviceById(decodeURIComponent(proxyMatch[1] ?? ""));