feat: harden code queue runtime UX
- add queue merge controls and no-cache overview loading\n- improve runtime probes, judge retries, and task trace rendering\n- extend CLI/E2E/docs coverage for code queue and user services
This commit is contained in:
+10
-2
@@ -35,7 +35,8 @@ Typical targeted commands:
|
||||
- Core API: `docker exec unidesk-backend-core` calls internal `GET /api/overview`, which must report `dbReady: true`, `pgdata.volumeName=unidesk_pgdata_10gb`, a positive PostgreSQL database byte count, and at least one online node; internal `GET /api/performance` must report component request statistics, internal operation statistics, PGDATA usage and Code Queue PostgreSQL storage metadata.
|
||||
- Provider self-connection: internal `GET /api/nodes` must contain `main-server` with `status: online`, `labels.providerGatewayVersion` equal to `src/components/provider-gateway/package.json`, `labels.providerGatewayUpgradePolicy: "always-enabled"`, `labels.providerGatewayRestartPolicyOk: true`, `labels.providerGatewayPidModeOk: true`, and `labels.providerGatewayRuntimeGuardOk: true`; internal `GET /api/nodes/system-status` must contain CPU/memory/disk samples plus a non-empty process resource list sorted by memory by default; internal `GET /api/nodes/docker-status` must contain a Docker snapshot for `main-server`; every running `provider-gateway` container visible in Docker snapshots must report `restartPolicy: "always"` and `pidMode: "host"`; public provider ingress `/health` must return ok.
|
||||
- Provider remote control: internal `/api/dispatch` must successfully complete a real `provider.upgrade` task in `mode: "plan"` so the upgrade path is validated without recreating the running gateway during E2E.
|
||||
- User services: internal `/api/microservices` must include `todo-note` and `code-queue` on `main-server`, canonical `filebrowser` on `D518`, plus `findjob`, `pipeline`, `met-nonlinear` and `filebrowser-d601` on `D601` with `public=false`; `/api/microservices/todo-note/health` must report `storage=postgres`, `/api/microservices/todo-note/proxy/api/instances` must expose the migrated Todo Note lists, and a temporary Todo Note list create/add/toggle/undo/delete cycle must succeed through the real provider-gateway proxy; `/api/microservices/code-queue/health` must return a Code Queue summary with default model `gpt-5.5`, and `/api/microservices/code-queue/proxy/api/tasks` must return queue state through the same proxy; `/api/microservices/filebrowser/health`, `/api/microservices/filebrowser-d601/health` and `/api/microservices/filebrowser/proxy/` must prove File Browser health and WebUI access through UniDesk proxy; `/api/microservices/findjob/health` and `/api/microservices/findjob/proxy/api/summary` must succeed through the real provider-gateway proxy; `/api/microservices/findjob/proxy/api/jobs?__unideskArrayLimit=jobs:5` must return a bounded preview with `_unidesk.arrayLimits` metadata; `/api/microservices/pipeline/health`, `/api/microservices/pipeline/proxy/api/snapshot?__unideskArrayLimit=registry.components:8,runs:3` and `/api/microservices/pipeline/proxy/api/oa-event-flow/diagnostics` must return Pipeline health, registry/run previews and OA event-flow evidence; `/api/microservices/met-nonlinear/health`, `/api/microservices/met-nonlinear/proxy/api/queue`, `/api/microservices/met-nonlinear/proxy/api/projects?root=projects&limit=500`, `/api/microservices/met-nonlinear/proxy/api/projects?root=ex_projects&limit=500`, `/api/microservices/met-nonlinear/proxy/api/projects/config?path=<projectPath>` and `/api/microservices/met-nonlinear/proxy/api/images` must return the D601 TS backend health, queue/GPU policy, full project tree inputs, structured project detail and ready `met-nonlinear-ml:tf26` image status.
|
||||
- User services: internal `/api/microservices` must include `todo-note` and `code-queue` on `main-server`, canonical `filebrowser` on `D518`, plus `findjob`, `pipeline`, `met-nonlinear`, `claudeqq` and `filebrowser-d601` on `D601` with `public=false`; `/api/microservices/todo-note/health` must report `storage=postgres`, `/api/microservices/todo-note/proxy/api/instances` must expose the migrated Todo Note lists, and a temporary Todo Note list create/add/toggle/undo/delete cycle must succeed through the real provider-gateway proxy; `/api/microservices/code-queue/health` must return a Code Queue summary with default model `gpt-5.5`, and `/api/microservices/code-queue/proxy/api/tasks` must return queue state through the same proxy; `/api/microservices/filebrowser/health`, `/api/microservices/filebrowser-d601/health` and `/api/microservices/filebrowser/proxy/` must prove File Browser health and WebUI access through UniDesk proxy; `/api/microservices/findjob/health` and `/api/microservices/findjob/proxy/api/summary` must succeed through the real provider-gateway proxy; `/api/microservices/findjob/proxy/api/jobs?__unideskArrayLimit=jobs:5` must return a bounded preview with `_unidesk.arrayLimits` metadata; `/api/microservices/pipeline/health`, `/api/microservices/pipeline/proxy/api/snapshot?__unideskArrayLimit=registry.components:8,runs:3` and `/api/microservices/pipeline/proxy/api/oa-event-flow/diagnostics` must return Pipeline health, registry/run previews and OA event-flow evidence; `/api/microservices/met-nonlinear/health`, `/api/microservices/met-nonlinear/proxy/api/queue`, `/api/microservices/met-nonlinear/proxy/api/projects?root=projects&limit=500`, `/api/microservices/met-nonlinear/proxy/api/projects?root=ex_projects&limit=500`, `/api/microservices/met-nonlinear/proxy/api/projects/config?path=<projectPath>` and `/api/microservices/met-nonlinear/proxy/api/images` must return the D601 TS backend health, queue/GPU policy, full project tree inputs, structured project detail and ready `met-nonlinear-ml:tf26` image status.
|
||||
- ClaudeQQ availability: `/api/microservices/claudeqq/health` must only pass when `ready=true`, NapCat HTTP and WebSocket are connected, and `napcat.loginState=logged_in`; `/api/microservices/claudeqq/proxy/api/napcat/login` must show the same logged-in account state and `/api/microservices/claudeqq/proxy/api/events/recent` must prove the backend can read the persistent event cache. A QR-code-only or not-logged-in NapCat state must be treated as unhealthy.
|
||||
- Database: the command writes an `unidesk_e2e_markers` row through `docker exec unidesk-database psql`, confirms provider state is stored in PostgreSQL, and checks Todo Note rows exist in `todo_note_instances` using the same named volume.
|
||||
- Pipeline OA event flow: `microservice:pipeline-oa-event-flow` must prove both no-audit and monitor-audit runs are driven by OA events end to end. The event stream must show `node-finished` as a neutral fact with `pipeline:{pipelineId}` and `epoch:{runId}` tags, OA policy as the source of downstream/audit decisions, monitor decisions as OA control events, and runner control-result evidence. E2E must fail if delivery still depends on a legacy detail audit policy flag as policy authority, independent legacy audit-request points, a legacy batch completion gate, direct monitor-to-runner calls, or frontend/CLI writes to Pipeline `.state`.
|
||||
- The same Pipeline OA diagnostics must fail on legacy file-transport residuals. Procedure containers, monitor sessions, UI/Gantt DTO builders and CLI fetches must consume prompt/control/stop/display evidence only from the OA event ledger and normalized HTTP read APIs; `control-prompts.jsonl`, `monitor-prompts.jsonl`, `monitor-control`, `control-events.jsonl`, monitor stop files, `.state/pipeline-runs/{runId}/control/commands/`, `PIPELINE_*_APPEND_FILE`, local JSONL append/read helpers, and monitor `/pipeline-state` mounts are forbidden in runtime source.
|
||||
@@ -44,6 +45,7 @@ Typical targeted commands:
|
||||
- Frontend dense-layout regression gate: whenever a frontend change touches Pipeline 右侧边栏、Trace timeline、详情抽屉、甘特图坐标或其他高信息密度面板, Playwright acceptance must inspect both `总高度` and `横向滚动条`. For Pipeline specifically, the OpenCode Trace session head must carry shared agent/model/session facts and the Trace body must use the same Code Queue `TraceView` styling; Playwright must fail if old `.pipeline-opencode-step`, `.pipeline-opencode-flow`, `.pipeline-step-message-card` or `.pipeline-opencode-part` user-visible styles reappear, if the Trace container introduces an internal horizontal scrollbar, or if `frontend:pipeline-gantt-frontend-y-accuracy` fails to prove the frontend `frontend-y` layout maps ticks, markers and execution bars from timestamps to y coordinates within tolerance.
|
||||
- OpenCode Trace must use Code Queue Trace styling and must not render the deprecated Pipeline continuous step connector; Playwright should fail if `.pipeline-opencode-flow`, `.pipeline-opencode-step` or any equivalent continuous connector/card returns to the user-visible Trace.
|
||||
- User service frontend assertions must wait for real backend data, not only the page skeleton. For Todo Note this means the page must show the migrated lists `CONSTAR`、`大论文`、`找工作`、`小论文`、`事务`, support creating a temporary list and task through the frontend, and delete that temporary list afterwards. The temporary list must be selected again by its unique generated name before deletion so E2E never deletes a migrated source list by accident. For FindJob this means the page must show a numeric `岗位总量`, `HEALTH OK`, and a non-empty `PREVIEW` count such as `40/1463 PREVIEW`; for Pipeline this means the page must show `Pipeline v2 工作台`, `Health OK`, a numeric component count, a non-empty React Flow control graph, `控制图`, `Epoch 甘特图`, and after clicking a Gantt execution line it must show `OpenCode Trace` rendered by the shared Code Queue-style Trace component with messages and tool-call groups; for MET Nonlinear this means the page must show `MET Nonlinear 训练编排`, `Health OK`, `Fork Project`, `加入待启动队列`, `启动队列`, `当前队列`, 最大并发设置、task queue and GPU/image panels, and must not show the removed hard-coded `创建10个10轮任务` frontend entry. The MET Nonlinear project library must render `projects/` and `ex_projects/` as a true path tree with folder Project counts; clicking a project row must open a structured detail panel containing `config.json`, `data/ 训练状态`, `模型参数`, `指标` and a parameter count such as `Total Params`; clicking a completed/current/failed job row must open a structured job detail and both the row and detail must show `epoch/h`. Full MET Nonlinear acceptance is driven by public frontend controls: choose a visible source Project, set batch size, epochs and max concurrency in inputs, fork into `projects/unidesk_forks/`, stage the selected forks, start the queue, and verify completed rows plus automatic `metnl-train-*` container removal; loading placeholders like `--` or empty states are not sufficient for E2E success.
|
||||
- For ClaudeQQ this means the page must show `Health OK`, `NapCat 容器登录`, `NAPCAT HTTP OK`, `NAPCAT WS OK`, logged-in state such as `已登录 logged_in`, event cache, subscriptions and message push controls. A page that only shows a QR code, stale raw JSON, or a running backend without logged-in NapCat is not acceptable.
|
||||
|
||||
## Frontend JSON Rule
|
||||
|
||||
@@ -53,7 +55,7 @@ Remote update records in the frontend are covered by the same rule: `provider.up
|
||||
|
||||
Provider operation availability is also covered by the structured rendering rule. `host.ssh` availability must be displayed as badges or equivalent controls derived from capabilities and `hostSsh*` labels, and remote update availability must be displayed from `provider.upgrade` capability plus the `always-enabled` policy; these fields must not require opening raw Provider JSON.
|
||||
|
||||
User service pages are covered by the same rule. `Todo Note` must show lists, task tree, filters, reminder input, movement controls, undo/redo and metrics as controls; `Code Queue` must show queue cards, live transcript, model/cwd/max attempt inputs, judge decision, attempt table, append prompt, interrupt and retry controls; `File Browser` must show D518 as the default target, D601 as an alternate target, a screenshot export action, and an embedded upstream WebUI frame served through `/api/microservices/<id>/proxy/` with compact file rows that do not let material-icon fallback text cover file metadata; `FindJob` must show metrics, jobs and drafts as cards/tables; `Pipeline` must show component classes, React Flow graph nodes/edges, run cards, Gantt execution lines and OpenCode Trace timelines as controls; `MET Nonlinear` must show queue rows, GPU/image cards, a real path tree for the project library, structured project/job detail panels, project config preview, `data/` training state, model parameter count, metrics, progress bars, ETA, `epoch/h` speed and history diagnostics as controls; the full user-service config, summary, snapshot, jobs preview, drafts and run JSON can only appear after an explicit `查看原始JSON` click.
|
||||
User service pages are covered by the same rule. `Todo Note` must show lists, task tree, filters, reminder input, movement controls, undo/redo and metrics as controls; `Code Queue` must show queue cards, live transcript, model/cwd/max attempt inputs, judge decision, attempt table, append prompt, interrupt and retry controls; `File Browser` must show D518 as the default target, D601 as an alternate target, a screenshot export action, and an embedded upstream WebUI frame served through `/api/microservices/<id>/proxy/` with compact file rows that do not let material-icon fallback text cover file metadata; `FindJob` must show metrics, jobs and drafts as cards/tables; `Pipeline` must show component classes, React Flow graph nodes/edges, run cards, Gantt execution lines and OpenCode Trace timelines as controls; `MET Nonlinear` must show queue rows, GPU/image cards, a real path tree for the project library, structured project/job detail panels, project config preview, `data/` training state, model parameter count, metrics, progress bars, ETA, `epoch/h` speed and history diagnostics as controls; `ClaudeQQ` must show NapCat HTTP/WS/login badges, QR/login panel, event cache, subscriptions and message push controls; the full user-service config, summary, snapshot, jobs preview, drafts and run JSON can only appear after an explicit `查看原始JSON` click.
|
||||
|
||||
## Public Boundary Rule
|
||||
|
||||
@@ -63,6 +65,12 @@ The public frontend URL and provider ingress URL are the only public network int
|
||||
|
||||
The PostgreSQL data volume is the named Docker volume `unidesk_pgdata_10gb`. CLI server control commands must never use `docker compose down -v`, `docker volume rm`, or any equivalent data-volume removal. To validate persistence, insert a marker row into `unidesk_e2e_markers`, run `bun scripts/cli.ts server start` or a full stop/start cycle, and verify the marker row still exists.
|
||||
|
||||
## User Service Restart-Recovery Rule
|
||||
|
||||
Any new user service, service migration, or change to a service's Compose/docker run configuration must prove it can recover after container restart and Docker daemon restart. The delivery evidence must include the service's `config.json` id/provider/container mapping, restart policy, host-bound private port, persistent mounts or PostgreSQL tables, health readiness fields, and at least one post-restart `bun scripts/cli.ts microservice health <id>` plus a representative `microservice proxy` check through the real provider-gateway path.
|
||||
|
||||
D601 services have an extra gate because Windows, WSL and Docker Desktop are separate supervisors: record the Windows scheduled task or equivalent keepalive, run `docker inspect` to confirm `met-nonlinear-ts`, `claudeqq-backend`, `claudeqq-napcat` and any changed service have non-empty restart policies and host bind mounts for durable state, then verify MET Nonlinear queue/image health and ClaudeQQ logged-in NapCat HTTP/WebSocket state after the restart. A service that only becomes `running` but loses login, queue, token, subscription, data directory or pending work is not restart-recovery complete.
|
||||
|
||||
## Delivery Gate
|
||||
|
||||
Before claiming delivery, run these checks and keep their JSON output or screenshot path available for review:
|
||||
|
||||
Reference in New Issue
Block a user