fix: harden provider gateway upgrades
This commit is contained in:
@@ -18,8 +18,8 @@ UniDesk delivery is not complete until the public frontend, public provider ingr
|
||||
- Provider remote control: internal `/api/dispatch` must successfully complete a real `provider.upgrade` task in `mode: "plan"` so the upgrade path is validated without recreating the running gateway during E2E.
|
||||
- Microservices: internal `/api/microservices` must include `todo-note` on `main-server` plus `findjob`, `pipeline` and `met-nonlinear` on `D601` with `public=false`; `/api/microservices/todo-note/health` must report `storage=postgres`, `/api/microservices/todo-note/proxy/api/instances` must expose the migrated Todo Note lists, and a temporary Todo Note list create/add/toggle/undo/delete cycle must succeed through the real provider-gateway proxy; `/api/microservices/findjob/health` and `/api/microservices/findjob/proxy/api/summary` must succeed through the real provider-gateway proxy; `/api/microservices/findjob/proxy/api/jobs?__unideskArrayLimit=jobs:5` must return a bounded preview with `_unidesk.arrayLimits` metadata; `/api/microservices/pipeline/health` and `/api/microservices/pipeline/proxy/api/snapshot?__unideskArrayLimit=registry.components:8,runs:3` must return Pipeline health, registry and run previews; `/api/microservices/met-nonlinear/health`, `/api/microservices/met-nonlinear/proxy/api/queue`, `/api/microservices/met-nonlinear/proxy/api/projects?root=projects&limit=20` and `/api/microservices/met-nonlinear/proxy/api/images` must return the D601 TS backend health, queue/GPU policy, project preview and ready `met-nonlinear-ml:tf26` image status.
|
||||
- Database: the command writes an `unidesk_e2e_markers` row through `docker exec unidesk-database psql`, confirms provider state is stored in PostgreSQL, and checks Todo Note rows exist in `todo_note_instances` using the same named volume.
|
||||
- Frontend: Playwright must open the public frontend URL derived from `network.publicHost`, not localhost or a Docker-internal URL; it logs in with the configured account, waits for `核心在线`, asserts that `main-server` and `Main Server Provider` are visible, verifies desktop sidebar collapse and `PGDATA` overview metric, clicks `查看原始JSON` to verify Provider data from the frontend, confirms no raw JSON is visible before that click, opens task history to verify duration and failure diagnostics, opens resource nodes `资源监控` to verify CPU/Memory/Disk curves and provider upgrade precheck dispatch, opens `Docker 状态`, switches to `main-server`, and verifies the Docker Desktop-style container view including the database named volume `unidesk_pgdata_10gb`, opens `网关版本` and verifies the provider-gateway version, SSH 透传可用性、远程更新可用性 plus structured automatic update records for `provider.upgrade`, then opens `微服务 / 服务目录`、`微服务 / Todo Note`、`微服务 / FindJob`、`微服务 / Pipeline` and `微服务 / MET Nonlinear` to verify 主 server Todo Note、D601、仓库引用、私有后端映射、Todo Note 迁移清单和树形任务、FindJob 指标和岗位预览、Pipeline 组件矩阵、React Flow 控制图和最近运行、MET Nonlinear 队列/GPU/镜像/Project config/训练历史都通过 React 控件展示。
|
||||
- Microservice frontend assertions must wait for real backend data, not only the page skeleton. For Todo Note this means the page must show the migrated lists `CONSTAR`、`大论文`、`找工作`、`小论文`、`事务`, support creating a temporary list and task through the frontend, and delete that temporary list afterwards. The temporary list must be selected again by its unique generated name before deletion so E2E never deletes a migrated source list by accident. For FindJob this means the page must show a numeric `岗位总量`, `HEALTH OK`, and a non-empty `PREVIEW` count such as `40/1463 PREVIEW`; for Pipeline this means the page must show `Pipeline v2 工作台`, `Health OK`, a numeric component count, a non-empty React Flow control graph, `控制图`, and `最近运行`; for MET Nonlinear this means the page must show `MET Nonlinear 训练编排`, `Health OK`, `创建10个10轮任务`, task queue and GPU/image panels; loading placeholders like `--` or empty states are not sufficient for E2E success.
|
||||
- Frontend: Playwright must open the public frontend URL derived from `network.publicHost`, not localhost or a Docker-internal URL; it logs in with the configured account, waits for `核心在线`, asserts that `main-server` and `Main Server Provider` are visible, verifies desktop sidebar collapse and `PGDATA` overview metric, clicks `查看原始JSON` to verify Provider data from the frontend, confirms no raw JSON is visible before that click, opens task history to verify duration and failure diagnostics, opens resource nodes `资源监控` to verify CPU/Memory/Disk curves and provider upgrade precheck dispatch, opens `Docker 状态`, switches to `main-server`, and verifies the Docker Desktop-style container view including the database named volume `unidesk_pgdata_10gb`, opens `网关版本` and verifies the provider-gateway version, SSH 透传可用性、远程更新可用性 plus structured automatic update records for `provider.upgrade`, then opens `微服务 / 服务目录`、`微服务 / Todo Note`、`微服务 / FindJob`、`微服务 / Pipeline` and `微服务 / MET Nonlinear` to verify 主 server Todo Note、D601、仓库引用、私有后端映射、Todo Note 迁移清单和树形任务、FindJob 指标和岗位预览、Pipeline 组件矩阵、React Flow 控制图和最近运行、MET Nonlinear 项目库/Fork/待启动队列/当前队列/已完成/失败诊断/GPU/镜像都通过 React 控件展示。
|
||||
- Microservice frontend assertions must wait for real backend data, not only the page skeleton. For Todo Note this means the page must show the migrated lists `CONSTAR`、`大论文`、`找工作`、`小论文`、`事务`, support creating a temporary list and task through the frontend, and delete that temporary list afterwards. The temporary list must be selected again by its unique generated name before deletion so E2E never deletes a migrated source list by accident. For FindJob this means the page must show a numeric `岗位总量`, `HEALTH OK`, and a non-empty `PREVIEW` count such as `40/1463 PREVIEW`; for Pipeline this means the page must show `Pipeline v2 工作台`, `Health OK`, a numeric component count, a non-empty React Flow control graph, `控制图`, and `最近运行`; for MET Nonlinear this means the page must show `MET Nonlinear 训练编排`, `Health OK`, `Fork Project`, `加入待启动队列`, `启动队列`, `当前队列`, 最大并发设置、task queue and GPU/image panels, and must not show the removed hard-coded `创建10个10轮任务` frontend entry; loading placeholders like `--` or empty states are not sufficient for E2E success.
|
||||
|
||||
## Frontend JSON Rule
|
||||
|
||||
@@ -51,6 +51,6 @@ Before claiming delivery, run these checks and keep their JSON output or screens
|
||||
|
||||
## Provider Upgrade Gate
|
||||
|
||||
When delivery explicitly includes upgrading or rebuilding a compute-node `provider-gateway` such as D601 or D518, the automated E2E plan check is not sufficient. The operator must first bootstrap any legacy provider only from a node-local terminal, node-owned web terminal, systemd, scheduled task, or detached shell if it cannot yet schedule upgrades; SSH passthrough carried by the same provider-gateway must not be used for synchronous self-rebuilds. Then run `provider.upgrade` with `mode: "schedule"` against that Provider ID, confirm the task succeeds, confirm the node reconnects in the public frontend, and finally verify any required `host.ssh` capability with `bun scripts/cli.ts ssh <PROVIDER_ID> hostname`. This schedule check is a node-upgrade gate, not a replacement for the standard public frontend Playwright E2E gate.
|
||||
When delivery explicitly includes upgrading or rebuilding a compute-node `provider-gateway` such as D601 or D518, the automated E2E plan check is not sufficient. The operator must first bootstrap any legacy provider only from a node-local terminal, node-owned web terminal, systemd, scheduled task, or detached shell if it cannot yet schedule upgrades; SSH passthrough carried by the same provider-gateway must not be used for synchronous self-rebuilds. Then run `provider.upgrade` with `mode: "schedule"` against that Provider ID, confirm the task succeeds, confirm the sleep-and-validate candidate gateway reconnects in the public frontend, confirm the final container restart policy is `always`, and finally verify any required `host.ssh` capability with `bun scripts/cli.ts ssh <PROVIDER_ID> hostname`. This schedule check is a node-upgrade gate, not a replacement for the standard public frontend Playwright E2E gate.
|
||||
|
||||
External compute nodes should run that schedule check through the remote main-server passthrough form: `bun scripts/cli.ts --main-server-ip 74.48.78.17 debug dispatch <PROVIDER_ID> provider.upgrade --mode schedule --wait-ms 15000`. The default remote transport logs in to the public frontend and does not require a main server SSH key; this proves the node can validate itself without direct access to backend-core REST or PostgreSQL.
|
||||
|
||||
Reference in New Issue
Block a user