feat: finish met nonlinear ui workflow

Codex
2026-05-06 01:54:51 +00:00
parent 2dc46ce056
commit 17fb816d33
8 changed files with 42 additions and 16 deletions
+2 -2
@@ -18,8 +18,8 @@ UniDesk delivery is not complete until the public frontend, public provider ingr
- Provider remote control: internal `/api/dispatch` must successfully complete a real `provider.upgrade` task in `mode: "plan"` so the upgrade path is validated without recreating the running gateway during E2E.
- Microservices: internal `/api/microservices` must include `todo-note` on `main-server` plus `findjob`, `pipeline` and `met-nonlinear` on `D601` with `public=false`; `/api/microservices/todo-note/health` must report `storage=postgres`, `/api/microservices/todo-note/proxy/api/instances` must expose the migrated Todo Note lists, and a temporary Todo Note list create/add/toggle/undo/delete cycle must succeed through the real provider-gateway proxy; `/api/microservices/findjob/health` and `/api/microservices/findjob/proxy/api/summary` must succeed through the real provider-gateway proxy; `/api/microservices/findjob/proxy/api/jobs?__unideskArrayLimit=jobs:5` must return a bounded preview with `_unidesk.arrayLimits` metadata; `/api/microservices/pipeline/health` and `/api/microservices/pipeline/proxy/api/snapshot?__unideskArrayLimit=registry.components:8,runs:3` must return Pipeline health, registry and run previews; `/api/microservices/met-nonlinear/health`, `/api/microservices/met-nonlinear/proxy/api/queue`, `/api/microservices/met-nonlinear/proxy/api/projects?root=projects&limit=20` and `/api/microservices/met-nonlinear/proxy/api/images` must return the D601 TS backend health, queue/GPU policy, project preview and ready `met-nonlinear-ml:tf26` image status.
- Database: the command writes an `unidesk_e2e_markers` row through `docker exec unidesk-database psql`, confirms provider state is stored in PostgreSQL, and checks Todo Note rows exist in `todo_note_instances` using the same named volume.
- Frontend: Playwright must open the public frontend URL derived from `network.publicHost`, not localhost or a Docker-internal URL. It logs in with the configured account, waits for `核心在线`, asserts that `main-server` and `Main Server Provider` are visible, verifies desktop sidebar collapse and the `PGDATA` overview metric, confirms that no raw JSON is visible, then clicks `查看原始JSON` to verify Provider data from the frontend. It opens task history to verify duration and failure diagnostics, opens the resource node `资源监控` tab to verify CPU/Memory/Disk curves and provider upgrade precheck dispatch, opens `Docker 状态`, switches to `main-server`, and verifies the Docker Desktop-style container view including the database named volume `unidesk_pgdata_10gb`. It opens `网关版本` and verifies the provider-gateway version, SSH passthrough availability, remote update availability and structured automatic update records for `provider.upgrade`. Finally it opens `微服务 / 服务目录`, `微服务 / Todo Note`, `微服务 / FindJob`, `微服务 / Pipeline` and `微服务 / MET Nonlinear` to verify that the main-server Todo Note, D601, repository references, private backend mappings, the Todo Note migrated lists and tree tasks, FindJob metrics and job preview, the Pipeline component matrix, React Flow control graph and recent runs, and the MET Nonlinear project library/Fork/pending-start queue/current queue/completed/failure diagnostics/GPU/image panels are all rendered through React controls.
- Microservice frontend assertions must wait for real backend data, not only the page skeleton. For Todo Note this means the page must show the migrated lists `CONSTAR`, `大论文`, `找工作`, `小论文` and `事务`, support creating a temporary list and task through the frontend, and delete that temporary list afterwards. The temporary list must be selected again by its unique generated name before deletion so E2E never deletes a migrated source list by accident. For FindJob this means the page must show a numeric `岗位总量`, `HEALTH OK`, and a non-empty `PREVIEW` count such as `40/1463 PREVIEW`. For Pipeline this means the page must show `Pipeline v2 工作台`, `Health OK`, a numeric component count, a non-empty React Flow control graph, `控制图`, and `最近运行`. For MET Nonlinear this means the page must show `MET Nonlinear 训练编排`, `Health OK`, `Fork Project`, `加入待启动队列`, `启动队列`, `当前队列`, the max-concurrency setting, the task queue and the GPU/image panels, and must not show the removed hard-coded `创建10个10轮任务` frontend entry. Loading placeholders like `--` or empty states are not sufficient for E2E success.
- Frontend: Playwright must open the public frontend URL derived from `network.publicHost`, not localhost or a Docker-internal URL. It logs in with the configured account, waits for `核心在线`, asserts that `main-server` and `Main Server Provider` are visible, verifies desktop sidebar collapse and the `PGDATA` overview metric, confirms that no raw JSON is visible, then clicks `查看原始JSON` to verify Provider data from the frontend. It opens task history to verify duration and failure diagnostics, opens the resource node `资源监控` tab to verify CPU/Memory/Disk curves and provider upgrade precheck dispatch, opens `Docker 状态`, switches to `main-server`, and verifies the Docker Desktop-style container view including the database named volume `unidesk_pgdata_10gb`. It opens `网关版本` and verifies the provider-gateway version, SSH passthrough availability, remote update availability and structured automatic update records for `provider.upgrade`. Finally it opens `微服务 / 服务目录`, `微服务 / Todo Note`, `微服务 / FindJob`, `微服务 / Pipeline` and `微服务 / MET Nonlinear` to verify that the main-server Todo Note, D601, repository references, private backend mappings, the Todo Note migrated lists and tree tasks, FindJob metrics and job preview, the Pipeline component matrix, React Flow control graph and recent runs, and the MET Nonlinear project library/Fork/pending-start queue/current queue/completed/failure diagnostics/GPU/image panels are all rendered through React controls. Task history and provider upgrade records must not display a real sub-second duration as `0s`; MET Nonlinear running rows must show an ETA derived from backend progress or from `startedAt` plus epoch progress.
- Microservice frontend assertions must wait for real backend data, not only the page skeleton. For Todo Note this means the page must show the migrated lists `CONSTAR`, `大论文`, `找工作`, `小论文` and `事务`, support creating a temporary list and task through the frontend, and delete that temporary list afterwards. The temporary list must be selected again by its unique generated name before deletion so E2E never deletes a migrated source list by accident. For FindJob this means the page must show a numeric `岗位总量`, `HEALTH OK`, and a non-empty `PREVIEW` count such as `40/1463 PREVIEW`. For Pipeline this means the page must show `Pipeline v2 工作台`, `Health OK`, a numeric component count, a non-empty React Flow control graph, `控制图`, and `最近运行`. For MET Nonlinear this means the page must show `MET Nonlinear 训练编排`, `Health OK`, `Fork Project`, `加入待启动队列`, `启动队列`, `当前队列`, the max-concurrency setting, the task queue and the GPU/image panels, and must not show the removed hard-coded `创建10个10轮任务` frontend entry. Full MET Nonlinear acceptance is driven by public frontend controls: choose a visible source Project, set batch size, epochs and max concurrency in inputs, fork into `projects/unidesk_forks/`, stage the selected forks, start the queue, and verify completed rows plus automatic `metnl-train-*` container removal. Loading placeholders like `--` or empty states are not sufficient for E2E success.
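
The "real data, not placeholders" rule for the FindJob preview badge could be checked along the following lines. This is only a sketch: `parsePreviewBadge` and `isRealPreview` are hypothetical helper names, and the badge format is assumed from the single example `40/1463 PREVIEW` given above.

```typescript
// Hypothetical helpers; the badge format is inferred from "40/1463 PREVIEW".
interface PreviewCount {
  shown: number;
  total: number;
}

// Parse text such as "40/1463 PREVIEW"; return null for anything else,
// including loading placeholders like "--".
function parsePreviewBadge(text: string): PreviewCount | null {
  const match = /^(\d+)\s*\/\s*(\d+)\s+PREVIEW$/.exec(text.trim());
  if (match === null) return null;
  return { shown: Number(match[1]), total: Number(match[2]) };
}

// A preview counts as real backend data only when it is non-empty
// and does not exceed the reported total.
function isRealPreview(text: string): boolean {
  const count = parsePreviewBadge(text);
  return count !== null && count.shown > 0 && count.shown <= count.total;
}
```

An E2E assertion would then read the badge text from the page and require `isRealPreview(text)` to be true before treating the FindJob page as loaded.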
## Frontend JSON Rule
+3 -3
@@ -18,7 +18,7 @@ frontend application source must use TypeScript + React; prohibited in `src/components
## Task History Diagnostics
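`任务调度 / 任务历史` must render the task lifecycle as a diagnosable table rather than showing only the update time and a raw payload summary. Each row shows at least the status, task command and id, Provider, task duration, payload summary, diagnostics, update time and an explicit `查看原始JSON` action. The duration of a terminal task is computed as `updatedAt - createdAt`; the duration of a pending task is the current time minus `createdAt`. In the default view, failed tasks must extract `result.error`, `result.message`, `result.stderr`, `result.reason` or an equivalent field as the failure reason, and render key diagnostic fields such as exit code, timeout and previous status as controls. The full result may only be expanded via `查看原始JSON`.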
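`任务调度 / 任务历史` must render the task lifecycle as a diagnosable table rather than showing only the update time and a raw payload summary. Each row shows at least the status, task command and id, Provider, task duration, payload summary, diagnostics, update time and an explicit `查看原始JSON` action. The duration of a terminal task is computed as `updatedAt - createdAt`; the duration of a pending task is the current time minus `createdAt`. Durations must keep millisecond-to-second precision: tasks shorter than 1 second are shown as fractional seconds or `<0.01s`, and a real sub-second task must never be rounded or floored to `0s`. In the default view, failed tasks must extract `result.error`, `result.message`, `result.stderr`, `result.reason` or an equivalent field as the failure reason, and render key diagnostic fields such as exit code, timeout and previous status as controls. The full result may only be expanded via `查看原始JSON`.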
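
A minimal sketch of the duration rule, assuming ISO timestamp strings for `createdAt` and `updatedAt`. The `terminal` flag and the function names are hypothetical, and the exact cut-offs for minutes and tenths of a second are illustrative choices rather than part of the rule.

```typescript
// Hypothetical shape; only createdAt/updatedAt are named in the rule above.
interface TaskTimes {
  createdAt: string;
  updatedAt: string;
  terminal: boolean; // assumed flag: true once the task reached a final state
}

// Terminal tasks: updatedAt - createdAt. Pending tasks: now - createdAt.
function taskDurationMs(task: TaskTimes, nowMs: number): number {
  const created = Date.parse(task.createdAt);
  const end = task.terminal ? Date.parse(task.updatedAt) : nowMs;
  return end - created;
}

// Format without ever collapsing a sub-second duration to "0s".
function formatDuration(ms: number): string {
  if (!Number.isFinite(ms) || ms < 0) return "unknown";
  if (ms < 10) return "<0.01s";
  // Flooring to hundredths keeps every value in [10ms, 1s) at 0.01s or more.
  if (ms < 1000) return `${(Math.floor(ms / 10) / 100).toFixed(2)}s`;
  if (ms < 60000) return `${(Math.floor(ms / 100) / 10).toFixed(1)}s`;
  const minutes = Math.floor(ms / 60000);
  const seconds = Math.floor((ms % 60000) / 1000);
  return `${minutes}m ${seconds}s`;
}
```

Flooring rather than rounding is a deliberate choice here: it avoids displaying `1.00s` for a task that actually finished in under a second.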
## Resource Node Monitor View
@@ -30,7 +30,7 @@ frontend application source must use TypeScript + React; prohibited in `src/components
## Provider Gateway Version View
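The resource node module must provide a `网关版本` sub-tab that shows, for each Provider, the provider-gateway version, upgrade policy, start time, capability summary, SSH passthrough availability, remote update availability, latest automatic update status and automatic update records. SSH passthrough availability must be rendered as structured badges derived from whether `unideskCapabilities` contains `host.ssh`, together with `hostSshConfigured`, `hostSshKeyPresent` and `hostSshTarget`. Remote update availability must be rendered as structured badges derived from whether `unideskCapabilities` contains `provider.upgrade`, together with `providerGatewayUpgradePolicy: "always-enabled"`. The data source for automatic update records is the `provider.upgrade` task history, which by default must be rendered as structured table fields: status, mode, task id, source, duration, policy, result summary and update time. Upgrade plans, task results and service logs must not be laid out on the page as bare JSON. `最近自动更新` should prefer the latest real upgrade record with `mode: "schedule"`, so that a later precheck plan does not overwrite the actual upgrade result; the full upgrade task JSON may only be opened explicitly via the `查看原始JSON` button on the corresponding row.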
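The resource node module must provide a `网关版本` sub-tab that shows, for each Provider, the provider-gateway version, upgrade policy, start time, capability summary, SSH passthrough availability, remote update availability, latest automatic update status and automatic update records. SSH passthrough availability must be rendered as structured badges derived from whether `unideskCapabilities` contains `host.ssh`, together with `hostSshConfigured`, `hostSshKeyPresent` and `hostSshTarget`. Remote update availability must be rendered as structured badges derived from whether `unideskCapabilities` contains `provider.upgrade`, together with `providerGatewayUpgradePolicy: "always-enabled"`. The data source for automatic update records is the `provider.upgrade` task history, which by default must be rendered as structured table fields: status, mode, task id, source, duration, policy, result summary and update time. Sub-second upgrade durations must be shown as fractional seconds, never as `0s`. Upgrade plans, task results and service logs must not be laid out on the page as bare JSON. `最近自动更新` should prefer the latest real upgrade record with `mode: "schedule"`, so that a later precheck plan does not overwrite the actual upgrade result; the full upgrade task JSON may only be opened explicitly via the `查看原始JSON` button on the corresponding row.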
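
The selection rule for `最近自动更新` can be sketched as follows. The record shape and the name `pickLatestAutoUpdate` are hypothetical, and falling back to the newest record of any mode when no `schedule` record exists is an assumption, not something the rule above specifies.

```typescript
// Hypothetical record shape for provider.upgrade task history rows.
interface UpgradeRecord {
  id: string;
  mode: string; // e.g. "schedule" for a real upgrade, "plan" for a precheck
  updatedAt: string; // ISO timestamp
}

// Prefer the newest mode:"schedule" record so a later precheck plan
// cannot mask the result of the actual upgrade.
function pickLatestAutoUpdate(
  records: readonly UpgradeRecord[],
): UpgradeRecord | undefined {
  const newestFirst = [...records].sort(
    (a, b) => Date.parse(b.updatedAt) - Date.parse(a.updatedAt),
  );
  // Assumed fallback: with no schedule record, show the newest record.
  return newestFirst.find((r) => r.mode === "schedule") ?? newestFirst[0];
}
```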
## Provider Operation Availability
@@ -42,7 +42,7 @@ frontend application source must use TypeScript + React; prohibited in `src/components
## Microservice Frontend
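The `微服务` main module displays business backends mounted in Docker on compute nodes or on the main server. `服务目录` must show the service id, Provider, repository URL, commit id, business Dockerfile/docker-compose references, private node-backend mappings, the SSH passthrough development entry and a runtime container summary. The `Todo Note` sub-tab must render the main-server `todo-note-backend` backend as UniDesk React controls, including migrated lists, tree tasks, filtering, reminders, drag-and-drop/move, undo/redo, font-size control and an explicit raw JSON button. The `FindJob` sub-tab must render the D601 findjob backend as UniDesk React controls, including job metrics, job preview, draft reports and an explicit raw JSON button. The `Pipeline` sub-tab must render the snapshot backend of D601 `/home/ubuntu/pipeline` as a component matrix, a React Flow control-graph block diagram, recent-run cards and an evidence log summary. The `MET Nonlinear` sub-tab must render the training orchestration backend of D601 `/home/ubuntu/met_nonlinear` as a downloader-style workbench, including project library selection, forking a new Project from an existing Project, adding to the pending-start queue, starting the queue, the max-concurrency setting, current queue, completed, failure diagnostics, GPU/image, training progress, ETA, history and an explicit raw JSON button; it must not provide hard-coded test buttons with a fixed count or fixed number of epochs. This module must not iframe the legacy business frontends, the original Todo Note Vite frontend or Pipeline's own WebUI, must not expose microservice backend ports as URLs that the browser connects to directly, and must not lay out business API JSON bare on the page.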
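The `微服务` main module displays business backends mounted in Docker on compute nodes or on the main server. `服务目录` must show the service id, Provider, repository URL, commit id, business Dockerfile/docker-compose references, private node-backend mappings, the SSH passthrough development entry and a runtime container summary. The `Todo Note` sub-tab must render the main-server `todo-note-backend` backend as UniDesk React controls, including migrated lists, tree tasks, filtering, reminders, drag-and-drop/move, undo/redo, font-size control and an explicit raw JSON button. The `FindJob` sub-tab must render the D601 findjob backend as UniDesk React controls, including job metrics, job preview, draft reports and an explicit raw JSON button. The `Pipeline` sub-tab must render the snapshot backend of D601 `/home/ubuntu/pipeline` as a component matrix, a React Flow control-graph block diagram, recent-run cards and an evidence log summary. The `MET Nonlinear` sub-tab must render the training orchestration backend of D601 `/home/ubuntu/met_nonlinear` as a downloader-style workbench, including project library selection, forking a new Project from an existing Project, adding to the pending-start queue, starting the queue, the max-concurrency setting, current queue, completed, failure diagnostics, GPU/image, training progress, ETA, history and an explicit raw JSON button. If the backend does not directly provide an ETA for a running training task, the frontend must produce an explainable remaining-time estimate from `startedAt`, the current epoch and the target epoch. The sub-tab must not provide hard-coded test buttons with a fixed count or fixed number of epochs. This module must not iframe the legacy business frontends, the original Todo Note Vite frontend or Pipeline's own WebUI, must not expose microservice backend ports as URLs that the browser connects to directly, and must not lay out business API JSON bare on the page.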
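
One explainable way to derive the fallback ETA is linear extrapolation from the average time per completed epoch. The sketch below assumes that approach; the field names `currentEpoch`, `targetEpoch` and `etaMs` and the function name are hypothetical, and only `startedAt` is named in the rule above.

```typescript
// Hypothetical row shape for a running MET Nonlinear task.
interface RunningTask {
  startedAt: string; // ISO timestamp
  currentEpoch: number; // epochs completed so far
  targetEpoch: number; // epochs requested
  etaMs?: number; // backend-provided remaining time, when available
}

// Remaining milliseconds, or null when no estimate can be justified yet.
function estimateEtaMs(task: RunningTask, nowMs: number): number | null {
  // A backend-provided ETA always wins over the frontend estimate.
  if (task.etaMs !== undefined) return task.etaMs;
  const elapsed = nowMs - Date.parse(task.startedAt);
  if (!(elapsed >= 0) || task.currentEpoch <= 0) return null;
  if (task.currentEpoch >= task.targetEpoch) return 0;
  // Assumption: future epochs take as long as the average epoch so far.
  const msPerEpoch = elapsed / task.currentEpoch;
  return Math.round((task.targetEpoch - task.currentEpoch) * msPerEpoch);
}
```

For example, a task that has finished 50 of 200 epochs in 100 seconds averages 2 seconds per epoch, so the estimate for the remaining 150 epochs is 300 seconds. The result can be passed through the same duration formatter used for task history.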
## Component Data Rendering
+1 -1
@@ -99,7 +99,7 @@ In the UniDesk context, Pipeline is managed as an observed backend service: the default page must not i
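The long-term service boundary of MET Nonlinear is documented in the business repository at `~/met_nonlinear/docs/reference/unidesk_microservice.md`. `met-nonlinear-ts` is the long-running Bun TypeScript orchestration backend and `met-nonlinear-ml:tf26` is the on-demand training image. Each training task runs `python cli.py -t <projectPath>` in its own `docker run --rm` container, which is destroyed automatically when training finishes. The training image Dockerfile must use software sources reachable from mainland China; it is currently pinned to `nvidia/cuda:11.2.2-cudnn8-runtime-ubuntu20.04` from the Huawei Cloud mirror, the Aliyun apt mirror, the Tsinghua PyPI mirror, Ubuntu Python 3.8 and `tensorflow==2.6.0`. This avoids the incompatibility between Python 3.6 in the official TensorFlow 2.6 GPU image and the type annotations in the business source code.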
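
As a rough illustration of the one-container-per-task model, the orchestration backend might assemble its command line as below. The function name and the `taskId` suffix are hypothetical; the `metnl-train-` name prefix follows the container pattern used in the E2E notes. Volume mounts and GPU flags are left out because this document does not specify them, so the sketch is not a complete invocation.

```typescript
// Hypothetical helper: builds the argv for one training container.
// Mounts and GPU selection are intentionally omitted (not specified here).
function buildTrainCommand(taskId: string, projectPath: string): string[] {
  return [
    "docker",
    "run",
    "--rm", // container is removed automatically when training exits
    "--name",
    `metnl-train-${taskId}`,
    "met-nonlinear-ml:tf26",
    "python",
    "cli.py",
    "-t",
    projectPath,
  ];
}
```

Passing the arguments as an array rather than a single shell string avoids quoting problems when a project path contains spaces.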
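MET Nonlinear acceptance must be completed through the interactive UI of the public UniDesk frontend: select an existing source Project, set the number of training epochs and the max concurrency, use `Fork Project` to create a new Project under `projects/unidesk_forks/`, confirm that the new Project is only selected and is not trained directly, then add it to the pending-start queue and click `启动队列`. Acceptance must confirm that the pending-start, queued, training, completed and failure-diagnostics tabs are visible, that max concurrency takes effect as set in the UI, that the target GPU is the 2080Ti with concurrency limited automatically when free 2080Ti memory falls below 20%, and that no training container is left behind after it finishes. The CLI `/api/queue/server-test` is kept only as a backend compatibility entry and is not a frontend operation entry.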
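MET Nonlinear acceptance must be completed through the interactive UI of the public UniDesk frontend: select an existing source Project, set the number of training epochs and the max concurrency, use `Fork Project` to create a new Project under `projects/unidesk_forks/`, confirm that the new Project is only selected and is not trained directly, then add it to the pending-start queue and click `启动队列`. Acceptance must confirm that the pending-start, queued, training, completed and failure-diagnostics tabs are visible, that max concurrency takes effect as set in the UI, that running rows show training progress and ETA, that the target GPU is the 2080Ti with concurrency limited automatically when free 2080Ti memory falls below 20%, and that no training container is left behind after it finishes. The batch size is determined by UI inputs: a full acceptance run can be carried out by entering `Fork 数量=10`, `训练轮数=200` and `最大并发=3`, but that scale must not be turned into a dedicated hard-coded button. The CLI `/api/queue/server-test` is kept only as a backend compatibility entry and is not a frontend operation entry.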
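
The interaction between the UI max-concurrency setting and the 20% free-memory rule might look like the sketch below. Only the 20% threshold comes from the text above. The function name, the byte-based inputs and the limited value of 1 are assumptions, since this document does not say how far concurrency is reduced.

```typescript
// From the acceptance rule: below 20% free GPU memory, limit concurrency.
const LOW_FREE_MEMORY_FRACTION = 0.2;

// Hypothetical policy function. `limited` (default 1) is an assumed value
// for the reduced concurrency; the document does not specify it.
function effectiveConcurrency(
  requested: number,
  freeBytes: number,
  totalBytes: number,
  limited = 1,
): number {
  const wanted = Math.max(1, Math.floor(requested));
  // Without a usable memory reading, fall back to the limited value.
  if (!(totalBytes > 0)) return Math.min(wanted, limited);
  const freeFraction = freeBytes / totalBytes;
  return freeFraction < LOW_FREE_MEMORY_FRACTION
    ? Math.min(wanted, limited)
    : wanted;
}
```

With `最大并发=3`, a GPU reporting 8 of 11 GiB free would keep three concurrent containers, while one reporting 2 of 11 GiB free would drop to the limited value.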
## CLI