fix: harden provider gateway upgrades
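@@ -4,7 +4,9 @@ UniDesk is a distributed work platform with the main server as the unified entry point; this document

 ## Critical Provider Gateway Upgrade Rule

-- Rebuilds and upgrades of a compute node's `provider-gateway` container must go through the `provider.upgrade mode=schedule` remote upgrade path or the equivalent frontend scheduling; synchronously running self-rebuild commands such as `docker compose up --build provider-gateway` through `bun scripts/cli.ts ssh <providerId>` is forbidden. The authoritative rule is in `docs/reference/provider-gateway.md`.
+- Any code or behavior change under `src/components/provider-gateway` must bump the version in `src/components/provider-gateway/package.json` in the same changeset, and after the upgrade the target node must be confirmed to report the new version through the frontend or `debug health`. The authoritative rule is in `docs/reference/provider-gateway.md`.
+- `provider.upgrade` prechecks, upgrade executions, and automatic update records must explicitly display the gateway version of the specified Provider rather than only placing the version in the raw JSON. Frontend and E2E requirements are in `docs/reference/provider-gateway.md` and `TEST.md`.
+- Rebuilds and upgrades of a compute node's `provider-gateway` container must go through the `provider.upgrade mode=schedule` remote upgrade path with sleep-and-validate rollback protection, or the equivalent frontend scheduling; synchronously running self-rebuild commands such as `docker compose up --build provider-gateway` through `bun scripts/cli.ts ssh <providerId>` is forbidden. The authoritative rule is in `docs/reference/provider-gateway.md`.
 - Host SSH / WSL SSH passthrough may only be used for node diagnostics, fixing prerequisites, and post-upgrade verification; it must not serve as the rebuild/upgrade channel for a compute node's own `provider-gateway`. Deployment acceptance must prove that both remote upgrade and SSH passthrough work; the test gate is in `TEST.md`.

 ## CLI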
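The version-bump rule added in this hunk could be checked mechanically before review. The sketch below is only an illustration, not part of this commit: `isVersionBumped` is a hypothetical helper, and it assumes plain numeric `x.y.z` versions without pre-release suffixes.

```typescript
// Hypothetical helper (not in the repository): returns true only when `next`
// is strictly greater than `prev`, comparing major, minor, and patch in order.
function isVersionBumped(prev: string, next: string): boolean {
  const parse = (version: string): number[] =>
    version.split(".").map((part) => Number.parseInt(part, 10));
  const a = parse(prev);
  const b = parse(next);
  for (let i = 0; i < 3; i++) {
    const x = a[i] ?? 0; // a missing component counts as 0
    const y = b[i] ?? 0;
    if (Number.isNaN(x) || Number.isNaN(y)) {
      throw new Error(`not a numeric x.y.z version: ${prev} / ${next}`);
    }
    if (y > x) return true;
    if (y < x) return false;
  }
  return false; // identical versions are not a bump
}
```

A reviewer could feed it the `version` field of `src/components/provider-gateway/package.json` before and after the change; how those two values are obtained is left open here.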
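@@ -29,7 +31,7 @@ UniDesk is a distributed work platform with the main server as the unified entry point; this document
 - `bun`: the TypeScript runtime is fixed to Bun; component entry points and the CLI run `.ts` files directly. Constraints are in `docs/reference/config.md`.
 - `docker-compose.yml`: the main server orchestrates core, frontend, database, the local provider gateway, and the Todo Note backend in one place, and exposes only the frontend/provider ingress publicly. The service topology is in `docs/reference/deployment.md`.
 - `src/components/frontend`: the frontend source is fixed to TypeScript + React; `app.tsx` only acts as the shell/router, business pages such as Todo Note, FindJob, Pipeline, and MET Nonlinear must be split into separate TSX modules, and the UI follows a high-information-density industrial console design. UI rules are in `docs/reference/frontend.md`.
-- `src/components/provider-gateway`: the current main server `74.48.78.17` also joins UniDesk as a provider gateway, and external nodes connect through `ws://74.48.78.17:18082/ws/provider`; always-enabled remote upgrade and Host SSH / WSL SSH passthrough must both be deployed and self-tested. Deployment and the Playwright public-frontend verification method are in `docs/reference/provider-gateway.md`.
+- `src/components/provider-gateway`: the current main server `74.48.78.17` also joins UniDesk as a provider gateway, and external nodes connect through `ws://74.48.78.17:18082/ws/provider`; it must be deployed with `restart: always` together with always-enabled remote upgrade, sleep-and-validate rollback protection, and Host SSH / WSL SSH passthrough, and then self-tested. Deployment and the Playwright public-frontend verification method are in `docs/reference/provider-gateway.md`.
 - `microservices`: the local development boundary on the main server is fixed to developing only the UniDesk frontend; business backends outside the UniDesk core, Dockerfiles, and GPU/training debugging must be handled on the target compute node through SSH passthrough, and exceptions explicitly written to the main server, such as Todo Note, must be registered separately. Rules are in `docs/reference/microservices.md`.
 - `docs/reference/e2e.md`: the self-test gate that must run before delivery, the Playwright login and JSON display assertions, and the database named-volume persistence requirement.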
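@@ -58,7 +58,7 @@

 ## T14 Provider Gateway Remote Upgrade

-Read `AGENTS.md` (in this project `AGENTS.md` also takes on the role of `SKILL.md` in explaining `scripts/cli.ts`), then manually test the following with the CLI: run `bun scripts/cli.ts debug dispatch main-server provider.upgrade`, then check the task history or `bun scripts/cli.ts debug health` and confirm that `provider.upgrade` is dispatched over a real WebSocket and successfully returns an upgrade plan with `mode: plan`, and that the plan contains `policy: "always-enabled"`, `--no-deps`, and `--force-recreate`. For a compute node whose `provider-gateway` container is explicitly to be upgraded or rebuilt, you must additionally run `bun scripts/cli.ts debug dispatch <PROVIDER_ID> provider.upgrade --mode schedule --wait-ms 15000` and confirm that the task succeeds, the result contains updater container information, and the node subsequently comes back online. On compute nodes other than the main server, you must use `bun scripts/cli.ts --main-server-ip 74.48.78.17 debug dispatch <PROVIDER_ID> provider.upgrade --mode schedule --wait-ms 15000` for the same verification, proving that the node can self-test automatic upgrades through the public frontend remote CLI without specifying `--main-server-key`. Production rebuilds/upgrades of a compute node's `provider-gateway` may only be performed through `执行升级` under `资源监控` in the frontend, or through an equivalent explicit `provider.upgrade mode=schedule` dispatch; they must not be performed by synchronously running self-rebuild commands through `bun scripts/cli.ts ssh <PROVIDER_ID>` or the Host SSH maintenance bridge, and remote upgrade must not be disabled through `PROVIDER_UPGRADE_ENABLED` or an equivalent switch.
+Read `AGENTS.md` (in this project `AGENTS.md` also takes on the role of `SKILL.md` in explaining `scripts/cli.ts`), then manually test the following with the CLI: if this change modifies code or behavior under `src/components/provider-gateway`, first confirm that `version` in `src/components/provider-gateway/package.json` has been bumped; run `bun scripts/cli.ts debug dispatch main-server provider.upgrade`, then check the task history or `bun scripts/cli.ts debug health` and confirm that `provider.upgrade` is dispatched over a real WebSocket and successfully returns an upgrade plan with `mode: plan`, and that the plan contains `providerId`, `providerName`, `providerGatewayVersion`, `targetProviderGatewayVersion`, `policy: "always-enabled"`, `--no-deps`, `--force-recreate`, `oldGatewaySleepMs`, `promoteOnlyAfterCandidateValidation`, `candidateRestartPolicyAfterPromotion: "always"`, and `candidateUsesOldContainerEnvironment`. For a compute node whose `provider-gateway` container is explicitly to be upgraded or rebuilt, you must additionally run `bun scripts/cli.ts debug dispatch <PROVIDER_ID> provider.upgrade --mode schedule --wait-ms 15000` and confirm that the task succeeds, the result contains updater container information, the node comes back online after the candidate gateway is validated, `providerGatewayVersion` reports the target new version, and the final provider-gateway container's Docker restart policy is `always`. On compute nodes other than the main server, you must use `bun scripts/cli.ts --main-server-ip 74.48.78.17 debug dispatch <PROVIDER_ID> provider.upgrade --mode schedule --wait-ms 15000` for the same verification, proving that the node can self-test automatic upgrades through the public frontend remote CLI without specifying `--main-server-key`. Production rebuilds/upgrades of a compute node's `provider-gateway` may only be performed through `执行升级` under `资源监控` in the frontend, or through an equivalent explicit `provider.upgrade mode=schedule` dispatch; they must not be performed by synchronously running self-rebuild commands through `bun scripts/cli.ts ssh <PROVIDER_ID>` or the Host SSH maintenance bridge, and remote upgrade must not be disabled through `PROVIDER_UPGRADE_ENABLED` or an equivalent switch.

 ## T15 Pending Task Traceability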
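@@ -74,7 +74,7 @@

 ## T18 Provider Gateway Version and Automatic Update Records

-Read `AGENTS.md` (in this project `AGENTS.md` also takes on the role of `SKILL.md` in explaining `scripts/cli.ts`), then manually test the following with the CLI: run `bun scripts/cli.ts e2e run` and confirm that `provider:gateway-version-label`, `frontend:gateway-version-records-visible`, and `frontend:provider-operation-availability-visible` passed; then log in to the public frontend, open `资源节点 / 网关版本`, and confirm that every Provider row shows the provider-gateway version, upgrade policy, SSH passthrough availability, remote update availability, capability summary, and most recent automatic update record, and that the table below records the status, mode, task id, source, duration, policy, result summary, and update time of `provider.upgrade`. Automatic update records must be structured controls by default and must not display bare JSON; the full task/result may only be viewed through the `查看原始JSON` button.
+Read `AGENTS.md` (in this project `AGENTS.md` also takes on the role of `SKILL.md` in explaining `scripts/cli.ts`), then manually test the following with the CLI: run `bun scripts/cli.ts e2e run` and confirm that `provider:gateway-version-label`, `frontend:gateway-version-records-visible`, and `frontend:provider-operation-availability-visible` passed; then log in to the public frontend, open `资源节点 / 网关版本`, and confirm that every Provider row shows the provider-gateway version, upgrade policy, SSH passthrough availability, remote update availability, capability summary, and most recent automatic update record, and that the table below records the status, mode, task id, source, duration, policy, the gateway version of the specified Provider, result summary, and update time of `provider.upgrade`. Automatic update records must be structured controls by default and must not display bare JSON; the full task/result may only be viewed through the `查看原始JSON` button.

 ## T19 Frontend Single-Service Rebuild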
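@@ -95,4 +95,4 @@

 ## T23 MET Nonlinear D601 GPU Microservice

-Read `AGENTS.md` (in this project `AGENTS.md` also takes on the role of `SKILL.md` in explaining `scripts/cli.ts`), then manually test the following with the CLI: confirm that `docker-compose.unidesk.yml`, `docker/unidesk/Dockerfile.ml`, `unidesk/server/src/index.ts`, and `docs/reference/unidesk_microservice.md` exist in `~/met_nonlinear` on D601; run `bun scripts/cli.ts microservice list` and confirm that `met-nonlinear` is shown with `providerId=D601`, `public=false`, `frontendOnly=true`, the `127.0.0.1:3288` backend mapping, and the `met-nonlinear-ts` container summary; run `bun scripts/cli.ts microservice health met-nonlinear`, `bun scripts/cli.ts microservice proxy met-nonlinear /api/queue`, `bun scripts/cli.ts microservice proxy met-nonlinear '/api/projects?root=projects&limit=20'`, and `bun scripts/cli.ts microservice proxy met-nonlinear /api/images`, and confirm that the path goes through backend-core, the D601 provider-gateway, and the D601 local TS backend; on D601, run `curl -fsS -X POST http://127.0.0.1:3288/api/queue/server-test -H 'content-type: application/json' -d '{"sourceProject":"projects/FRIKANh6u6l4","count":10,"epochs":10,"maxConcurrency":2}'` through SSH passthrough, then poll `/api/queue` and confirm that at most 2 training containers run in parallel, the target GPU is the 2080Ti, concurrency is automatically limited when free GPU memory drops below 20%, and the 10 `projects/server_test/` tasks eventually succeed with their training containers destroyed automatically. Finally, log in to the public frontend `http://74.48.78.17:18081/`, open `微服务 / MET Nonlinear`, and confirm that the page uses React controls to show the queue, GPU/images, Project config preview, training progress, ETA, and history, with no bare JSON by default; raw data appears only after clicking `查看原始JSON`.
+Read `AGENTS.md` (in this project `AGENTS.md` also takes on the role of `SKILL.md` in explaining `scripts/cli.ts`), then manually test the following with the CLI: confirm that `docker-compose.unidesk.yml`, `docker/unidesk/Dockerfile.ml`, `unidesk/server/src/index.ts`, and `docs/reference/unidesk_microservice.md` exist in `~/met_nonlinear` on D601; run `bun scripts/cli.ts microservice list` and confirm that `met-nonlinear` is shown with `providerId=D601`, `public=false`, `frontendOnly=true`, the `127.0.0.1:3288` backend mapping, and the `met-nonlinear-ts` container summary; run `bun scripts/cli.ts microservice health met-nonlinear`, `bun scripts/cli.ts microservice proxy met-nonlinear /api/queue`, `bun scripts/cli.ts microservice proxy met-nonlinear '/api/projects?root=projects&limit=20'`, and `bun scripts/cli.ts microservice proxy met-nonlinear /api/images`, and confirm that the path goes through backend-core, the D601 provider-gateway, and the D601 local TS backend; finally, log in to the public frontend `http://74.48.78.17:18081/`, open `微服务 / MET Nonlinear`, select an existing source Project through the UI, set the number of training epochs and the maximum concurrency, use `Fork Project` to create a new Project under `projects/unidesk_forks/`, confirm that the new Project is checked automatically but is not trained directly, then click `加入待启动队列` and `启动队列`; confirm that at most the concurrency set in the UI runs, the target GPU is the 2080Ti, concurrency is automatically limited when free GPU memory drops below 20%, and tasks eventually land in the completed or failure-diagnostics tab with their training containers destroyed automatically. The page must use React controls to show the project library, pending-start/queued/training, completed, failure diagnostics, GPU/images, training progress, ETA, and history, with no bare JSON by default; raw data appears only after clicking `查看原始JSON`. The frontend must no longer provide hard-coded test buttons such as `创建10个10轮任务`.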
+1
-1
@@ -125,7 +125,7 @@ services:
       context: .
       dockerfile: src/components/provider-gateway/Dockerfile
     container_name: unidesk-provider-gateway-main
-    restart: unless-stopped
+    restart: always
     depends_on:
       - backend-core
     environment:
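The effect of switching to `restart: always` can be confirmed from `docker inspect`, whose JSON output is an array of container objects carrying `HostConfig.RestartPolicy.Name`. The following is a minimal sketch under that assumption; `restartPolicyOf` is a hypothetical helper and is not part of this commit.

```typescript
// Only the fields this check reads from `docker inspect <container>` output.
interface InspectEntry {
  HostConfig?: { RestartPolicy?: { Name?: string } };
}

// Hypothetical helper: extracts the restart policy name of the first
// container in a `docker inspect` JSON document, or "" when it is absent.
function restartPolicyOf(inspectJson: string): string {
  const entries = JSON.parse(inspectJson) as InspectEntry[];
  return entries[0]?.HostConfig?.RestartPolicy?.Name ?? "";
}
```

For the service in this hunk, the input would be the output of `docker inspect unidesk-provider-gateway-main`, and the expected result after this change is `always`.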
@@ -18,8 +18,8 @@ UniDesk delivery is not complete until the public frontend, public provider ingr
 - Provider remote control: internal `/api/dispatch` must successfully complete a real `provider.upgrade` task in `mode: "plan"` so the upgrade path is validated without recreating the running gateway during E2E.
 - Microservices: internal `/api/microservices` must include `todo-note` on `main-server` plus `findjob`, `pipeline` and `met-nonlinear` on `D601` with `public=false`; `/api/microservices/todo-note/health` must report `storage=postgres`, `/api/microservices/todo-note/proxy/api/instances` must expose the migrated Todo Note lists, and a temporary Todo Note list create/add/toggle/undo/delete cycle must succeed through the real provider-gateway proxy; `/api/microservices/findjob/health` and `/api/microservices/findjob/proxy/api/summary` must succeed through the real provider-gateway proxy; `/api/microservices/findjob/proxy/api/jobs?__unideskArrayLimit=jobs:5` must return a bounded preview with `_unidesk.arrayLimits` metadata; `/api/microservices/pipeline/health` and `/api/microservices/pipeline/proxy/api/snapshot?__unideskArrayLimit=registry.components:8,runs:3` must return Pipeline health, registry and run previews; `/api/microservices/met-nonlinear/health`, `/api/microservices/met-nonlinear/proxy/api/queue`, `/api/microservices/met-nonlinear/proxy/api/projects?root=projects&limit=20` and `/api/microservices/met-nonlinear/proxy/api/images` must return the D601 TS backend health, queue/GPU policy, project preview and ready `met-nonlinear-ml:tf26` image status.
 - Database: the command writes an `unidesk_e2e_markers` row through `docker exec unidesk-database psql`, confirms provider state is stored in PostgreSQL, and checks Todo Note rows exist in `todo_note_instances` using the same named volume.
-- Frontend: Playwright must open the public frontend URL derived from `network.publicHost`, not localhost or a Docker-internal URL; it logs in with the configured account, waits for `核心在线`, asserts that `main-server` and `Main Server Provider` are visible, verifies desktop sidebar collapse and `PGDATA` overview metric, clicks `查看原始JSON` to verify Provider data from the frontend, confirms no raw JSON is visible before that click, opens task history to verify duration and failure diagnostics, opens resource nodes `资源监控` to verify CPU/Memory/Disk curves and provider upgrade precheck dispatch, opens `Docker 状态`, switches to `main-server`, and verifies the Docker Desktop-style container view including the database named volume `unidesk_pgdata_10gb`, opens `网关版本` and verifies the provider-gateway version, SSH passthrough availability, and remote update availability plus structured automatic update records for `provider.upgrade`, then opens `微服务 / 服务目录`, `微服务 / Todo Note`, `微服务 / FindJob`, `微服务 / Pipeline` and `微服务 / MET Nonlinear` to verify that the main-server Todo Note, D601, repository references, private backend mappings, the migrated Todo Note lists and tree tasks, FindJob metrics and job previews, the Pipeline component matrix, the React Flow control graph and recent runs, and the MET Nonlinear queue/GPU/images/Project config/training history are all rendered through React controls.
-- Microservice frontend assertions must wait for real backend data, not only the page skeleton. For Todo Note this means the page must show the migrated lists `CONSTAR`, `大论文`, `找工作`, `小论文`, `事务`, support creating a temporary list and task through the frontend, and delete that temporary list afterwards. The temporary list must be selected again by its unique generated name before deletion so E2E never deletes a migrated source list by accident. For FindJob this means the page must show a numeric `岗位总量`, `HEALTH OK`, and a non-empty `PREVIEW` count such as `40/1463 PREVIEW`; for Pipeline this means the page must show `Pipeline v2 工作台`, `Health OK`, a numeric component count, a non-empty React Flow control graph, `控制图`, and `最近运行`; for MET Nonlinear this means the page must show `MET Nonlinear 训练编排`, `Health OK`, `创建10个10轮任务`, task queue and GPU/image panels; loading placeholders like `--` or empty states are not sufficient for E2E success.
+- Frontend: Playwright must open the public frontend URL derived from `network.publicHost`, not localhost or a Docker-internal URL; it logs in with the configured account, waits for `核心在线`, asserts that `main-server` and `Main Server Provider` are visible, verifies desktop sidebar collapse and `PGDATA` overview metric, clicks `查看原始JSON` to verify Provider data from the frontend, confirms no raw JSON is visible before that click, opens task history to verify duration and failure diagnostics, opens resource nodes `资源监控` to verify CPU/Memory/Disk curves and provider upgrade precheck dispatch, opens `Docker 状态`, switches to `main-server`, and verifies the Docker Desktop-style container view including the database named volume `unidesk_pgdata_10gb`, opens `网关版本` and verifies the provider-gateway version, SSH passthrough availability, and remote update availability plus structured automatic update records for `provider.upgrade`, then opens `微服务 / 服务目录`, `微服务 / Todo Note`, `微服务 / FindJob`, `微服务 / Pipeline` and `微服务 / MET Nonlinear` to verify that the main-server Todo Note, D601, repository references, private backend mappings, the migrated Todo Note lists and tree tasks, FindJob metrics and job previews, the Pipeline component matrix, the React Flow control graph and recent runs, and the MET Nonlinear project library/Fork/pending-start queue/current queue/completed/failure diagnostics/GPU/images are all rendered through React controls.
+- Microservice frontend assertions must wait for real backend data, not only the page skeleton. For Todo Note this means the page must show the migrated lists `CONSTAR`, `大论文`, `找工作`, `小论文`, `事务`, support creating a temporary list and task through the frontend, and delete that temporary list afterwards. The temporary list must be selected again by its unique generated name before deletion so E2E never deletes a migrated source list by accident. For FindJob this means the page must show a numeric `岗位总量`, `HEALTH OK`, and a non-empty `PREVIEW` count such as `40/1463 PREVIEW`; for Pipeline this means the page must show `Pipeline v2 工作台`, `Health OK`, a numeric component count, a non-empty React Flow control graph, `控制图`, and `最近运行`; for MET Nonlinear this means the page must show `MET Nonlinear 训练编排`, `Health OK`, `Fork Project`, `加入待启动队列`, `启动队列`, `当前队列`, the maximum-concurrency setting, task queue and GPU/image panels, and must not show the removed hard-coded `创建10个10轮任务` frontend entry; loading placeholders like `--` or empty states are not sufficient for E2E success.

 ## Frontend JSON Rule
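The MET Nonlinear label requirements in this hunk amount to a required set plus a removed set. The sketch below illustrates that check on a list of visible control labels; `checkMetNonlinearLabels` and the idea of collecting labels into an array are assumptions for illustration, not the project's actual Playwright assertions. Exact matching is used deliberately, because `启动队列` is a substring of `加入待启动队列` and a plain substring search over page text would pass with only one of the two buttons present.

```typescript
// Labels taken from the E2E rule above; the helper itself is hypothetical.
const REQUIRED_LABELS = [
  "MET Nonlinear 训练编排",
  "Health OK",
  "Fork Project",
  "加入待启动队列",
  "启动队列",
  "当前队列",
];
const REMOVED_LABELS = ["创建10个10轮任务"];

// Compares exact, trimmed labels so one button cannot satisfy another's check.
function checkMetNonlinearLabels(
  visibleLabels: string[],
): { missing: string[]; forbidden: string[] } {
  const seen = new Set(visibleLabels.map((label) => label.trim()));
  return {
    missing: REQUIRED_LABELS.filter((label) => !seen.has(label)),
    forbidden: REMOVED_LABELS.filter((label) => seen.has(label)),
  };
}
```

A page passes only when both `missing` and `forbidden` come back empty.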
@@ -51,6 +51,6 @@ Before claiming delivery, run these checks and keep their JSON output or screens

 ## Provider Upgrade Gate

-When delivery explicitly includes upgrading or rebuilding a compute-node `provider-gateway` such as D601 or D518, the automated E2E plan check is not sufficient. The operator must first bootstrap any legacy provider only from a node-local terminal, node-owned web terminal, systemd, scheduled task, or detached shell if it cannot yet schedule upgrades; SSH passthrough carried by the same provider-gateway must not be used for synchronous self-rebuilds. Then run `provider.upgrade` with `mode: "schedule"` against that Provider ID, confirm the task succeeds, confirm the node reconnects in the public frontend, and finally verify any required `host.ssh` capability with `bun scripts/cli.ts ssh <PROVIDER_ID> hostname`. This schedule check is a node-upgrade gate, not a replacement for the standard public frontend Playwright E2E gate.
+When delivery explicitly includes upgrading or rebuilding a compute-node `provider-gateway` such as D601 or D518, the automated E2E plan check is not sufficient. The operator must first bootstrap any legacy provider only from a node-local terminal, node-owned web terminal, systemd, scheduled task, or detached shell if it cannot yet schedule upgrades; SSH passthrough carried by the same provider-gateway must not be used for synchronous self-rebuilds. Then run `provider.upgrade` with `mode: "schedule"` against that Provider ID, confirm the task succeeds, confirm the sleep-and-validate candidate gateway reconnects in the public frontend, confirm the final container restart policy is `always`, and finally verify any required `host.ssh` capability with `bun scripts/cli.ts ssh <PROVIDER_ID> hostname`. This schedule check is a node-upgrade gate, not a replacement for the standard public frontend Playwright E2E gate.

 External compute nodes should run that schedule check through the remote main-server passthrough form: `bun scripts/cli.ts --main-server-ip 74.48.78.17 debug dispatch <PROVIDER_ID> provider.upgrade --mode schedule --wait-ms 15000`. The default remote transport logs in to the public frontend and does not require a main server SSH key; this proves the node can validate itself without direct access to backend-core REST or PostgreSQL.
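@@ -42,7 +42,7 @@ The frontend application source must use TypeScript + React; forbidden under `src/components

 ## Microservice Frontend

-The `微服务` main module displays business backends mounted in Docker on compute nodes or on the main server. `服务目录` must show the service id, Provider, repository URL, commit id, business Dockerfile/docker-compose references, the node backend private mapping, the SSH passthrough development entry, and the runtime container summary; the `Todo Note` sub-tab must render the main server's `todo-note-backend` backend as UniDesk React controls, including migrated lists, tree tasks, filtering, reminders, drag-and-drop/move, undo/redo, font-size control, and an explicit raw JSON button; the `FindJob` sub-tab must render the D601 findjob backend as UniDesk React controls, including job metrics, job previews, draft reports, and an explicit raw JSON button; the `Pipeline` sub-tab must render the snapshot backend of D601 `/home/ubuntu/pipeline` as a component matrix, a React Flow control-graph block diagram, recent-run cards, and an evidence log summary; the `MET Nonlinear` sub-tab must render the training orchestration backend of D601 `/home/ubuntu/met_nonlinear` as the queue, GPU/images, Project config preview, training progress, ETA, history, and an explicit raw JSON button. This module must not iframe the legacy business frontends, the original Todo Note Vite frontend, or Pipeline's own WebUI, must not expose microservice backend ports as URLs the browser connects to directly, and must not dump business API JSON onto the page.
+The `微服务` main module displays business backends mounted in Docker on compute nodes or on the main server. `服务目录` must show the service id, Provider, repository URL, commit id, business Dockerfile/docker-compose references, the node backend private mapping, the SSH passthrough development entry, and the runtime container summary; the `Todo Note` sub-tab must render the main server's `todo-note-backend` backend as UniDesk React controls, including migrated lists, tree tasks, filtering, reminders, drag-and-drop/move, undo/redo, font-size control, and an explicit raw JSON button; the `FindJob` sub-tab must render the D601 findjob backend as UniDesk React controls, including job metrics, job previews, draft reports, and an explicit raw JSON button; the `Pipeline` sub-tab must render the snapshot backend of D601 `/home/ubuntu/pipeline` as a component matrix, a React Flow control-graph block diagram, recent-run cards, and an evidence log summary; the `MET Nonlinear` sub-tab must render the training orchestration backend of D601 `/home/ubuntu/met_nonlinear` as a download-manager-style workbench, including project library selection, forking a new Project from an existing Project, adding to the pending-start queue, starting the queue, the maximum-concurrency setting, the current queue, completed, failure diagnostics, GPU/images, training progress, ETA, history, and an explicit raw JSON button; hard-coded test buttons with a fixed count or fixed number of epochs must not be provided. This module must not iframe the legacy business frontends, the original Todo Note Vite frontend, or Pipeline's own WebUI, must not expose microservice backend ports as URLs the browser connects to directly, and must not dump business API JSON onto the page.

 ## Component Data Rendering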
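@@ -94,12 +94,12 @@ In the UniDesk context, Pipeline is managed as an observability backend service: the default page must not i
 - Code reference: `https://github.com/pikasTech/met_nonlinear` and `repository.commitId` in the configuration.
 - Deployment reference: `docker-compose.unidesk.yml`, `docker/unidesk/Dockerfile.server`, `docker/unidesk/Dockerfile.ml`, `composeService=met-nonlinear-ts`, and `containerName=met-nonlinear-ts` in the business repository.
 - Node backend: `127.0.0.1:3288` on D601, reached from inside the provider-gateway container through `http://host.docker.internal:3288`.
-- Proxy paths: only the `/health` and `/api/` prefixes are allowed; `GET`, `HEAD`, `POST`, and `PUT` are allowed, used to read the queue/history, create server_test projects, modify config patches, and enqueue training tasks.
-- UniDesk frontend: the `微服务 / MET Nonlinear` React page displays the queue, GPU/images, Project config preview, training progress, ETA, training history, and an explicit raw JSON button.
+- Proxy paths: only the `/health` and `/api/` prefixes are allowed; `GET`, `HEAD`, `POST`, and `PUT` are allowed, used to read the queue/history, fork a new Project from an existing Project, save queue settings, add to the pending-start queue, and start the queue.
+- UniDesk frontend: the `微服务 / MET Nonlinear` React page uses a download-manager-like workbench interaction; it handles selecting an existing Project from the project library, forking a new Project, adding to the pending-start queue, starting the queue, and adjusting the maximum concurrency, shows the current queue/completed/failure diagnostics/GPU and images in separate tabs, and displays training progress, ETA, training history, and an explicit raw JSON button.

 The long-term service boundary of MET Nonlinear is documented in the business repository at `~/met_nonlinear/docs/reference/unidesk_microservice.md`: `met-nonlinear-ts` is the long-running Bun TypeScript orchestration backend, `met-nonlinear-ml:tf26` is the on-demand training image, and each training task runs `python cli.py -t <projectPath>` in one `docker run --rm` container that is destroyed automatically when training finishes. The training image Dockerfile must use software sources reachable from mainland China; it is currently pinned to `nvidia/cuda:11.2.2-cudnn8-runtime-ubuntu20.04` from the Huawei Cloud mirror, the Aliyun apt mirror, the Tsinghua PyPI mirror, Ubuntu Python 3.8, and `tensorflow==2.6.0`, which avoids the incompatibility between Python 3.6 in the official TensorFlow 2.6 GPU image and the type annotations in the business source.

-MET Nonlinear acceptance must be run on D601 through SSH passthrough: `curl -fsS -X POST http://127.0.0.1:3288/api/queue/server-test -H 'content-type: application/json' -d '{"sourceProject":"projects/FRIKANh6u6l4","count":10,"epochs":10,"maxConcurrency":2}'`, then poll `/api/queue` and confirm that all 10 ten-epoch training tasks under `projects/server_test/` complete, at most 2 training containers run concurrently, the target GPU is the 2080Ti, concurrency is automatically limited when free memory on the 2080Ti drops below 20%, and no training containers are left behind after they finish.
+MET Nonlinear acceptance must be completed through the interactive UI of the public UniDesk frontend: select an existing source Project, set the number of training epochs and the maximum concurrency, use `Fork Project` to create a new Project under `projects/unidesk_forks/`, confirm that the new Project is only selected and not trained directly, then add it to the pending-start queue and click `启动队列`. Acceptance must confirm that the pending-start, queued, training, completed, and failure-diagnostics tabs are visible, the maximum concurrency takes effect as set in the UI, the target GPU is the 2080Ti, concurrency is automatically limited when free memory on the 2080Ti drops below 20%, and no training containers are left behind after they finish. The CLI `/api/queue/server-test` is kept only as a backend compatibility entry point and is not a frontend operation entry.

 ## CLI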
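The proxy rule in this hunk (only `/health` and `/api/` paths, only `GET`, `HEAD`, `POST`, `PUT`) can be expressed as a small allowlist predicate. This is a sketch for illustration only: `isProxyRequestAllowed` is a hypothetical name, and treating `/health` as an exact path rather than a prefix is an assumption about how the rule is meant.

```typescript
// Methods named in the proxy rule above.
const ALLOWED_METHODS = new Set(["GET", "HEAD", "POST", "PUT"]);

// Hypothetical predicate: true when both the method and the path (ignoring
// the query string) fall inside the documented allowlist.
function isProxyRequestAllowed(method: string, path: string): boolean {
  const pathname = path.split("?")[0];
  // Assumption: `/health` is matched exactly, `/api/` as a prefix.
  const pathAllowed = pathname === "/health" || pathname.startsWith("/api/");
  return ALLOWED_METHODS.has(method.toUpperCase()) && pathAllowed;
}
```

Under these assumptions `GET /api/projects?root=projects&limit=20` is accepted, while `DELETE /api/queue` and `GET /admin` are rejected.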
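@@ -135,6 +135,6 @@ microservice delivery must be verified through the backend, the CLI, and the public frontend:
 - Run `bun scripts/cli.ts microservice health todo-note` and `bun scripts/cli.ts microservice proxy todo-note /api/instances`, and confirm that the real path goes through backend-core, WebSocket, the main-server provider-gateway, and the main server's `todo-note-backend` backend; the output must include the five migrated lists and the PostgreSQL storage health status.
 - On D601, use `bun scripts/cli.ts ssh D601 ...` to debug the business repository and containers, and confirm that `curl http://127.0.0.1:3254/api/health` works; do not deploy debugging services to the main server.
 - On D601, use `bun scripts/cli.ts ssh D601 ...` to debug the business repository and containers, and confirm that `curl http://127.0.0.1:18082/health` and `curl http://127.0.0.1:18082/api/snapshot` work; do not deploy Pipeline debugging services to the main server.
-- On D601, use `bun scripts/cli.ts ssh D601 ...` to debug `~/met_nonlinear`, confirm that `curl http://127.0.0.1:3288/health` works, and complete the 10 server_test training tasks as described in `~/met_nonlinear/docs/reference/unidesk_microservice.md`; do not deploy the MET Nonlinear backend, Docker builds, or training tasks to the main server.
+- On D601, use `bun scripts/cli.ts ssh D601 ...` to debug `~/met_nonlinear` and confirm that `curl http://127.0.0.1:3288/health` works; final acceptance must return to the public UniDesk frontend and be completed through project library selection, Fork, adding to the pending-start queue, and starting the queue. Do not deploy the MET Nonlinear backend, Docker builds, or training tasks to the main server.
 - Run `bun scripts/cli.ts e2e run`, confirm that the microservice-related checks passed, and confirm that Playwright accesses the public `http://74.48.78.17:18081/`.
-- Log in to the public frontend, open `微服务 / 服务目录`, `微服务 / Todo Note`, `微服务 / FindJob`, `微服务 / Pipeline`, and `微服务 / MET Nonlinear`, and confirm that you can see the main server and D601 providers, repository references, backend private mappings, the migrated Todo Note lists and tree tasks, FindJob metrics and job previews, the Pipeline component matrix, the React Flow control graph and recent runs, and the MET Nonlinear queue/GPU/images/Project config/training history. The Todo Note page must be able to create a temporary list, add a task, and delete the temporary list; before deletion the matching row must be re-selected by the unique temporary list name, and deleting through an unconfirmed currently active list is forbidden. The FindJob page must show real numeric metrics, `HEALTH OK`, and a non-empty job preview; the Pipeline page must show `Pipeline v2 工作台`, `Health OK`, the component count, and recent runs; the MET Nonlinear page must show `Health OK`, `创建10个10轮任务`, the task queue, and the GPU/images panel, and must not stall at the loading skeleton. Bare JSON must not appear on the page by default.
+- Log in to the public frontend, open `微服务 / 服务目录`, `微服务 / Todo Note`, `微服务 / FindJob`, `微服务 / Pipeline`, and `微服务 / MET Nonlinear`, and confirm that you can see the main server and D601 providers, repository references, backend private mappings, the migrated Todo Note lists and tree tasks, FindJob metrics and job previews, the Pipeline component matrix, the React Flow control graph and recent runs, and the MET Nonlinear queue/GPU/images/Project config/training history. The Todo Note page must be able to create a temporary list, add a task, and delete the temporary list; before deletion the matching row must be re-selected by the unique temporary list name, and deleting through an unconfirmed currently active list is forbidden. The FindJob page must show real numeric metrics, `HEALTH OK`, and a non-empty job preview; the Pipeline page must show `Pipeline v2 工作台`, `Health OK`, the component count, and recent runs; the MET Nonlinear page must show `Health OK`, `Fork Project`, `启动队列`, `当前队列`, the maximum-concurrency setting, and the GPU/images panel, and must not stall at the loading skeleton. Bare JSON must not appear on the page by default.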
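@@ -10,6 +10,8 @@ Provider Gateway is the compute-node-side container. It only connects outbound to the main server

 The authoritative path for rebuilding and upgrading a compute node's `provider-gateway` container is `provider.upgrade` with `mode: "schedule"`, or the equivalent explicit upgrade scheduling in the frontend. On this path the online provider starts a detached updater container through the local Docker socket, so the upgrade is decoupled from the lifecycle of the current WebSocket and SSH passthrough sessions; the rebuild target may only be the `provider-gateway` service and must carry `--no-deps` and `--force-recreate`, must not drag in database, backend-core, frontend, or business microservices, and must not no-op because the image tag is unchanged.

+Remote upgrades must use sleep-and-validate rollback protection: after successfully scheduling the updater, the old gateway closes its current WebSocket and enters a sleep period of at most 5 minutes; the updater first builds the new image, then starts a candidate gateway with the old container's environment variables, mounts, networks, and `extra_hosts`; only after `connect_open` and a successful register ack appear in the candidate gateway's log may the candidate's restart policy be changed to `always`, the old gateway be deleted, and the candidate be renamed to the original container name. If candidate validation fails, the updater must delete the candidate container and exit with failure, and the old gateway automatically reconnects to the main server once it reaches the sleep limit, which forms the automatic rollback. backend-core must ignore the close event of an old WebSocket once the same Provider ID has been replaced by a new WebSocket, so that a candidate that is already online is not marked offline by the old connection closing.
+
 Synchronously running self-rebuild commands such as `docker compose up -d --build provider-gateway`, `docker compose restart provider-gateway`, or `docker rm -f <provider-gateway>` followed by a restart through UniDesk's own Host SSH / WSL SSH passthrough is forbidden. The reason is that this SSH passthrough connection is carried by the very `provider-gateway` container being rebuilt; stopping the old container cuts the control channel and can leave the node unreachable, with the old container stopped and the new one not yet started. SSH passthrough is allowed only for diagnostics, fixing upgrade prerequisites, inspecting local state, and post-upgrade verification, and is not allowed as the production rebuild/upgrade channel for a compute node's `provider-gateway`.

 Only when a legacy node does not yet support `provider.upgrade mode=schedule` is a one-time bootstrap allowed through a node-local terminal, a node-owned web terminal, systemd, a scheduled task, or a detached shell; once the bootstrap is complete, you must immediately return to `provider.upgrade mode=schedule` for a real upgrade verification and confirm through the public frontend or the remote CLI that the node is back online and that remote update and SSH passthrough are available.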
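The promote-or-rollback decision described in this hunk can be summarized as a small pure function. The sketch below is illustrative only: the type and function names are hypothetical, the step names are labels rather than real commands, and the actual updater in this commit may be organized quite differently.

```typescript
// Upper bound of the old gateway's sleep period stated in the rule above.
const OLD_GATEWAY_MAX_SLEEP_MS = 5 * 60 * 1000;

// What the updater has observed in the candidate gateway's log (hypothetical shape).
interface CandidateLogState {
  connectOpen: boolean;
  registerAckOk: boolean;
}

type UpdaterStep =
  | "set-candidate-restart-always"
  | "remove-old-gateway"
  | "rename-candidate-to-original"
  | "remove-candidate"
  | "exit-with-failure";

// Promotion happens only after both validation signals are present;
// otherwise the candidate is discarded and the old gateway is left untouched.
function updaterSteps(state: CandidateLogState): UpdaterStep[] {
  if (state.connectOpen && state.registerAckOk) {
    return [
      "set-candidate-restart-always",
      "remove-old-gateway",
      "rename-candidate-to-original",
    ];
  }
  return ["remove-candidate", "exit-with-failure"];
}

// The old gateway reconnects on its own once the sleep limit is reached,
// which is what turns a failed candidate into an automatic rollback.
function oldGatewayShouldReconnect(sleptMs: number): boolean {
  return sleptMs >= OLD_GATEWAY_MAX_SLEEP_MS;
}
```

Keeping the decision separate from the Docker calls makes the ordering easy to review: the old container is never removed before the candidate has validated.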
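@@ -18,7 +20,7 @@ Provider Gateway is the compute-node-side container. It only connects outbound to the main server

 The current public IP of the main server is `74.48.78.17`, and `network.publicHost` in `config.json` must stay set to that address; the public frontend entry is `http://74.48.78.17:18081/`, the external provider gateway entry is `ws://74.48.78.17:18082/ws/provider`, and the provider ingress health check is `http://74.48.78.17:18082/health`. The main server's local provider is started by the `provider-gateway` service in the root `docker-compose.yml` and connects to itself from inside the container through the Docker-internal address `ws://backend-core:8081/ws/provider`; external compute nodes deploying provider-gateway must use the public provider ingress URL instead and reuse the provider token, heartbeat interval, and reconnect parameters from `config.json` / `.state/docker-compose.env`.

-The minimal way to deploy provider-gateway on a compute node is: prepare a Docker environment that can run the `unidesk_provider-gateway` image, assign the node a unique `PROVIDER_ID` and a readable `PROVIDER_NAME`, set `PROVIDER_SERVER_URL=ws://74.48.78.17:18082/ws/provider`, `PROVIDER_TOKEN`, `PROVIDER_LABELS_JSON`, `HEARTBEAT_INTERVAL_MS`, `RECONNECT_BASE_MS`, and `RECONNECT_MAX_MS`, and mount `/var/run/docker.sock:/var/run/docker.sock` as the only automation channel for Docker status collection, task execution, and remote upgrade. Every long-term node must configure the `PROVIDER_UPGRADE_*` environment variables, mount the UniDesk repository on the node read-only at `PROVIDER_UPGRADE_WORKSPACE_PATH`, and ensure the upgrade command rebuilds only the `provider-gateway` service without affecting database, backend-core, or frontend. A provider-gateway deployment must also deliver the Host SSH / WSL SSH passthrough maintenance bridge; WSL nodes should set `HOST_SSH_HOST=host.docker.internal`, `HOST_SSH_PORT=22`, `HOST_SSH_USER=<WSL user>`, `HOST_SSH_KEY=/run/host-ssh/id_ed25519`, and `HOST_REMOTE_CWD=/home/<WSL user>`, and mount a host directory containing only the maintenance private key read-only at `/run/host-ssh`.
+The minimal way to deploy provider-gateway on a compute node is: prepare a Docker environment that can run the `unidesk_provider-gateway` image, assign the node a unique `PROVIDER_ID` and a readable `PROVIDER_NAME`, set `PROVIDER_SERVER_URL=ws://74.48.78.17:18082/ws/provider`, `PROVIDER_TOKEN`, `PROVIDER_LABELS_JSON`, `HEARTBEAT_INTERVAL_MS`, `RECONNECT_BASE_MS`, and `RECONNECT_MAX_MS`, and mount `/var/run/docker.sock:/var/run/docker.sock` as the only automation channel for Docker status collection, task execution, and remote upgrade. Every long-term node must configure the `PROVIDER_UPGRADE_*` environment variables, mount the UniDesk repository on the node read-only at `PROVIDER_UPGRADE_WORKSPACE_PATH`, and ensure the upgrade command rebuilds only the `provider-gateway` service without affecting database, backend-core, or frontend. The provider-gateway container must use the Docker restart policy `always`: `restart: always` in Compose, `--restart always` with `docker run`. A provider-gateway deployment must also deliver the Host SSH / WSL SSH passthrough maintenance bridge; WSL nodes should set `HOST_SSH_HOST=host.docker.internal`, `HOST_SSH_PORT=22`, `HOST_SSH_USER=<WSL user>`, `HOST_SSH_KEY=/run/host-ssh/id_ed25519`, and `HOST_REMOTE_CWD=/home/<WSL user>`, and mount a host directory containing only the maintenance private key read-only at `/run/host-ssh`.

 ## Mandatory SSH Passthrough Bundle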
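@@ -36,7 +38,7 @@ WSL nodes should prefer the native Docker Engine inside WSL and `/var/run/docker.so

 The minimal environment file for a WSL provider should live in a node-local private path, for example `/home/ubuntu/unidesk/.state/provider-<ID>.env`, and be read by `docker run --env-file`. `PROVIDER_LABELS_JSON` can be written as single-line JSON in a Docker env-file; if you temporarily debug by `source`-ing it in a shell, the whole JSON must be quoted, otherwise the shell splits it on `{}` and commas and JSON parsing fails. WSL nodes should include at least these labels: `host`, `role=wsl-provider`, `wsl=true`, `distro`, `docker=true`; at runtime provider-gateway automatically appends `runtime`, `dockerSocketPresent`, and `gatewayUptimeSeconds`. `.state/provider-<ID>.env`, `logs/provider-<ID>/`, and container logs are node-local runtime state that must stay covered by `.gitignore`; provider tokens, login state, and runtime logs must not be committed.

-For long-term operation, managing the provider-gateway container with systemd is recommended over running the Bun process only in an interactive shell. The stable form of the systemd unit is: `ExecStartPre=-docker rm -f unidesk-provider-gateway-<ID>` to clean up an old container of the same name, `ExecStart=docker run --name unidesk-provider-gateway-<ID> --env-file ... -v /var/run/docker.sock:/var/run/docker.sock -v /home/ubuntu/unidesk:/workspace:ro -v /home/ubuntu/unidesk/logs/provider-<ID>:/var/log/unidesk -v <ssh-key-dir>:/run/host-ssh:ro unidesk_provider-gateway:<id>`, `ExecStop=docker stop unidesk-provider-gateway-<ID>`, with `Restart=always` set. Temporary deployments may use `docker run -d --restart unless-stopped` directly, but the container name, env file, log directory, read-only SSH private key mount, and image tag must still all carry the node ID so that the frontend, Docker status, SSH passthrough, and local troubleshooting can be matched to one another. `provider.upgrade` is a mandatory capability for long-term nodes, and provider-gateway provides no `PROVIDER_UPGRADE_ENABLED` or equivalent disable switch; if a node lacks the upgrade environment variables or the SSH passthrough environment variables, the node deployment must be fixed rather than having the server side accept a half-finished state that can only precheck, cannot upgrade, or cannot maintain passthrough.
+For long-term operation, managing the provider-gateway container with systemd is recommended over running the Bun process only in an interactive shell. The stable form of the systemd unit is: `ExecStartPre=-docker rm -f unidesk-provider-gateway-<ID>` to clean up an old container of the same name, `ExecStart=docker run --restart always --name unidesk-provider-gateway-<ID> --env-file ... -v /var/run/docker.sock:/var/run/docker.sock -v /home/ubuntu/unidesk:/workspace:ro -v /home/ubuntu/unidesk/logs/provider-<ID>:/var/log/unidesk -v <ssh-key-dir>:/run/host-ssh:ro unidesk_provider-gateway:<id>`, `ExecStop=docker stop unidesk-provider-gateway-<ID>`, with `Restart=always` set. Temporary deployments must also use `docker run -d --restart always`, and the container name, env file, log directory, read-only SSH private key mount, and image tag must all carry the node ID so that the frontend, Docker status, SSH passthrough, and local troubleshooting can be matched to one another. `provider.upgrade` is a mandatory capability for long-term nodes, and provider-gateway provides no `PROVIDER_UPGRADE_ENABLED` or equivalent disable switch; if a node lacks the upgrade environment variables or the SSH passthrough environment variables, the node deployment must be fixed rather than having the server side accept a half-finished state that can only precheck, cannot upgrade, or cannot maintain passthrough.

 WSL itself is reclaimed by Windows when no foreground process is running; if the node is to serve as long-term online compute, the distribution must be kept running through a Windows startup item, a scheduled task, or a background keepalive process such as `wsl.exe -d <distro> -u root -- bash -lc "systemctl start docker unidesk-provider-gateway-<ID>.service; exec sleep infinity"`. Enabling the systemd service inside WSL alone is not equivalent to a resident daemon at the Windows level.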
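@@ -86,6 +88,10 @@ provider ingress is the only provider connection interface allowed to be publicly exposed; currently

 provider-gateway must read its version from its own `package.json` and report `providerGatewayName`, `providerGatewayVersion`, `providerGatewayStartedAt`, and `providerGatewayUpgradePolicy` in the register and heartbeat labels. backend-core merges these labels into `unidesk_nodes.labels`, and the frontend displays them in the node list, resource monitoring, and `资源节点 / 网关版本`; when a legacy node lacks these fields, the version may only be shown as unknown and must not be replaced with a guessed value. `unideskCapabilities`, `hostSshConfigured`, `hostSshKeyPresent`, and `hostSshTarget` are also the data source for the WebUI operations-availability badges, which directly show SSH passthrough availability and remote update availability for each compute node.

+Any code or behavior change under `src/components/provider-gateway` must bump `version` in `src/components/provider-gateway/package.json` in the same changeset; changing the implementation while keeping the old version number is not allowed. Acceptance must confirm, through `资源节点 / 网关版本` in the public frontend or `bun scripts/cli.ts debug health`, that the target provider's `providerGatewayVersion` reports the new version; if the live node still shows the old version, the provider-gateway change cannot be considered delivered.
+
+Every remote upgrade precheck, upgrade execution, and automatic update record must explicitly display the gateway version of the specified Provider. The plan in the `provider.upgrade` result must contain `providerId`, `providerName`, the currently running `providerGatewayVersion`, and the `targetProviderGatewayVersion` read from `src/components/provider-gateway/package.json` in the upgrade workspace; the frontend upgrade control and the automatic update records under `资源节点 / 网关版本` must render the version as a structured field and must not merely bury it in the raw JSON.
+
 ## Docker Status Telemetry

 After connecting, provider-gateway must periodically report the Docker daemon status, sourced from `docker info`, `docker ps -a`, `docker images`, `docker volume ls`, and `docker network ls` on the local Docker socket. backend-core stores the latest snapshot in `unidesk_node_docker_status`, and the `Docker 状态` sub-tab under resource nodes in the frontend renders a Docker Desktop-style view from that snapshot; this capability still works only through provider-initiated reporting and does not require the main server to connect back to compute nodes.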
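The version labels described in this hunk could be assembled as follows. This is a sketch under stated assumptions, not the gateway's actual code: `buildGatewayLabels` and `displayGatewayVersion` are hypothetical helpers, taking the name from the `name` field of `package.json` is an assumption, and the ISO 8601 timestamp format is a guess at what `providerGatewayStartedAt` holds.

```typescript
// The two package.json fields this sketch reads.
interface GatewayPackage {
  name: string;
  version: string;
}

// Hypothetical helper: builds the four labels named in the rule above.
function buildGatewayLabels(
  pkg: GatewayPackage,
  startedAt: Date,
  upgradePolicy: string,
): Record<string, string> {
  return {
    providerGatewayName: pkg.name,
    providerGatewayVersion: pkg.version,
    providerGatewayStartedAt: startedAt.toISOString(), // assumed format
    providerGatewayUpgradePolicy: upgradePolicy,
  };
}

// Legacy nodes without the label are shown as unknown, never guessed.
function displayGatewayVersion(labels: Record<string, string | undefined>): string {
  return labels.providerGatewayVersion ?? "unknown";
}
```

The second helper mirrors the rule that a missing field must surface as an unknown version instead of a fallback value.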
@@ -100,6 +106,8 @@ backend-core 可以通过真实 WebSocket 调度向在线 provider 下发 `provi
|
||||
|
||||
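The remote upgrade policy is fixed to always-enabled: as long as provider-gateway is online and declares `provider.upgrade`, `mode: "schedule"` must actually schedule the upgrade container and must not be disabled by `PROVIDER_UPGRADE_ENABLED=false`, a hidden frontend button, or a server-side special list. The safety boundary of the upgrade capability is not a switch but the explicit `PROVIDER_UPGRADE_*` configuration, Docker socket permissions, the read-only repository mount, the fixed Compose service, and the `--no-deps` constraint. The upgrade plan must show `policy: "always-enabled"`, the updater container name, runner image, workspace, Compose project/service, env file, compose file, and the actual `docker run` command, so that frontend task history and CLI debug can diagnose problems directly.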
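The always-enabled rule amounts to a scheduling decision that ignores any on/off switch. A minimal sketch, assuming a hypothetical `ProviderState` shape and helper name (neither is taken from the repository):

```typescript
// Hypothetical view of a provider at dispatch time.
interface ProviderState {
  online: boolean;
  capabilities: string[];
  env: Record<string, string | undefined>;
}

// Decide whether mode "schedule" must dispatch the updater container.
// Deliberately never reads env.PROVIDER_UPGRADE_ENABLED: under the
// always-enabled policy that variable cannot veto scheduling.
function canScheduleUpgrade(provider: ProviderState): boolean {
  return provider.online && provider.capabilities.indexOf("provider.upgrade") !== -1;
}
```

The `env` field is kept in the type only to make the point visible in a test: a legacy `PROVIDER_UPGRADE_ENABLED=false` value does not change the result.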
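A successful return from `mode: "schedule"` only means that the updater has been scheduled; whether the upgrade ultimately succeeds is decided by the candidate gateway's self-validation. The updater must first build the new image via Compose, then generate the candidate env-file from the old container's `Config.Env`, and reuse the old container's Docker socket, log directory, read-only SSH private key mount, Compose networks, and `extra_hosts`. The candidate container must start with restart policy `no`; only after validation passes may the policy be changed to `always` and the old container deleted. The `replacementStrategy` in the upgrade plan must include `oldGatewaySleepMs`, `validationTimeoutMs`, `promoteOnlyAfterCandidateValidation`, `candidateRestartPolicyAfterPromotion: "always"`, `candidateUsesOldContainerEnvironment`, `candidateUsesOldContainerMounts`, `candidateUsesOldContainerNetworks`, and `candidateUsesOldContainerExtraHosts`, and the plan must show the current and target gateway versions of the specified Provider, so that the frontend and CLI can tell this is not the old, dangerous flow that deleted the old container before running `up`.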
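The ordering constraint in the replacement flow can be sketched as a pure function that lists the steps for a given validation outcome. The action names are invented for illustration, and the failure branch (discard the candidate, keep the old container) is an assumption inferred from the rollback protection mentioned in the rule summary; the text itself only specifies the success path.

```typescript
// Illustrative step names; they are not identifiers from the updater.
type ReplacementAction =
  | "build-image"
  | "start-candidate-restart-no"
  | "validate-candidate"
  | "set-candidate-restart-always"
  | "remove-old-container"
  | "remove-candidate-keep-old";

// List the updater steps in order for a given validation outcome.
function replacementActions(candidateValid: boolean): ReplacementAction[] {
  const actions: ReplacementAction[] = [
    "build-image",
    "start-candidate-restart-no", // restart policy must be "no" before validation
    "validate-candidate",
  ];
  if (candidateValid) {
    // Promotion happens only after validation has passed.
    actions.push("set-candidate-restart-always", "remove-old-container");
  } else {
    // Assumption: on failure the candidate is discarded and the old gateway stays.
    actions.push("remove-candidate-keep-old");
  }
  return actions;
}
```

The property worth checking is that `remove-old-container` never appears before `validate-candidate`, which is what separates this flow from the old delete-then-`up` sequence.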
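The authoritative source of auto-update records is the `provider.upgrade` task history stored by backend-core, not the provider-gateway container log files. The frontend must aggregate these tasks by Provider and render status, mode, task id, source, duration, policy, updater container summary, failure reason, and update time as a table or cards. The full task/result JSON may only be viewed after the operator clicks `查看原始JSON` (View Raw JSON).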
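Aggregating task history by Provider could look like the sketch below. The `UpgradeTask` shape is an assumption: `id`, `status`, and `updatedAt` mirror fields used in the frontend code in this diff, while a top-level `providerId` on the task is hypothetical.

```typescript
// Assumed minimal task record; the real task objects carry more fields.
interface UpgradeTask {
  id: string;
  providerId: string;
  status: string;
  updatedAt: string; // ISO timestamp
}

// Group provider.upgrade tasks by provider, newest first within each group.
function groupUpgradeTasksByProvider(tasks: UpgradeTask[]): Record<string, UpgradeTask[]> {
  const sorted = tasks.slice().sort((a, b) => Date.parse(b.updatedAt) - Date.parse(a.updatedAt));
  const grouped: Record<string, UpgradeTask[]> = {};
  for (const task of sorted) {
    const bucket = grouped[task.providerId] ?? [];
    bucket.push(task);
    grouped[task.providerId] = bucket;
  }
  return grouped;
}
```

With this ordering, the first entry of each group is the latest record for that Provider, which is what a "latest upgrade" column would display.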
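If an old provider-gateway can only return a plan, or rejects schedule because of `PROVIDER_UPGRADE_ENABLED=false` in its old environment, it must first be bootstrapped manually once through any existing maintenance channel. Bootstrap is not meant to be a long-term process; its goal is to update the node to a version that supports always-enabled remote upgrade and the Host SSH / WSL SSH maintenance bridge. Once it completes, immediately run `bun scripts/cli.ts debug dispatch <PROVIDER_ID> provider.upgrade --mode schedule --wait-ms 15000` to verify a real one-click upgrade, then use `bun scripts/cli.ts debug health` or the public frontend to confirm that the node is still online and that `unideskCapabilities` includes `provider.upgrade`; WSL nodes that need SSH maintenance must also include `host.ssh`.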
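The post-bootstrap confirmation can be reduced to a small check over the reported node state. This is a sketch only: `NodeHealth` and `bootstrapProblems` are hypothetical, and the actual output format of `debug health` is not shown in this document.

```typescript
// Hypothetical summary of what a health query reports for one node.
interface NodeHealth {
  online: boolean;
  unideskCapabilities: string[];
}

// Return human-readable problems; an empty array means the bootstrap checks pass.
function bootstrapProblems(node: NodeHealth, needsSshMaintenance: boolean): string[] {
  const problems: string[] = [];
  if (!node.online) problems.push("node is offline");
  if (node.unideskCapabilities.indexOf("provider.upgrade") === -1) {
    problems.push("missing capability provider.upgrade");
  }
  // Only WSL nodes that rely on SSH maintenance are required to expose host.ssh.
  if (needsSshMaintenance && node.unideskCapabilities.indexOf("host.ssh") === -1) {
    problems.push("missing capability host.ssh");
  }
  return problems;
}
```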
@@ -156,6 +156,8 @@ export function rebuildService(config: UniDeskConfig, service: RebuildableServic
    `label=com.docker.compose.project=${config.docker.projectName}`,
    "--filter",
    `label=com.docker.compose.service=${service}`,
    "--filter",
    "label=com.docker.compose.oneoff=False",
  ];
  const upCommand = [...compose, "up", "-d", "--no-deps", service];
  const script = [
@@ -209,6 +211,8 @@ export function dockerContainers(config: UniDeskConfig): ContainerStatus[] {
    "-a",
    "--filter",
    `label=com.docker.compose.project=${config.docker.projectName}`,
    "--filter",
    "label=com.docker.compose.oneoff=False",
    "--format",
    "{{json .}}",
  ], repoRoot);
+14
-7
@@ -154,6 +154,8 @@ function dockerPortSummary(): unknown {
    "ps",
    "--filter",
    "label=com.docker.compose.project=unidesk",
    "--filter",
    "label=com.docker.compose.oneoff=False",
    "--format",
    "{{.Names}}\t{{.Ports}}",
  ], repoRoot);
@@ -346,8 +348,9 @@ async function serviceChecks(config: UniDeskConfig, urls: PublicUrls, checks: E2
|
||||
});
|
||||
const upgradeTaskId = (upgradeDispatch as { body?: { taskId?: string } }).body?.taskId ?? "";
|
||||
const upgradeTask = upgradeTaskId ? await waitForTaskStatus(upgradeTaskId, "succeeded") : { ok: false, error: "missing taskId", upgradeDispatch };
|
||||
const taskResult = (upgradeTask as { task?: { result?: { plan?: unknown; mode?: string } }; ok?: boolean }).task?.result;
|
||||
addCheck(checks, "provider:upgrade-plan", (upgradeDispatch as { ok?: boolean }).ok === true && (upgradeTask as { ok?: boolean }).ok === true && taskResult?.mode === "plan" && taskResult.plan !== undefined, { upgradeDispatch, upgradeTask });
|
||||
const taskResult = (upgradeTask as { task?: { result?: { plan?: { providerGatewayVersion?: string; targetProviderGatewayVersion?: string }; mode?: string } }; ok?: boolean }).task?.result;
|
||||
const upgradePlan = taskResult?.plan;
|
||||
addCheck(checks, "provider:upgrade-plan", (upgradeDispatch as { ok?: boolean }).ok === true && (upgradeTask as { ok?: boolean }).ok === true && taskResult?.mode === "plan" && upgradePlan !== undefined && upgradePlan.providerGatewayVersion === expectedGatewayVersion && upgradePlan.targetProviderGatewayVersion === expectedGatewayVersion, { expectedGatewayVersion, upgradeDispatch, upgradeTask });
|
||||
addCheck(checks, "provider-ingress:public-health", (providerIngress as { ok?: boolean; body?: { ok?: boolean } }).ok === true && (providerIngress as { body?: { ok?: boolean } }).body?.ok === true, providerIngress);
|
||||
}
|
||||
|
||||
@@ -589,8 +592,12 @@ async function frontendCheck(config: UniDeskConfig, urls: PublicUrls, checks: E2
|
||||
const lower = text.toLowerCase();
|
||||
return lower.includes("met nonlinear 训练编排")
|
||||
&& text.includes("D601")
|
||||
&& text.includes("任务队列")
|
||||
&& text.includes("GPU 与镜像")
|
||||
&& text.includes("Fork Project")
|
||||
&& text.includes("加入待启动队列")
|
||||
&& text.includes("启动队列")
|
||||
&& text.includes("当前队列")
|
||||
&& text.includes("GPU/镜像")
|
||||
&& !text.includes("创建10个10轮任务")
|
||||
&& text.includes("仅 UniDesk frontend 代理访问")
|
||||
&& /Health\s+OK/i.test(text);
|
||||
}, undefined, { timeout: 30000 });
|
||||
@@ -610,9 +617,9 @@ async function frontendCheck(config: UniDeskConfig, urls: PublicUrls, checks: E2
|
||||
addCheck(checks, "frontend:no-naked-json-before-click", rawBlocksBefore === 0 && !nakedJsonText, { rawBlocksBefore, nakedJsonText });
|
||||
addCheck(checks, "frontend:raw-json-explicit-button", rawText.includes('"providerId"') && rawText.includes(config.providerGateway.id), { rawTextPreview: rawText.slice(0, 400) });
|
||||
addCheck(checks, "frontend:system-monitor-visible", monitorText.includes("任务管理器视图") && monitorText.includes("CPU") && monitorText.includes("Memory") && monitorText.includes("Disk") && monitorText.includes("不含缓存"), { monitorTextPreview: monitorText.slice(0, 800) });
|
||||
addCheck(checks, "frontend:upgrade-plan-dispatch", upgradeControlText.includes("预检升级 已下发"), { providerId: config.providerGateway.id, upgradeControlPreview: upgradeControlText.slice(0, 500) });
|
||||
addCheck(checks, "frontend:upgrade-plan-dispatch", upgradeControlText.includes("预检升级 已下发") && upgradeControlText.includes("指定 Provider") && upgradeControlText.includes(`v${providerGatewayPackageVersion()}`), { providerId: config.providerGateway.id, upgradeControlPreview: upgradeControlText.slice(0, 500) });
|
||||
addCheck(checks, "frontend:docker-status-visible", dockerText.toLowerCase().includes("docker desktop 视图") && dockerText.toLowerCase().includes("containers") && dockerText.includes("unidesk_pgdata_10gb") && (dockerText.includes("unidesk-frontend") || dockerText.includes("unidesk-backend-core")), { dockerTextPreview: dockerText.slice(0, 800) });
|
||||
addCheck(checks, "frontend:gateway-version-records-visible", gatewayTextLower.includes("provider gateway 版本") && gatewayText.includes("自动更新记录") && gatewayText.includes(config.providerGateway.id) && gatewayText.includes(`v${providerGatewayPackageVersion()}`) && gatewayText.includes("provider.upgrade"), { gatewayTextPreview: gatewayText.slice(0, 900) });
|
||||
addCheck(checks, "frontend:gateway-version-records-visible", gatewayTextLower.includes("provider gateway 版本") && gatewayText.includes("自动更新记录") && gatewayText.includes("Gateway 版本") && gatewayText.includes(config.providerGateway.id) && gatewayText.includes(`v${providerGatewayPackageVersion()}`) && gatewayText.includes("provider.upgrade"), { gatewayTextPreview: gatewayText.slice(0, 900) });
|
||||
addCheck(checks, "frontend:provider-operation-availability-visible", sshAvailabilityTexts.length >= 1 && upgradeAvailabilityTexts.length >= 1 && sshAvailabilityTexts.every((text) => text.includes("SSH 透传")) && upgradeAvailabilityTexts.every((text) => text.includes("远程更新")) && upgradeAvailabilityTexts.some((text) => text.includes("always-enabled")), { sshAvailabilityTexts, upgradeAvailabilityTexts });
|
||||
addCheck(checks, "frontend:overview-pgdata-visible", bodyText.includes("PGDATA") && bodyText.includes(config.database.volume), { bodyPreview: bodyText.slice(0, 800) });
|
||||
addCheck(checks, "frontend:microservice-catalog-visible", microserviceCatalogTextLower.includes("findjob") && microserviceCatalogTextLower.includes("pipeline") && microserviceCatalogTextLower.includes("todo note") && microserviceCatalogTextLower.includes("met nonlinear") && microserviceCatalogText.includes("D601") && microserviceCatalogText.includes(config.providerGateway.id) && microserviceCatalogTextLower.includes("private") && microserviceCatalogText.includes("https://gitee.com/Lyon1998/findjob") && microserviceCatalogText.includes("https://github.com/pikasTech/pipeline") && microserviceCatalogText.includes("https://github.com/pikasTech/met_nonlinear") && microserviceCatalogText.includes("https://gitee.com/Lyon1998/todo_note"), { microserviceCatalogPreview: microserviceCatalogText.slice(0, 1600) });
|
||||
@@ -620,7 +627,7 @@ async function frontendCheck(config: UniDeskConfig, urls: PublicUrls, checks: E2
|
||||
addCheck(checks, "frontend:findjob-integrated-visible", findjobTextLower.includes("findjob 工作台".toLowerCase()) && findjobText.includes("岗位总量") && findjobText.includes("D601") && findjobText.includes("近期岗位") && findjobText.includes("仅 UniDesk frontend 代理访问") && /岗位总量\s+\d+/.test(findjobText) && /health\s+ok/i.test(findjobText) && /[1-9]\d*\/[1-9]\d*\s+preview/i.test(findjobText), { findjobTextPreview: findjobText.slice(0, 1200) });
|
||||
addCheck(checks, "frontend:pipeline-integrated-visible", pipelineTextLower.includes("pipeline v2 工作台".toLowerCase()) && pipelineText.includes("D601") && pipelineText.includes("控制图") && pipelineText.includes("最近运行") && pipelineText.includes("仅 UniDesk frontend 代理访问") && /Health\s+OK/i.test(pipelineText) && /组件\s+\d+/.test(pipelineText) && /运行记录\s+[1-9]\d*/.test(pipelineText), { pipelineTextPreview: pipelineText.slice(0, 1200) });
|
||||
addCheck(checks, "frontend:pipeline-react-flow-visible", pipelineFlowNodeCount > 0 && pipelineFlowEdgeCount > 0, { pipelineFlowNodeCount, pipelineFlowEdgeCount });
|
||||
addCheck(checks, "frontend:met-nonlinear-integrated-visible", metNonlinearTextLower.includes("met nonlinear 训练编排") && metNonlinearText.includes("D601") && metNonlinearText.includes("任务队列") && metNonlinearText.includes("GPU 与镜像") && metNonlinearText.includes("创建10个10轮任务") && metNonlinearText.includes("仅 UniDesk frontend 代理访问") && /Health\s+OK/i.test(metNonlinearText), { metNonlinearTextPreview: metNonlinearText.slice(0, 1400) });
|
||||
addCheck(checks, "frontend:met-nonlinear-integrated-visible", metNonlinearTextLower.includes("met nonlinear 训练编排") && metNonlinearText.includes("D601") && metNonlinearText.includes("当前队列") && metNonlinearText.includes("GPU/镜像") && metNonlinearText.includes("Fork Project") && metNonlinearText.includes("加入待启动队列") && metNonlinearText.includes("启动队列") && !metNonlinearText.includes("创建10个10轮任务") && metNonlinearText.includes("仅 UniDesk frontend 代理访问") && /Health\s+OK/i.test(metNonlinearText), { metNonlinearTextPreview: metNonlinearText.slice(0, 1400) });
|
||||
addCheck(checks, "frontend:no-console-errors", consoleErrors.length === 0, { consoleErrors });
|
||||
return { screenshotPath, bodyText, consoleErrors };
|
||||
} finally {
|
||||
|
||||
@@ -1302,6 +1302,10 @@ const providerServer = Bun.serve<WsData>({
      const providerId = ws.data.providerId;
      logger("warn", "provider_socket_close", { providerId: providerId ?? null });
      if (providerId !== undefined) {
        if (activeProviders.get(providerId) !== ws) {
          logger("info", "provider_socket_close_ignored_replaced", { providerId });
          return;
        }
        markProviderOffline(providerId).catch((error) => logger("error", "provider_offline_mark_failed", { providerId, error: errorToJson(error) }));
      }
    },
@@ -398,9 +398,9 @@ h2 { font-size: 14px; text-transform: uppercase; letter-spacing: 0.08em; }
|
||||
font-size: 11px;
|
||||
}
|
||||
.status-badge.online, .status-badge.succeeded, .status-badge.public { color: var(--ok); border-color: rgba(113, 191, 120, 0.45); }
|
||||
.status-badge.offline, .status-badge.failed { color: var(--danger); border-color: rgba(207, 106, 84, 0.45); }
|
||||
.status-badge.offline, .status-badge.failed, .status-badge.canceled { color: var(--danger); border-color: rgba(207, 106, 84, 0.45); }
|
||||
.status-badge.running, .status-badge.dispatched, .status-badge.accepted, .status-badge.internal { color: var(--accent-2); border-color: rgba(78, 183, 168, 0.45); }
|
||||
.status-badge.queued, .status-badge.warn { color: var(--warn); border-color: rgba(215, 161, 58, 0.45); }
|
||||
.status-badge.queued, .status-badge.staged, .status-badge.warn { color: var(--warn); border-color: rgba(215, 161, 58, 0.45); }
|
||||
.status-badge.private, .status-badge.p1, .status-badge.prioritized, .status-badge.verified { color: var(--accent-2); border-color: rgba(78, 183, 168, 0.45); }
|
||||
.status-badge.stale, .status-badge.invalid, .status-badge.abandoned { color: var(--warn); border-color: rgba(215, 161, 58, 0.45); }
|
||||
|
||||
@@ -725,6 +725,17 @@ h2 { font-size: 14px; text-transform: uppercase; letter-spacing: 0.08em; }
|
||||
color: var(--muted);
|
||||
}
|
||||
|
||||
.upgrade-target-line {
|
||||
display: flex;
|
||||
flex-wrap: wrap;
|
||||
gap: 6px;
|
||||
align-items: center;
|
||||
padding: 6px 7px;
|
||||
border: 1px solid var(--line-soft);
|
||||
background: rgba(78, 183, 168, 0.07);
|
||||
color: var(--muted);
|
||||
}
|
||||
|
||||
.upgrade-actions {
|
||||
display: flex;
|
||||
gap: 6px;
|
||||
@@ -1037,6 +1048,101 @@ input:focus, select:focus, textarea:focus { border-color: var(--accent-2); }
|
||||
background: linear-gradient(90deg, var(--accent-2), var(--accent));
|
||||
}
|
||||
.met-job-table { max-height: 460px; overflow: auto; }
|
||||
.met-queue-summary {
|
||||
display: flex;
|
||||
flex-wrap: wrap;
|
||||
gap: 6px;
|
||||
align-items: center;
|
||||
margin: 0 0 8px;
|
||||
color: var(--muted);
|
||||
font-size: 12px;
|
||||
}
|
||||
.met-queue-summary > span:not(.status-badge) {
|
||||
padding: 4px 8px;
|
||||
border: 1px solid var(--line-soft);
|
||||
background: var(--panel-3);
|
||||
}
|
||||
.met-action-log {
|
||||
margin-top: 8px;
|
||||
padding: 7px 9px;
|
||||
border: 1px solid var(--line-soft);
|
||||
background: rgba(78, 183, 168, 0.08);
|
||||
color: var(--accent-2);
|
||||
font-size: 12px;
|
||||
}
|
||||
.met-control-strip, .met-tabs {
|
||||
display: flex;
|
||||
flex-wrap: wrap;
|
||||
gap: 6px;
|
||||
align-items: center;
|
||||
}
|
||||
.met-control-strip label, .met-fork-card label {
|
||||
display: grid;
|
||||
gap: 4px;
|
||||
color: var(--muted);
|
||||
font-size: 11px;
|
||||
letter-spacing: 0.04em;
|
||||
}
|
||||
.met-control-strip input, .met-fork-card input, .met-fork-card select {
|
||||
min-height: 28px;
|
||||
padding: 4px 8px;
|
||||
}
|
||||
.met-control-strip input { width: 130px; }
|
||||
.met-tabs {
|
||||
padding: 8px;
|
||||
border-bottom: 1px solid var(--line-soft);
|
||||
}
|
||||
.met-tabs button {
|
||||
min-height: 28px;
|
||||
padding: 4px 10px;
|
||||
border: 1px solid var(--line-soft);
|
||||
background: var(--panel-3);
|
||||
color: var(--text);
|
||||
}
|
||||
.met-tabs button.active {
|
||||
border-color: var(--accent);
|
||||
background: rgba(215, 161, 58, 0.14);
|
||||
}
|
||||
.met-workspace { grid-column: 1 / -1; }
|
||||
.met-form-grid {
|
||||
display: grid;
|
||||
grid-template-columns: minmax(280px, 340px) minmax(560px, 1fr);
|
||||
gap: 10px;
|
||||
align-items: start;
|
||||
}
|
||||
.met-fork-card {
|
||||
display: grid;
|
||||
gap: 8px;
|
||||
padding: 10px;
|
||||
border: 1px solid var(--line-soft);
|
||||
background: var(--panel-3);
|
||||
}
|
||||
.met-fork-card h3 {
|
||||
margin: 0 0 2px;
|
||||
font-size: 13px;
|
||||
}
|
||||
.met-project-list {
|
||||
min-width: 0;
|
||||
border: 1px solid var(--line-soft);
|
||||
}
|
||||
.met-project-table { max-height: 520px; overflow: auto; }
|
||||
.ghost-btn.mini {
|
||||
min-height: 22px;
|
||||
padding: 2px 6px;
|
||||
font-size: 11px;
|
||||
}
|
||||
.pipeline-toolbar {
|
||||
display: flex;
|
||||
flex-wrap: wrap;
|
||||
gap: 6px;
|
||||
justify-content: flex-end;
|
||||
align-items: center;
|
||||
}
|
||||
.pipeline-toolbar select {
|
||||
width: min(280px, 42vw);
|
||||
min-height: 28px;
|
||||
padding: 4px 8px;
|
||||
}
|
||||
.pipeline-flow-frame {
|
||||
height: min(68vh, 720px);
|
||||
min-height: 520px;
|
||||
@@ -1062,10 +1168,10 @@ input:focus, select:focus, textarea:focus { border-color: var(--accent-2); }
|
||||
background: rgba(8, 13, 18, 0.88);
|
||||
}
|
||||
.pipeline-flow-frame .react-flow__edges {
|
||||
z-index: 5;
|
||||
z-index: 30 !important;
|
||||
}
|
||||
.pipeline-flow-frame .react-flow__nodes {
|
||||
z-index: 4;
|
||||
z-index: 20 !important;
|
||||
}
|
||||
.pipeline-flow-frame .react-flow__edge-path {
|
||||
stroke: rgba(129, 147, 159, 0.72);
|
||||
@@ -1076,7 +1182,7 @@ input:focus, select:focus, textarea:focus { border-color: var(--accent-2); }
|
||||
}
|
||||
.pipeline-flow-frame .react-flow__edge {
|
||||
opacity: 0.82;
|
||||
z-index: 12;
|
||||
z-index: 30 !important;
|
||||
}
|
||||
.pipeline-flow-frame .react-flow__edge:hover,
|
||||
.pipeline-flow-frame .react-flow__edge.selected {
|
||||
@@ -1089,6 +1195,12 @@ input:focus, select:focus, textarea:focus { border-color: var(--accent-2); }
|
||||
.pipeline-flow-frame .react-flow__edge.succeeded .react-flow__edge-path { stroke: var(--accent-2); }
|
||||
.pipeline-flow-frame .react-flow__edge.running .react-flow__edge-path { stroke: var(--accent); }
|
||||
.pipeline-flow-frame .react-flow__edge.failed .react-flow__edge-path { stroke: var(--danger); }
|
||||
.pipeline-flow-frame .react-flow__edge.feedback .react-flow__edge-path {
|
||||
stroke-dasharray: 9 7;
|
||||
}
|
||||
.pipeline-flow-frame .react-flow__edge.overlap-colored .react-flow__edge-path {
|
||||
filter: drop-shadow(0 0 6px rgba(215, 161, 58, 0.16));
|
||||
}
|
||||
.pipeline-flow-frame .pipeline-flow-handle {
|
||||
width: 9px;
|
||||
height: 18px;
|
||||
|
||||
@@ -251,6 +251,16 @@ function taskUpgradePolicy(task: any): string {
  return typeof policy === "string" && policy.length > 0 ? policy : "--";
}

function taskUpgradeVersion(task: any): string {
  const result = taskResult(task);
  const plan = result.plan && typeof result.plan === "object" && !Array.isArray(result.plan) ? result.plan as AnyRecord : {};
  const value = result.targetProviderGatewayVersion
    ?? result.providerGatewayVersion
    ?? plan.targetProviderGatewayVersion
    ?? plan.providerGatewayVersion;
  return typeof value === "string" && value.length > 0 ? fmtGatewayVersion(value) : "版本未知";
}

function taskUpgradeOutcome(task: any): string {
  const status = String(task?.status || "").toLowerCase();
  if (status === "failed") return taskFailureReason(task);
@@ -768,6 +778,11 @@ function UpgradeControl({ provider, refresh, onRaw }: AnyRecord) {
|
||||
return h(Panel, { title: "Provider Gateway 升级", eyebrow: "Remote Control" },
|
||||
h("div", { className: "upgrade-control", "data-testid": "provider-upgrade-control" },
|
||||
h("p", null, "通过 UniDesk WebSocket 向当前计算节点下发 provider.upgrade;预检只生成升级计划,执行升级会调度节点本地 updater 容器。"),
|
||||
h("div", { className: "upgrade-target-line" },
|
||||
h("span", null, "指定 Provider"),
|
||||
h("code", null, provider.providerId),
|
||||
h(GatewayVersionBadge, { node: provider }),
|
||||
),
|
||||
h("div", { className: "upgrade-actions" },
|
||||
h("button", { type: "button", className: "ghost-btn", disabled: Boolean(busyMode), onClick: () => run("plan"), "data-testid": "upgrade-plan-button" }, busyMode === "plan" ? "预检中" : "预检升级"),
|
||||
h("button", { type: "button", className: "ghost-btn danger", disabled: Boolean(busyMode), onClick: () => run("schedule"), "data-testid": "upgrade-schedule-button" }, busyMode === "schedule" ? "调度中" : "执行升级"),
|
||||
@@ -776,6 +791,7 @@ function UpgradeControl({ provider, refresh, onRaw }: AnyRecord) {
|
||||
result ? h("div", { className: "upgrade-result" },
|
||||
h(StatusBadge, { status: result.status || "queued" }, result.status || "queued"),
|
||||
h("span", null, `${result.mode === "schedule" ? "执行升级" : "预检升级"} 已下发`),
|
||||
h("span", null, `指定版本 ${fmtGatewayVersion(nodeGatewayVersion(provider))}`),
|
||||
h("code", null, result.taskId || "--"),
|
||||
h(RawButton, { title: "Provider Upgrade Dispatch", data: result, onOpen: onRaw }),
|
||||
) : h("span", { className: "muted" }, "升级任务结果会进入任务历史;执行升级可能导致 provider 短暂重连。"),
|
||||
@@ -795,6 +811,7 @@ function UpgradeRecordsTable({ records, onRaw, compact = false }: AnyRecord) {
|
||||
h("th", null, "来源"),
|
||||
h("th", null, "耗时"),
|
||||
h("th", null, "策略"),
|
||||
h("th", null, "Gateway 版本"),
|
||||
h("th", null, "结果记录"),
|
||||
h("th", null, "更新时间"),
|
||||
h("th", null, "操作"),
|
||||
@@ -806,6 +823,7 @@ function UpgradeRecordsTable({ records, onRaw, compact = false }: AnyRecord) {
|
||||
h("td", null, taskUpgradeSource(task)),
|
||||
h("td", null, h(TaskDurationCell, { task })),
|
||||
h("td", null, taskUpgradePolicy(task)),
|
||||
h("td", null, h("span", { className: "version-chip" }, taskUpgradeVersion(task))),
|
||||
h("td", null, h("span", { className: `upgrade-outcome ${String(task.status || "").toLowerCase()}` }, taskUpgradeOutcome(task))),
|
||||
h("td", null, fmtDate(task.updatedAt)),
|
||||
h("td", null, h(RawButton, { title: `Provider Upgrade Task ${task.id}`, data: task, onOpen: onRaw })),
|
||||
@@ -863,6 +881,7 @@ function GatewayVersionPage({ nodes, tasks, onRaw }: AnyRecord) {
|
||||
h("td", null, row.latest ? h("div", { className: "latest-upgrade-cell" },
|
||||
h(StatusBadge, { status: row.latest.status }),
|
||||
h("span", null, `${taskUpgradeMode(row.latest) === "schedule" ? "执行升级" : "预检"} / ${fmtDate(row.latest.updatedAt)}`),
|
||||
h("small", null, `Gateway ${taskUpgradeVersion(row.latest)}`),
|
||||
h("small", null, taskUpgradeOutcome(row.latest)),
|
||||
) : h("span", { className: "muted" }, "暂无记录")),
|
||||
h("td", null, h(RawButton, { title: `Provider ${row.node.providerId}`, data: row.node, onOpen: onRaw })),
|
||||
|
||||
@@ -26,19 +26,6 @@ function fmtPercent(value: any): string {
|
||||
return Number.isFinite(number) ? `${Math.max(0, Math.min(100, number)).toFixed(1)}%` : "--";
|
||||
}
|
||||
|
||||
function fmtBytes(value: any): string {
|
||||
const bytes = Number(value);
|
||||
if (!Number.isFinite(bytes) || bytes <= 0) return "--";
|
||||
const units = ["B", "KB", "MB", "GB", "TB"];
|
||||
let current = bytes;
|
||||
let index = 0;
|
||||
while (current >= 1024 && index < units.length - 1) {
|
||||
current /= 1024;
|
||||
index += 1;
|
||||
}
|
||||
return `${current.toFixed(index === 0 ? 0 : 1)} ${units[index]}`;
|
||||
}
|
||||
|
||||
function fmtDuration(seconds: any): string {
|
||||
const value = Number(seconds);
|
||||
if (!Number.isFinite(value)) return "--";
|
||||
@@ -121,16 +108,16 @@ function queueCounts(queue: any): AnyRecord {
|
||||
}
|
||||
|
||||
function jobRows(queue: any): any[] {
|
||||
return Array.isArray(queue?.jobs) ? queue.jobs.slice(0, 80) : [];
|
||||
return Array.isArray(queue?.jobs) ? queue.jobs.slice(0, 240) : [];
|
||||
}
|
||||
|
||||
function projectRows(projects: any): any[] {
|
||||
return Array.isArray(projects?.projects) ? projects.projects.slice(0, 40) : [];
|
||||
return Array.isArray(projects?.projects) ? projects.projects.slice(0, 160) : [];
|
||||
}
|
||||
|
||||
function gpuRows(health: any, queue: any): any[] {
|
||||
if (Array.isArray(health?.gpu)) return health.gpu;
|
||||
if (Array.isArray(queue?.gpu)) return queue.gpu;
|
||||
if (Array.isArray(health?.gpu)) return health.gpu;
|
||||
return [];
|
||||
}
|
||||
|
||||
@@ -138,9 +125,38 @@ function metApi(apiBaseUrl: string, path: string): string {
|
||||
return `${apiBaseUrl}/microservices/met-nonlinear/proxy${path}`;
|
||||
}
|
||||
|
||||
function jobDuration(job: any): string {
|
||||
return job.startedAt && job.finishedAt ? fmtDuration((Date.parse(job.finishedAt) - Date.parse(job.startedAt)) / 1000) : "--";
|
||||
}
|
||||
|
||||
function statusLabel(status: string): string {
|
||||
if (status === "staged") return "待启动";
|
||||
if (status === "queued") return "排队中";
|
||||
if (status === "running") return "训练中";
|
||||
if (status === "succeeded") return "已完成";
|
||||
if (status === "failed") return "失败";
|
||||
if (status === "canceled") return "已取消";
|
||||
return status || "unknown";
|
||||
}
|
||||
|
||||
export function MetNonlinearPage({ microservices, onRaw, apiBaseUrl = "/api" }: AnyRecord) {
|
||||
const service = microservices.find((item: any) => item.id === "met-nonlinear") || null;
|
||||
const [state, setState] = useState({ loading: false, actionBusy: false, error: "", health: null, summary: null, queue: null, projects: null, history: null, images: null, refreshedAt: null });
|
||||
const [ui, setUi] = useState(() => ({
|
||||
activeTab: "projects",
|
||||
selectedProjects: {},
|
||||
sourceProject: "",
|
||||
forkCount: 1,
|
||||
forkEpochs: 200,
|
||||
forkPrefix: `ui_fork_${Date.now()}`,
|
||||
maxConcurrency: 3,
|
||||
targetGpuName: "2080 Ti",
|
||||
actionMessage: "",
|
||||
}));
|
||||
|
||||
function patchUi(patch: AnyRecord): void {
|
||||
setUi((prev: any) => ({ ...prev, ...patch }));
|
||||
}
|
||||
|
||||
async function load(): Promise<void> {
|
||||
if (!service) return;
|
||||
@@ -150,7 +166,7 @@ export function MetNonlinearPage({ microservices, onRaw, apiBaseUrl = "/api" }:
|
||||
requestJson(`${apiBaseUrl}/microservices/met-nonlinear/health`),
|
||||
requestJson(metApi(apiBaseUrl, "/api/summary")),
|
||||
requestJson(metApi(apiBaseUrl, "/api/queue")),
|
||||
requestJson(metApi(apiBaseUrl, "/api/projects?root=projects&limit=40")),
|
||||
requestJson(metApi(apiBaseUrl, "/api/projects?root=projects&limit=160")),
|
||||
requestJson(metApi(apiBaseUrl, "/api/history")),
|
||||
requestJson(metApi(apiBaseUrl, "/api/images")),
|
||||
]);
|
||||
@@ -160,19 +176,79 @@ export function MetNonlinearPage({ microservices, onRaw, apiBaseUrl = "/api" }:
|
||||
}
|
||||
}
|
||||
|
||||
async function enqueueServerTest(): Promise<void> {
|
||||
async function runAction(label: string, fn: () => Promise<string | void>): Promise<void> {
|
||||
setState((prev: any) => ({ ...prev, actionBusy: true, error: "" }));
|
||||
patchUi({ actionMessage: `${label}...` });
|
||||
try {
|
||||
await requestJson(metApi(apiBaseUrl, "/api/queue/server-test"), {
|
||||
method: "POST",
|
||||
body: JSON.stringify({ sourceProject: "projects/FRIKANh6u6l4", count: 10, epochs: 10, maxConcurrency: 2 }),
|
||||
});
|
||||
const message = await fn();
|
||||
patchUi({ actionMessage: message || `${label}完成` });
|
||||
await load();
|
||||
} catch (err) {
|
||||
setState((prev: any) => ({ ...prev, actionBusy: false, error: errorMessage(err, "Server test 入队失败") }));
|
||||
setState((prev: any) => ({ ...prev, actionBusy: false, error: errorMessage(err, `${label}失败`) }));
|
||||
}
|
||||
}
|
||||
|
||||
async function saveQueueSettings(): Promise<void> {
|
||||
await runAction("保存并发设置", async () => {
|
||||
await requestJson(metApi(apiBaseUrl, "/api/queue/settings"), {
|
||||
method: "PUT",
|
||||
body: JSON.stringify({ maxConcurrency: Number(ui.maxConcurrency), targetGpuName: ui.targetGpuName }),
|
||||
});
|
||||
});
|
||||
}
|
||||
|
||||
function selectedProjectPaths(): string[] {
|
||||
return Object.entries(ui.selectedProjects).filter(([, selected]) => selected).map(([path]) => path);
|
||||
}
|
||||
|
||||
async function stageSelectedProjects(): Promise<void> {
|
||||
const paths = selectedProjectPaths();
|
||||
if (paths.length === 0) throw new Error("请先选择至少一个 project");
|
||||
await runAction("加入待启动队列", async () => {
|
||||
await requestJson(metApi(apiBaseUrl, "/api/queue"), {
|
||||
method: "POST",
|
||||
body: JSON.stringify({ projectPaths: paths, maxConcurrency: Number(ui.maxConcurrency), targetGpuName: ui.targetGpuName, start: false }),
|
||||
});
|
||||
patchUi({ activeTab: "current", selectedProjects: {} });
|
||||
});
|
||||
}
|
||||
|
||||
async function forkProjects(): Promise<void> {
|
||||
const sourceProject = ui.sourceProject || projects[0]?.projectPath;
|
||||
if (!sourceProject) throw new Error("请先选择源 project");
|
||||
await runAction("Fork Project", async () => {
|
||||
const forked = await requestJson(metApi(apiBaseUrl, "/api/projects/fork"), {
|
||||
method: "POST",
|
||||
body: JSON.stringify({ sourceProject, count: Number(ui.forkCount), epochs: Number(ui.forkEpochs), prefix: ui.forkPrefix }),
|
||||
});
|
||||
const projectPaths = Array.isArray(forked.projectPaths) ? forked.projectPaths : [];
|
||||
const selectedProjects = projectPaths.reduce((acc: AnyRecord, projectPath: string) => {
|
||||
acc[projectPath] = true;
|
||||
return acc;
|
||||
}, { ...ui.selectedProjects });
|
||||
patchUi({
|
||||
selectedProjects,
|
||||
});
|
||||
return `已 fork ${projectPaths.length} 个 project,并已自动勾选;请确认后点击加入待启动队列。`;
|
||||
});
|
||||
}
|
||||
|
||||
async function startQueue(): Promise<void> {
|
||||
await runAction("启动队列", async () => {
|
||||
await requestJson(metApi(apiBaseUrl, "/api/queue/start"), {
|
||||
method: "POST",
|
||||
body: JSON.stringify({ maxConcurrency: Number(ui.maxConcurrency), targetGpuName: ui.targetGpuName }),
|
||||
});
|
||||
patchUi({ activeTab: "current" });
|
||||
});
|
||||
}
|
||||
|
||||
async function cancelJob(job: any): Promise<void> {
|
||||
await runAction("取消任务", async () => {
|
||||
await requestJson(metApi(apiBaseUrl, `/api/jobs/${encodeURIComponent(job.id)}/cancel`), { method: "POST", body: JSON.stringify({}) });
|
||||
});
|
||||
}
|
||||
|
||||
useEffect(() => {
|
||||
load();
|
||||
}, [service?.id, service?.runtime?.providerStatus]);
|
||||
@@ -188,14 +264,62 @@ export function MetNonlinearPage({ microservices, onRaw, apiBaseUrl = "/api" }:
|
||||
const image = state.images?.mlImage || state.health?.image || {};
|
||||
const jobs = jobRows(state.queue);
|
||||
const projects = projectRows(state.projects);
|
||||
const history = Array.isArray(state.history?.jobs) ? state.history.jobs.slice(0, 20) : [];
|
||||
const sourceProject = ui.sourceProject || projects[0]?.projectPath || "";
|
||||
const currentJobs = jobs.filter((job) => ["staged", "queued", "running"].includes(job.status));
|
||||
const completedJobs = jobs.filter((job) => job.status === "succeeded");
|
||||
const failedJobs = jobs.filter((job) => ["failed", "canceled"].includes(job.status));
|
||||
const terminalHistory = Array.isArray(state.history?.jobs) ? state.history.jobs.slice(0, 120) : [];
|
||||
const tabItems = [
|
||||
{ id: "projects", label: "项目库", count: projects.length },
|
||||
{ id: "current", label: "当前队列", count: currentJobs.length },
|
||||
{ id: "completed", label: "已完成", count: completedJobs.length || Number(counts.succeeded || 0) },
|
||||
{ id: "failed", label: "失败诊断", count: failedJobs.length || Number(counts.failed || 0) + Number(counts.canceled || 0) },
|
||||
{ id: "gpu", label: "GPU/镜像", count: gpus.length },
|
||||
];
|
||||
|
||||
  function queueTable(rows: any[], mode: string) {
    if (rows.length === 0) return h(EmptyState, { title: mode === "current" ? "当前队列为空" : "暂无记录", text: mode === "current" ? "从项目库选择或 fork project 后先加入待启动队列,再启动队列。" : "终态任务会显示耗时、exit code 和失败诊断。" });
    return h("div", { className: "table-wrap met-job-table" }, h("table", null,
      h("thead", null, h("tr", null, h("th", null, "状态"), h("th", null, "Project"), h("th", null, "Epoch"), h("th", null, "ETA/耗时"), h("th", null, "GPU"), h("th", null, "Exit"), h("th", null, "更新时间"), h("th", null, "操作"))),
      h("tbody", null, rows.map((job: any) => {
        const progress = job.progress || {};
        const canCancel = ["staged", "queued", "running"].includes(job.status);
        return h("tr", { key: job.id },
          h("td", null, h(StatusBadge, { status: job.status }, statusLabel(job.status))),
          h("td", null, h("strong", null, job.projectPath), h("code", null, job.id)),
          h("td", null,
            h("span", null, `${progress.currentEpoch ?? "--"} / ${progress.epochTarget ?? job.epochTarget ?? "--"}`),
            h("div", { className: "met-progress" }, h("span", { style: { width: fmtPercent(progress.progressPercent) } })),
          ),
          h("td", null, job.status === "succeeded" || job.status === "failed" || job.status === "canceled" ? jobDuration(job) : fmtDuration(progress.etaSeconds)),
          h("td", null, job.gpuName || "--"),
          h("td", null, job.exitCode ?? "--"),
          h("td", null, fmtDate(job.updatedAt)),
          h("td", null,
            canCancel ? h("button", { type: "button", className: "ghost-btn mini", onClick: () => cancelJob(job), disabled: state.actionBusy }, "取消") : null,
            h(RawButton, { title: `MET Job ${job.id}`, data: job, onOpen: onRaw, testId: `raw-met-job-${job.id}` }),
          ),
        );
      })),
    ));
  }
|
||||
|
||||
  function currentQueueSummary() {
    return h("div", { className: "met-queue-summary", "data-testid": "met-current-summary" },
      h(StatusBadge, { status: "staged" }, `待启动 ${counts.staged ?? 0}`),
      h(StatusBadge, { status: "queued" }, `排队中 ${counts.queued ?? 0}`),
      h(StatusBadge, { status: "running" }, `训练中 ${counts.running ?? 0}`),
      h("span", null, `最大并发 ${state.summary?.queue?.maxConcurrency ?? state.queue?.queue?.maxConcurrency ?? ui.maxConcurrency}`),
      h("span", null, `目标 GPU ${state.summary?.queue?.targetGpuName ?? state.queue?.queue?.targetGpuName ?? ui.targetGpuName}`),
    );
  }
|
||||
|
||||
return h("div", { className: "met-page", "data-testid": "met-nonlinear-page" },
|
||||
h(Panel, {
|
||||
title: "MET Nonlinear 训练编排",
|
||||
eyebrow: "D601 GPU Microservice",
|
||||
actions: h("div", { className: "panel-actions" },
|
||||
h("button", { type: "button", className: "ghost-btn", onClick: load, disabled: state.loading, "data-testid": "met-refresh-button" }, state.loading ? "刷新中" : "刷新"),
|
||||
h("button", { type: "button", className: "primary-btn", onClick: enqueueServerTest, disabled: state.actionBusy, "data-testid": "met-server-test-button" }, state.actionBusy ? "入队中" : "创建10个10轮任务"),
|
||||
h(RawButton, { title: "MET Nonlinear Microservice", data: service, onOpen: onRaw, testId: "raw-met-service" }),
|
||||
),
|
||||
},
|
||||
@@ -220,86 +344,81 @@ export function MetNonlinearPage({ microservices, onRaw, apiBaseUrl = "/api" }:
|
||||
),
|
||||
),
|
||||
state.error ? h("div", { className: "form-error wide" }, state.error) : null,
|
||||
ui.actionMessage ? h("div", { className: "met-action-log", "data-testid": "met-action-message" }, ui.actionMessage) : null,
|
||||
),
|
||||
h("div", { className: "met-grid" },
|
||||
h(Panel, { title: "核心状态", eyebrow: state.refreshedAt ? `Updated ${fmtClock(state.refreshedAt)}` : "Queue + GPU" },
|
||||
h("div", { className: "metric-grid" },
|
||||
h(MetricCard, { label: "Queued", value: counts.queued ?? 0, hint: "等待训练", tone: Number(counts.queued || 0) > 0 ? "warn" : "" }),
|
||||
h(MetricCard, { label: "Staged", value: counts.staged ?? 0, hint: "加入队列未开始", tone: Number(counts.staged || 0) > 0 ? "warn" : "" }),
|
||||
h(MetricCard, { label: "Queued", value: counts.queued ?? 0, hint: "排队等待调度", tone: Number(counts.queued || 0) > 0 ? "warn" : "" }),
|
||||
h(MetricCard, { label: "Running", value: counts.running ?? 0, hint: `max ${state.summary?.queue?.maxConcurrency ?? state.queue?.queue?.maxConcurrency ?? "--"}`, tone: Number(counts.running || 0) > 0 ? "ok" : "" }),
|
||||
h(MetricCard, { label: "Succeeded", value: counts.succeeded ?? 0, hint: "历史成功" }),
|
||||
h(MetricCard, { label: "Succeeded", value: counts.succeeded ?? 0, hint: "已完成" }),
|
||||
h(MetricCard, { label: "Failed", value: counts.failed ?? 0, hint: "需要诊断", tone: Number(counts.failed || 0) > 0 ? "warn" : "" }),
|
||||
h(MetricCard, { label: "2080Ti Free", value: targetGpu ? fmtPercent(Number(targetGpu.freeRatio) * 100) : "--", hint: targetGpu ? `${targetGpu.memoryFreeMiB}/${targetGpu.memoryTotalMiB} MiB` : "等待 GPU 上报" }),
|
||||
h(MetricCard, { label: "ML Image", value: image.present ? "READY" : "MISSING", hint: image.image || "met-nonlinear-ml:tf26", tone: image.present ? "ok" : "warn" }),
|
||||
h(MetricCard, { label: "Projects", value: projects.length, hint: "config preview" }),
|
||||
h(MetricCard, { label: "Health", value: state.health?.ok ? "OK" : "--", hint: "D601 /health" }),
|
||||
),
|
||||
h("div", { className: "panel-actions inline-actions" },
|
||||
h(RawButton, { title: "MET Summary", data: state.summary, onOpen: onRaw, testId: "raw-met-summary" }),
|
||||
h(RawButton, { title: "MET Images", data: state.images, onOpen: onRaw, testId: "raw-met-images" }),
|
||||
),
|
||||
h(Panel, { title: "队列控制", eyebrow: "Downloader-like staging" },
|
||||
h("div", { className: "met-control-strip" },
|
||||
h("label", null, "最大并发", h("input", { type: "number", min: 1, max: 16, value: ui.maxConcurrency, "data-testid": "met-max-concurrency-input", onChange: (event: any) => patchUi({ maxConcurrency: event.target.value }) })),
|
||||
h("label", null, "目标 GPU", h("input", { value: ui.targetGpuName, "data-testid": "met-target-gpu-input", onChange: (event: any) => patchUi({ targetGpuName: event.target.value }) })),
|
||||
h("button", { type: "button", className: "ghost-btn", onClick: saveQueueSettings, disabled: state.actionBusy, "data-testid": "met-save-settings-button" }, "保存设置"),
|
||||
h("button", { type: "button", className: "primary-btn", onClick: startQueue, disabled: state.actionBusy || Number(counts.staged || 0) === 0, "data-testid": "met-start-queue-button" }, "启动队列"),
|
||||
),
|
||||
h("p", { className: "muted paragraph" }, "Project 先进入待启动队列,不会立即训练;点击启动队列后才切换为排队中,并由 D601 scheduler 按最大并发和 2080Ti 显存策略调度。"),
|
||||
),
|
||||
h(Panel, { title: "GPU 与镜像", eyebrow: `${gpus.length} GPU` },
|
||||
gpus.length === 0 ? h(EmptyState, { title: "暂无 GPU 上报", text: "等待 D601 met-nonlinear-ts 或 ML image 提供 nvidia-smi 数据" }) :
|
||||
h("div", { className: "table-wrap" }, h("table", null,
|
||||
h("thead", null, h("tr", null, h("th", null, "Index"), h("th", null, "Name"), h("th", null, "Free"), h("th", null, "Policy"))),
|
||||
h("tbody", null, gpus.map((gpu: any) => h("tr", { key: gpu.index },
|
||||
h("td", null, gpu.index),
|
||||
h("td", null, gpu.name),
|
||||
h("td", null, `${gpu.memoryFreeMiB} / ${gpu.memoryTotalMiB} MiB`, h("div", { className: "met-progress" }, h("span", { style: { width: fmtPercent(Number(gpu.freeRatio) * 100) } }))),
|
||||
h("td", null, String(gpu.name || "").includes("2080") ? "target 2080Ti, <20% 限制并发" : "non-target"),
|
||||
))),
|
||||
)),
|
||||
),
|
||||
h(Panel, { title: "任务队列", eyebrow: `${jobs.length} Jobs` },
|
||||
jobs.length === 0 ? h(EmptyState, { title: "队列为空", text: "可创建 server_test 训练任务或通过 API 添加已有 projects" }) :
|
||||
h("div", { className: "table-wrap met-job-table" }, h("table", null,
|
||||
h("thead", null, h("tr", null, h("th", null, "状态"), h("th", null, "Project"), h("th", null, "Epoch"), h("th", null, "ETA"), h("th", null, "GPU"), h("th", null, "Exit"), h("th", null, "更新时间"))),
|
||||
h("tbody", null, jobs.map((job: any) => {
|
||||
const progress = job.progress || {};
|
||||
return h("tr", { key: job.id },
|
||||
h("td", null, h(StatusBadge, { status: job.status }, job.status)),
|
||||
h("td", null, h("strong", null, job.projectPath), h("code", null, job.id)),
|
||||
h("td", null,
|
||||
h("span", null, `${progress.currentEpoch ?? "--"} / ${progress.epochTarget ?? job.epochTarget ?? "--"}`),
|
||||
h("div", { className: "met-progress" }, h("span", { style: { width: fmtPercent(progress.progressPercent) } })),
|
||||
),
|
||||
h("td", null, fmtDuration(progress.etaSeconds)),
|
||||
h("td", null, job.gpuName || "--"),
|
||||
h("td", null, job.exitCode ?? "--"),
|
||||
h("td", null, fmtDate(job.updatedAt)),
|
||||
);
|
||||
})),
|
||||
)),
|
||||
h("div", { className: "panel-actions inline-actions" }, h(RawButton, { title: "MET Queue", data: state.queue, onOpen: onRaw, testId: "raw-met-queue" })),
|
||||
),
|
||||
h(Panel, { title: "Project Config 预览", eyebrow: "projects/" },
|
||||
projects.length === 0 ? h(EmptyState, { title: "暂无 project", text: "等待 D601 返回 /api/projects" }) :
|
||||
h("div", { className: "table-wrap" }, h("table", null,
|
||||
h("thead", null, h("tr", null, h("th", null, "Project"), h("th", null, "Model"), h("th", null, "Epochs"), h("th", null, "Step/Epoch"), h("th", null, "GPU"), h("th", null, "Progress"))),
|
||||
h("tbody", null, projects.map((project: any) => h("tr", { key: project.projectPath },
|
||||
h("td", null, project.projectPath),
|
||||
h("td", null, project.useModel || "--"),
|
||||
h("td", null, project.epochTrain ?? "--"),
|
||||
h("td", null, project.stepPerEpoch ?? "--"),
|
||||
h("td", null, project.usingGpu ? "true" : "false"),
|
||||
h("td", null, fmtPercent(project.progress?.progressPercent)),
|
||||
))),
|
||||
)),
|
||||
h("div", { className: "panel-actions inline-actions" }, h(RawButton, { title: "MET Projects", data: state.projects, onOpen: onRaw, testId: "raw-met-projects" })),
|
||||
),
|
||||
h(Panel, { title: "历史训练记录", eyebrow: `${history.length} Terminal Jobs` },
|
||||
history.length === 0 ? h(EmptyState, { title: "暂无历史", text: "完成或失败的训练任务会显示耗时、exit code 和失败诊断" }) :
|
||||
h("div", { className: "history-list" }, history.map((job: any) => h("article", { key: job.id, className: "result-card" },
|
||||
h("div", { className: "node-card-head" }, h("strong", null, job.projectPath), h(StatusBadge, { status: job.status }, job.status)),
|
||||
h("div", { className: "docker-meta compact" },
|
||||
h("span", null, `耗时 ${job.startedAt && job.finishedAt ? fmtDuration((Date.parse(job.finishedAt) - Date.parse(job.startedAt)) / 1000) : "--"}`),
|
||||
h("span", null, `exit ${job.exitCode ?? "--"}`),
|
||||
h("span", null, job.gpuName || "--"),
|
||||
h("section", { className: "panel met-workspace" },
|
||||
h("div", { className: "met-tabs", role: "tablist" }, tabItems.map((tab) => h("button", {
|
||||
key: tab.id,
|
||||
type: "button",
|
||||
className: ui.activeTab === tab.id ? "active" : "",
|
||||
onClick: () => patchUi({ activeTab: tab.id }),
|
||||
"data-testid": `met-tab-${tab.id}`,
|
||||
}, `${tab.label} ${tab.count}`))),
|
||||
h("div", { className: "panel-body" },
|
||||
ui.activeTab === "projects" ? h("div", { className: "met-form-grid", "data-testid": "met-projects-pane" },
|
||||
h("div", { className: "met-fork-card" },
|
||||
h("h3", null, "Fork Project"),
|
||||
h("label", null, "源 Project", h("select", { value: sourceProject, "data-testid": "met-source-project-select", onChange: (event: any) => patchUi({ sourceProject: event.target.value }) }, projects.map((project: any) => h("option", { key: project.projectPath, value: project.projectPath }, `${project.projectPath} · ${project.useModel || "model?"}`)))),
|
||||
h("label", null, "Fork 数量", h("input", { type: "number", min: 1, max: 100, value: ui.forkCount, "data-testid": "met-fork-count-input", onChange: (event: any) => patchUi({ forkCount: event.target.value }) })),
|
||||
h("label", null, "训练轮数", h("input", { type: "number", min: 1, max: 100000, value: ui.forkEpochs, "data-testid": "met-fork-epochs-input", onChange: (event: any) => patchUi({ forkEpochs: event.target.value }) })),
|
||||
h("label", null, "目标前缀", h("input", { value: ui.forkPrefix, "data-testid": "met-fork-prefix-input", onChange: (event: any) => patchUi({ forkPrefix: event.target.value }) })),
|
||||
h("button", { type: "button", className: "primary-btn", onClick: forkProjects, disabled: state.actionBusy || !sourceProject, "data-testid": "met-fork-button" }, "Fork Project"),
|
||||
h("p", { className: "muted paragraph" }, "Fork 只创建新 Project 并自动勾选,不会直接训练;需要在右侧确认后加入待启动队列。"),
|
||||
),
|
||||
job.error ? h("p", { className: "form-error" }, job.error.slice(0, 260)) : h("p", { className: "muted" }, "无失败原因"),
|
||||
h("code", null, job.id),
|
||||
))),
|
||||
h("div", { className: "panel-actions inline-actions" }, h(RawButton, { title: "MET History", data: state.history, onOpen: onRaw, testId: "raw-met-history" })),
|
||||
h("div", { className: "met-project-list" },
|
||||
h("div", { className: "panel-head compact" }, h("div", null, h("p", { className: "panel-eyebrow" }, "Existing Projects"), h("h2", null, "选择已有 Project")), h("button", { type: "button", className: "ghost-btn", onClick: stageSelectedProjects, disabled: state.actionBusy || selectedProjectPaths().length === 0, "data-testid": "met-stage-selected-button" }, `加入待启动队列 (${selectedProjectPaths().length})`)),
|
||||
projects.length === 0 ? h(EmptyState, { title: "暂无 project", text: "等待 D601 返回 /api/projects" }) :
|
||||
h("div", { className: "table-wrap met-project-table" }, h("table", null,
|
||||
h("thead", null, h("tr", null, h("th", null, "选择"), h("th", null, "Project"), h("th", null, "Model"), h("th", null, "Epochs"), h("th", null, "Progress"))),
|
||||
h("tbody", null, projects.map((project: any) => h("tr", { key: project.projectPath },
|
||||
h("td", null, h("input", { type: "checkbox", checked: Boolean(ui.selectedProjects[project.projectPath]), onChange: (event: any) => patchUi({ selectedProjects: { ...ui.selectedProjects, [project.projectPath]: event.target.checked } }), "data-testid": `met-project-checkbox-${project.projectPath.replace(/[^a-zA-Z0-9_-]/g, "-")}` })),
|
||||
h("td", null, h("strong", null, project.projectPath)),
|
||||
h("td", null, project.useModel || "--"),
|
||||
h("td", null, project.epochTrain ?? "--"),
|
||||
h("td", null, fmtPercent(project.progress?.progressPercent)),
|
||||
))),
|
||||
)),
|
||||
),
|
||||
) : null,
|
||||
ui.activeTab === "current" ? h("div", { "data-testid": "met-current-pane" }, currentQueueSummary(), queueTable(currentJobs, "current"), h("div", { className: "panel-actions inline-actions" }, h(RawButton, { title: "MET Queue", data: state.queue, onOpen: onRaw, testId: "raw-met-queue" }))) : null,
|
||||
ui.activeTab === "completed" ? h("div", { "data-testid": "met-completed-pane" }, queueTable(completedJobs.length > 0 ? completedJobs : terminalHistory.filter((job: any) => job.status === "succeeded"), "completed")) : null,
|
||||
ui.activeTab === "failed" ? h("div", { "data-testid": "met-failed-pane" }, queueTable(failedJobs.length > 0 ? failedJobs : terminalHistory.filter((job: any) => ["failed", "canceled"].includes(job.status)), "failed"), h("div", { className: "panel-actions inline-actions" }, h(RawButton, { title: "MET History", data: state.history, onOpen: onRaw, testId: "raw-met-history" }))) : null,
|
||||
ui.activeTab === "gpu" ? h("div", { className: "met-gpu-pane", "data-testid": "met-gpu-pane" },
|
||||
gpus.length === 0 ? h(EmptyState, { title: "暂无 GPU 上报", text: "等待 D601 met-nonlinear-ts 或 ML image 提供 nvidia-smi 数据" }) :
|
||||
h("div", { className: "table-wrap" }, h("table", null,
|
||||
h("thead", null, h("tr", null, h("th", null, "Index"), h("th", null, "Name"), h("th", null, "Free"), h("th", null, "Policy"))),
|
||||
h("tbody", null, gpus.map((gpu: any) => h("tr", { key: gpu.index },
|
||||
h("td", null, gpu.index),
|
||||
h("td", null, gpu.name),
|
||||
h("td", null, `${gpu.memoryFreeMiB} / ${gpu.memoryTotalMiB} MiB`, h("div", { className: "met-progress" }, h("span", { style: { width: fmtPercent(Number(gpu.freeRatio) * 100) } }))),
|
||||
h("td", null, String(gpu.name || "").includes("2080") ? "target 2080Ti, <20% 限制并发" : "non-target"),
|
||||
))),
|
||||
)),
|
||||
h("div", { className: "panel-actions inline-actions" }, h(RawButton, { title: "MET Images", data: state.images, onOpen: onRaw, testId: "raw-met-images" })),
|
||||
) : null,
|
||||
),
|
||||
),
|
||||
),
|
||||
);
|
||||
|
||||
@@ -8,23 +8,143 @@ const { useEffect } = React;
|
||||
const useState: any = React.useState;
|
||||
|
||||
const pipelineInputPorts: AnyRecord[] = [
|
||||
{ id: "in-left-top", side: "left", position: Position.Left, style: { top: "24%" } },
|
||||
{ id: "in-left-mid", side: "left", position: Position.Left, style: { top: "50%" } },
|
||||
{ id: "in-left-bottom", side: "left", position: Position.Left, style: { top: "76%" } },
|
||||
{ id: "in-top-left", side: "top", position: Position.Top, style: { left: "28%" } },
|
||||
{ id: "in-top-mid", side: "top", position: Position.Top, style: { left: "50%" } },
|
||||
{ id: "in-top-right", side: "top", position: Position.Top, style: { left: "72%" } },
|
||||
{ id: "in-bottom-left", side: "bottom", position: Position.Bottom, style: { left: "28%" } },
|
||||
{ id: "in-bottom-mid", side: "bottom", position: Position.Bottom, style: { left: "50%" } },
|
||||
{ id: "in-bottom-right", side: "bottom", position: Position.Bottom, style: { left: "72%" } },
|
||||
{ id: "in-left", side: "left", position: Position.Left, style: { top: "50%" } },
|
||||
{ id: "in-top-left", side: "top", slot: "left", slotIndex: -1, position: Position.Top, style: { left: "28%" } },
|
||||
{ id: "in-top-mid", side: "top", slot: "mid", slotIndex: 0, position: Position.Top, style: { left: "50%" } },
|
||||
{ id: "in-top-right", side: "top", slot: "right", slotIndex: 1, position: Position.Top, style: { left: "72%" } },
|
||||
{ id: "in-bottom-left", side: "bottom", slot: "left", slotIndex: -1, position: Position.Bottom, style: { left: "28%" } },
|
||||
{ id: "in-bottom-mid", side: "bottom", slot: "mid", slotIndex: 0, position: Position.Bottom, style: { left: "50%" } },
|
||||
{ id: "in-bottom-right", side: "bottom", slot: "right", slotIndex: 1, position: Position.Bottom, style: { left: "72%" } },
|
||||
];
|
||||
|
||||
const pipelineOutputPorts: AnyRecord[] = [
|
||||
{ id: "out-right-top", position: Position.Right, style: { top: "24%" } },
|
||||
{ id: "out-right-mid", position: Position.Right, style: { top: "50%" } },
|
||||
{ id: "out-right-bottom", position: Position.Right, style: { top: "76%" } },
|
||||
{ id: "out-right", position: Position.Right, style: { top: "50%" } },
|
||||
];
|
||||
|
||||
const pipelineOverlapEdgePalette = ["#4eb7a8", "#d7a13a", "#69aee8", "#e0835f", "#b7d86b", "#d98bd2", "#5fc6bf"];
const pipelineNodeWidth = 236;
const pipelineNodeHeight = 88;
|
||||
|
||||
function pipelinePercent(value: any, fallback: number): number {
  const number = Number.parseFloat(String(value || ""));
  return Number.isFinite(number) ? number / 100 : fallback;
}
|
||||
|
||||
function pipelineEndpointScore(port: AnyRecord, loads: Map<string, number>, laneHint: number): number {
  const side = String(port.side || "");
  if (side !== "top" && side !== "bottom") return 0;
  const slotIndex = Number(port.slotIndex || 0);
  const midId = side === "top" ? "in-top-mid" : "in-bottom-mid";
  const load = loads.get(port.id) || 0;
  const midLoad = loads.get(midId) || 0;
  if (slotIndex === 0) return midLoad === 0 ? -26 : 28 + load * 74;
  const lanePreference = laneHint === 0 ? Math.abs(slotIndex) * 2 : (Math.sign(laneHint) === Math.sign(slotIndex) ? -3 : 3);
  if (midLoad > 0 && load === 0) return -14 + lanePreference;
  return 8 + load * 74 + lanePreference;
}
|
||||
|
||||
function pipelineSmoothPath(points: Array<{ x: number; y: number }>): string {
  const compact = points.filter((point, index) => {
    const previous = points[index - 1];
    return !previous || Math.abs(previous.x - point.x) > 0.5 || Math.abs(previous.y - point.y) > 0.5;
  });
  if (compact.length < 2) return "";
  let path = `M ${compact[0].x},${compact[0].y}`;
  let last = compact[0];
  for (let index = 1; index < compact.length - 1; index += 1) {
    const previous = compact[index - 1];
    const current = compact[index];
    const next = compact[index + 1];
    const previousDistance = Math.hypot(current.x - previous.x, current.y - previous.y);
    const nextDistance = Math.hypot(next.x - current.x, next.y - current.y);
    const radius = Math.min(64, previousDistance * 0.42, nextDistance * 0.42);
    if (radius < 2) {
      path += ` L ${current.x},${current.y}`;
      last = current;
      continue;
    }
    const entry = {
      x: current.x + ((previous.x - current.x) / previousDistance) * radius,
      y: current.y + ((previous.y - current.y) / previousDistance) * radius,
    };
    const exit = {
      x: current.x + ((next.x - current.x) / nextDistance) * radius,
      y: current.y + ((next.y - current.y) / nextDistance) * radius,
    };
    if (Math.abs(entry.x - last.x) > 0.5 || Math.abs(entry.y - last.y) > 0.5) path += ` L ${entry.x},${entry.y}`;
    path += ` Q ${current.x},${current.y} ${exit.x},${exit.y}`;
    last = exit;
  }
  const end = compact[compact.length - 1];
  return `${path} L ${end.x},${end.y}`;
}
|
||||
|
||||
function pipelineCurvePath(sourceX: number, sourceY: number, targetX: number, targetY: number, targetPosition: Position, laneOffset: number, routeMode = ""): string {
  const forward = targetX >= sourceX;
  const horizontalDistance = Math.max(1, Math.abs(targetX - sourceX));
  const verticalDistance = Math.abs(targetY - sourceY);
  const handleOffset = Math.max(34, Math.min(118, horizontalDistance * 0.26));
  const scatter = Math.min(280, Math.abs(laneOffset));
  const directShortLine = forward && targetPosition === Position.Left && scatter < 4 && verticalDistance < 28 && horizontalDistance < 420;
  if (directShortLine) return `M ${sourceX},${sourceY} C ${sourceX + handleOffset},${sourceY} ${targetX - handleOffset},${targetY} ${targetX},${targetY}`;
  const directForwardLeft = forward && targetPosition === Position.Left && (routeMode === "direct-forward-left" || (horizontalDistance <= 260 && verticalDistance <= 210));
  if (directForwardLeft) {
    const bend = Math.max(42, Math.min(140, horizontalDistance * 0.48));
    const fanoutLift = Math.max(-28, Math.min(28, laneOffset * 0.18));
    return `M ${sourceX},${sourceY} C ${sourceX + bend},${sourceY + fanoutLift} ${targetX - bend},${targetY} ${targetX},${targetY}`;
  }

  if (forward) {
    const sourceRailX = sourceX + handleOffset;
    if (targetPosition === Position.Top || targetPosition === Position.Bottom) {
      const verticalSign = targetPosition === Position.Top ? -1 : 1;
      const railY = targetY + verticalSign * (54 + scatter * 0.42);
      return pipelineSmoothPath([
        { x: sourceX, y: sourceY },
        { x: sourceRailX, y: sourceY },
        { x: sourceRailX + Math.min(120, horizontalDistance * 0.18), y: railY },
        { x: targetX, y: railY },
        { x: targetX, y: targetY + verticalSign * 34 },
        { x: targetX, y: targetY },
      ]);
    }
    const targetRailX = targetX - handleOffset;
    const railY = (sourceY + targetY) / 2 + laneOffset;
    return pipelineSmoothPath([
      { x: sourceX, y: sourceY },
      { x: sourceRailX, y: sourceY },
      { x: sourceRailX + Math.min(110, horizontalDistance * 0.16), y: railY },
      { x: targetRailX - Math.min(90, horizontalDistance * 0.12), y: railY },
      { x: targetRailX, y: targetY },
      { x: targetX, y: targetY },
    ]);
  }

  const targetSign = targetPosition === Position.Bottom ? 1 : targetPosition === Position.Top ? -1 : (laneOffset >= 0 ? 1 : -1);
  const detourX = Math.max(sourceX, targetX) + 92 + Math.min(180, scatter * 0.52);
  const railY = targetSign < 0
    ? Math.min(sourceY, targetY) - 84 - scatter * 0.62
    : Math.max(sourceY, targetY) + 84 + scatter * 0.62;
  if (targetPosition === Position.Top || targetPosition === Position.Bottom) {
    return pipelineSmoothPath([
      { x: sourceX, y: sourceY },
      { x: sourceX + handleOffset, y: sourceY },
      { x: detourX, y: railY },
      { x: targetX, y: railY },
      { x: targetX, y: targetY + targetSign * 38 },
      { x: targetX, y: targetY },
    ]);
  }
  return pipelineSmoothPath([
    { x: sourceX, y: sourceY },
    { x: sourceX + handleOffset, y: sourceY },
    { x: detourX, y: railY },
    { x: targetX - handleOffset, y: railY },
    { x: targetX - handleOffset, y: targetY },
    { x: targetX, y: targetY },
  ]);
}
|
||||
|
||||
function PipelineFlowNode({ data }: AnyRecord) {
|
||||
return h("div", { className: "pipeline-flow-node-body" },
|
||||
pipelineInputPorts.map((port) => h(Handle, {
|
||||
@@ -33,7 +153,7 @@ function PipelineFlowNode({ data }: AnyRecord) {
|
||||
type: "target",
|
||||
position: port.position,
|
||||
isConnectable: false,
|
||||
className: `pipeline-flow-handle input ${port.side}`,
|
||||
className: `pipeline-flow-handle input ${port.side} slot-${port.slot || "mid"}`,
|
||||
style: port.style,
|
||||
})),
|
||||
pipelineOutputPorts.map((port) => h(Handle, {
|
||||
@@ -51,47 +171,7 @@ function PipelineFlowNode({ data }: AnyRecord) {
|
||||
|
||||
function PipelineCurveEdge({ id, sourceX, sourceY, targetX, targetY, targetPosition, markerEnd, markerStart, style, data }: AnyRecord) {
|
||||
const laneOffset = Number(data?.laneOffset || 0);
|
||||
const forward = targetX >= sourceX;
|
||||
const distance = Math.max(1, Math.abs(targetX - sourceX));
|
||||
const handleOffset = Math.max(8, Math.min(92, distance * 0.24));
|
||||
let path = "";
|
||||
if (targetPosition === Position.Top || targetPosition === Position.Bottom) {
|
||||
const sourceRailX = sourceX + handleOffset;
|
||||
const verticalSign = targetPosition === Position.Top ? -1 : 1;
|
||||
const railY = targetY + verticalSign * (46 + Math.min(86, Math.abs(laneOffset) * 0.35));
|
||||
path = [
|
||||
`M ${sourceX},${sourceY}`,
|
||||
`C ${sourceRailX},${sourceY} ${sourceRailX},${sourceY} ${sourceRailX},${sourceY}`,
|
||||
`L ${sourceRailX},${railY}`,
|
||||
`L ${targetX},${railY}`,
|
||||
`C ${targetX},${railY} ${targetX},${targetY - verticalSign * 18} ${targetX},${targetY}`,
|
||||
].join(" ");
|
||||
} else if (forward && Math.abs(laneOffset) < 4) {
|
||||
path = [
|
||||
`M ${sourceX},${sourceY}`,
|
||||
`C ${sourceX + handleOffset},${sourceY} ${targetX - handleOffset},${targetY} ${targetX},${targetY}`,
|
||||
].join(" ");
|
||||
} else if (forward) {
|
||||
const sourceRailX = sourceX + handleOffset;
|
||||
const targetRailX = targetX - handleOffset;
|
||||
const railY = (sourceY + targetY) / 2 + laneOffset;
|
||||
path = [
|
||||
`M ${sourceX},${sourceY}`,
|
||||
`C ${sourceRailX},${sourceY} ${sourceRailX},${sourceY} ${sourceRailX},${sourceY}`,
|
||||
`L ${sourceRailX},${railY}`,
|
||||
`L ${targetRailX},${railY}`,
|
||||
`C ${targetRailX},${railY} ${targetRailX},${targetY} ${targetRailX},${targetY}`,
|
||||
`C ${targetRailX},${targetY} ${targetRailX},${targetY} ${targetX},${targetY}`,
|
||||
].join(" ");
|
||||
} else {
|
||||
const detourX = Math.max(sourceX, targetX) + 92 + Math.min(130, Math.abs(laneOffset));
|
||||
const railY = (sourceY + targetY) / 2 + laneOffset;
|
||||
path = [
|
||||
`M ${sourceX},${sourceY}`,
|
||||
`C ${detourX},${sourceY} ${detourX},${railY} ${detourX},${railY}`,
|
||||
`C ${detourX},${targetY} ${targetX - handleOffset},${targetY} ${targetX},${targetY}`,
|
||||
].join(" ");
|
||||
}
|
||||
const path = pipelineCurvePath(sourceX, sourceY, targetX, targetY, targetPosition, laneOffset, String(data?.routeMode || ""));
|
||||
return h(BaseEdge, { id, path, markerEnd, markerStart, style, interactionWidth: 28 });
|
||||
}
|
||||
|
||||
@@ -213,6 +293,68 @@ function pipelineComponentRef(value: any): string {
|
||||
return `${value.componentClass || "--"}/${value.id || "--"}`;
|
||||
}
|
||||
|
||||
function pipelineComponentRefKey(value: any): string {
  if (!value || typeof value !== "object" || Array.isArray(value)) return "";
  const componentClass = String(value.componentClass || "").trim();
  const id = String(value.id || "").trim();
  return componentClass && id ? `${componentClass}/${id}` : "";
}

function pipelineConfigObject(pipeline: any): AnyRecord {
  return pipeline?.config && typeof pipeline.config === "object" && !Array.isArray(pipeline.config) ? pipeline.config : {};
}
|
||||
|
||||
function pipelineConfigNodes(pipeline: any): AnyRecord[] {
  const config = pipelineConfigObject(pipeline);
  const rawNodes = Array.isArray(config.nodes) ? config.nodes : Array.isArray(pipeline?.nodes) ? pipeline.nodes : [];
  const nodeById: Map<string, AnyRecord> = new Map();
  for (const node of rawNodes) {
    const id = String(node?.id || node?.nodeId || "");
    if (id) nodeById.set(id, { ...node, id });
  }
  const edges = pipelineConfigEdges(pipeline);
  const addMissing = (id: string) => {
    if (id && !nodeById.has(id)) nodeById.set(id, { id });
  };
  for (const batch of pipelineGraphBatches(pipeline)) pipelineRawNodeIds(batch).forEach(addMissing);
  for (const edge of edges) {
    addMissing(String(edge?.from || edge?.source || ""));
    addMissing(String(edge?.to || edge?.target || ""));
  }
  return Array.from(nodeById.values());
}
|
||||
|
||||
function pipelineConfigEdges(pipeline: any): AnyRecord[] {
  const config = pipelineConfigObject(pipeline);
  return Array.isArray(config.edges) ? config.edges : Array.isArray(pipeline?.edges) ? pipeline.edges : [];
}

function pipelineGraphBatches(pipeline: any): any[] {
  const config = pipelineConfigObject(pipeline);
  return Array.isArray(config.topologicalBatches) ? config.topologicalBatches : Array.isArray(pipeline?.topologicalBatches) ? pipeline.topologicalBatches : [];
}
|
||||
|
||||
function pipelineComponentLookup(components: any[]): Map<string, AnyRecord> {
  const byRef: Map<string, AnyRecord> = new Map();
  for (const component of components) {
    const directKey = pipelineComponentRefKey(component);
    if (directKey) byRef.set(directKey, component);
    const refs = Array.isArray(component?.refs) ? component.refs : [];
    for (const ref of refs) {
      const refKey = pipelineComponentRefKey(ref);
      if (refKey) byRef.set(refKey, component);
    }
  }
  return byRef;
}

function pipelineNodeComponent(node: any, componentByRef: Map<string, AnyRecord>): AnyRecord | null {
  const explicit = componentByRef.get(pipelineComponentRefKey(node?.componentRef));
  if (explicit) return explicit;
  const fallbackKey = pipelineComponentRefKey({ componentClass: node?.kind, id: node?.id });
  return fallbackKey ? componentByRef.get(fallbackKey) || null : null;
}
|
||||
|
||||
function pipelineRunNodeStatus(run: any, nodeId: string): string {
|
||||
const nodes = Array.isArray(run?.nodes) ? run.nodes : [];
|
||||
const node = nodes.find((item: any) => item?.nodeId === nodeId || item?.id === nodeId);
|
||||
@@ -243,18 +385,16 @@ function pipelineRawNodeIds(value: any): string[] {
|
||||
return [];
|
||||
}
|
||||
|
||||
function pipelineGraphColumns(pipeline: any): string[][] {
|
||||
const rawBatches = Array.isArray(pipeline?.topologicalBatches) ? pipeline.topologicalBatches : [];
|
||||
function pipelineGraphColumns(pipeline: any, pipelineNodes: AnyRecord[], pipelineEdges: AnyRecord[]): string[][] {
|
||||
const rawBatches = pipelineGraphBatches(pipeline);
|
||||
const explicit = rawBatches.map(pipelineRawNodeIds).filter((batch: string[]) => batch.length > 0);
|
||||
if (explicit.length > 0) return explicit;
|
||||
|
||||
const nodes = Array.isArray(pipeline?.nodes) ? pipeline.nodes : [];
|
||||
const ids: string[] = nodes.map((node: any) => String(node?.id || "")).filter(Boolean);
|
||||
const ids: string[] = pipelineNodes.map((node: any) => String(node?.id || "")).filter(Boolean);
|
||||
const idSet = new Set(ids);
|
||||
const rawEdges = Array.isArray(pipeline?.edges) ? pipeline.edges : [];
|
||||
const incoming: Map<string, number> = new Map(ids.map((id) => [id, 0]));
|
||||
const outgoing: Map<string, string[]> = new Map(ids.map((id) => [id, []]));
|
||||
for (const edge of rawEdges) {
|
||||
for (const edge of pipelineEdges) {
|
||||
const from = String(edge?.from || edge?.source || "");
|
||||
const to = String(edge?.to || edge?.target || "");
|
||||
if (!idSet.has(from) || !idSet.has(to)) continue;
|
||||
@@ -282,10 +422,12 @@ function pipelineFlowEdgeKey(edge: AnyRecord): string {
|
||||
return `${edge.source}->${edge.target}-${edge.index}`;
|
||||
}
|
||||
|
||||
function pipelineFlowElements(pipeline: any, latestRun: any): { nodes: Node[]; edges: Edge[] } {
|
||||
const pipelineNodes = Array.isArray(pipeline?.nodes) ? pipeline.nodes : [];
|
||||
function pipelineFlowElements(pipeline: any, latestRun: any, components: any[]): { nodes: Node[]; edges: Edge[] } {
|
||||
const pipelineNodes = pipelineConfigNodes(pipeline);
|
||||
const pipelineEdges = pipelineConfigEdges(pipeline);
|
||||
const componentByRef = pipelineComponentLookup(components);
|
||||
const nodeById: Map<string, AnyRecord> = new Map(pipelineNodes.map((node: any) => [String(node?.id || ""), node]));
|
||||
const columns = pipelineGraphColumns(pipeline);
|
||||
const columns = pipelineGraphColumns(pipeline, pipelineNodes, pipelineEdges);
|
||||
const flowNodes: Node[] = [];
|
||||
const nodeLayout: Map<string, { column: number; row: number; y: number }> = new Map();
|
||||
const columnGap = 330;
|
||||
@@ -294,8 +436,12 @@ function pipelineFlowElements(pipeline: any, latestRun: any): { nodes: Node[]; e
|
||||
const columnHeight = batch.length * rowGap;
|
||||
batch.forEach((nodeId, rowIndex) => {
|
||||
const node = nodeById.get(nodeId) || { id: nodeId };
|
||||
const component = pipelineNodeComponent(node, componentByRef);
|
||||
const status = pipelineRunNodeStatus(latestRun, nodeId).toLowerCase();
|
||||
const kind = String(node.kind || "node").toLowerCase();
|
||||
const kind = String(node.kind || component?.componentClass || "node").toLowerCase();
|
||||
const componentRef = pipelineComponentRef(node.componentRef || component);
|
||||
const componentVersion = String(component?.config?.version || component?.version || "");
|
||||
const componentDescription = String(component?.config?.description || component?.description || "");
|
||||
const y = rowIndex * rowGap - Math.floor(columnHeight / 2);
|
||||
nodeLayout.set(nodeId, { column: columnIndex, row: rowIndex, y });
|
||||
flowNodes.push({
|
||||
@@ -306,10 +452,18 @@ function pipelineFlowElements(pipeline: any, latestRun: any): { nodes: Node[]; e
|
||||
y,
|
||||
},
|
||||
data: {
|
||||
exportLabel: {
|
||||
id: nodeId,
|
||||
kind,
|
||||
componentRef,
|
||||
componentVersion,
|
||||
componentDescription,
|
||||
status,
|
||||
},
|
||||
label: h("div", { className: "flow-node-label" },
|
||||
h("strong", null, nodeId),
|
||||
h("span", null, kind),
|
||||
h("code", null, pipelineComponentRef(node.componentRef)),
|
||||
h("code", { title: componentDescription || componentRef }, componentVersion ? `${componentRef}@${componentVersion}` : componentRef),
|
||||
h(StatusBadge, { status }, status),
|
||||
),
|
||||
},
|
||||
@@ -317,11 +471,11 @@ function pipelineFlowElements(pipeline: any, latestRun: any): { nodes: Node[]; e
|
||||
});
|
||||
});
|
||||
});
|
||||
const graphEdges = (Array.isArray(pipeline?.edges) ? pipeline.edges : []).flatMap((edge: any, index: number) => {
|
||||
const graphEdges = pipelineEdges.flatMap((edge: any, index: number) => {
|
||||
const source = String(edge?.from || edge?.source || "");
|
||||
const target = String(edge?.to || edge?.target || "");
|
||||
if (!nodeById.has(source) || !nodeById.has(target)) return [];
|
||||
return [{ source, target, index }];
|
||||
return [{ source, target, index, condition: edge?.condition, edgeType: edge?.edgeType }];
|
||||
});
|
||||
const sourceTotals = graphEdges.reduce((memo: Map<string, number>, edge: AnyRecord) => memo.set(edge.source, (memo.get(edge.source) || 0) + 1), new Map<string, number>());
|
||||
const targetTotals = graphEdges.reduce((memo: Map<string, number>, edge: AnyRecord) => memo.set(edge.target, (memo.get(edge.target) || 0) + 1), new Map<string, number>());
|
||||
@@ -334,6 +488,32 @@ function pipelineFlowElements(pipeline: any, latestRun: any): { nodes: Node[]; e
|
||||
const pairSeen = new Map<string, number>();
|
||||
const targetHandleByEdge = new Map<string, AnyRecord>();
|
||||
const targetPortLoads: Map<string, Map<string, number>> = new Map();
|
||||
const adjacentForwardFanOutByEdge = new Map<string, { slot: number; count: number }>();
|
||||
const adjacentForwardClusters = graphEdges.reduce((memo: Map<string, AnyRecord[]>, edge: AnyRecord) => {
|
||||
const sourceLayout = nodeLayout.get(edge.source);
|
||||
const targetLayout = nodeLayout.get(edge.target);
|
||||
const rawSpan = (targetLayout?.column || 0) - (sourceLayout?.column || 0);
|
||||
const backward = rawSpan <= 0 || String(edge.edgeType || "").toLowerCase() === "rework";
|
||||
if (backward || rawSpan !== 1) return memo;
|
||||
const key = `${edge.source}->column:${targetLayout?.column ?? ""}`;
|
||||
const list = memo.get(key) || [];
|
||||
list.push(edge);
|
||||
memo.set(key, list);
|
||||
return memo;
|
||||
}, new Map<string, AnyRecord[]>());
|
||||
for (const cluster of adjacentForwardClusters.values()) {
|
||||
if (cluster.length < 2) continue;
|
||||
cluster
|
||||
.slice()
|
||||
.sort((left: AnyRecord, right: AnyRecord) => {
|
||||
const leftTarget = nodeLayout.get(left.target);
|
||||
const rightTarget = nodeLayout.get(right.target);
|
||||
return (leftTarget?.y || 0) - (rightTarget?.y || 0) || left.index - right.index;
|
||||
})
|
||||
.forEach((edge: AnyRecord, index: number, ordered: AnyRecord[]) => {
|
||||
adjacentForwardFanOutByEdge.set(pipelineFlowEdgeKey(edge), { slot: index - (ordered.length - 1) / 2, count: ordered.length });
|
||||
});
|
||||
}
|
||||
const sortedIncomingEdges = [...graphEdges].sort((left: AnyRecord, right: AnyRecord) => {
|
||||
const leftSource = nodeLayout.get(left.source);
|
||||
const leftTarget = nodeLayout.get(left.target);
|
||||
@@ -346,16 +526,31 @@ function pipelineFlowElements(pipeline: any, latestRun: any): { nodes: Node[]; e
|
||||
sortedIncomingEdges.forEach((edge: AnyRecord) => {
|
||||
const sourceLayout = nodeLayout.get(edge.source) || { column: 0, row: 0, y: 0 };
|
||||
const targetLayout = nodeLayout.get(edge.target) || { column: 0, row: 0, y: 0 };
|
||||
const span = Math.max(0, targetLayout.column - sourceLayout.column);
|
||||
const rawSpan = targetLayout.column - sourceLayout.column;
|
||||
const span = Math.max(0, rawSpan);
|
||||
const backward = rawSpan <= 0 || String(edge.edgeType || "").toLowerCase() === "rework";
|
||||
const verticalDelta = sourceLayout.y - targetLayout.y;
|
||||
const targetIncomingCount = targetTotals.get(edge.target) || 1;
|
||||
const adjacentFanOut = adjacentForwardFanOutByEdge.has(pipelineFlowEdgeKey(edge));
|
||||
const primaryForwardInput = !backward && span <= 1 && (adjacentFanOut || targetIncomingCount === 1);
|
||||
const loads = targetPortLoads.get(edge.target) || new Map<string, number>();
|
||||
targetPortLoads.set(edge.target, loads);
|
||||
const port = pipelineInputPorts.slice().sort((left, right) => {
|
||||
const loadDiff = (loads.get(left.id) || 0) - (loads.get(right.id) || 0);
|
||||
if (loadDiff !== 0) return loadDiff;
|
||||
const score = (item: AnyRecord): number => {
|
||||
const side = String(item.side);
|
||||
let value = 0;
|
||||
if (backward) {
|
||||
if (side === "left") value += 86;
|
||||
if (side === "top") value += targetLayout.y <= 0 ? -22 : 12;
|
||||
if (side === "bottom") value += targetLayout.y >= 0 ? -22 : 12;
|
||||
if (Math.abs(targetLayout.y) < 12 && side !== "left") value += edge.index % 2 === 0 ? (side === "top" ? -6 : 6) : (side === "bottom" ? -6 : 6);
|
||||
return value;
|
||||
}
|
||||
if (primaryForwardInput) {
|
||||
if (side === "left") value -= adjacentFanOut ? 72 : 44;
|
||||
if (side !== "left") value += adjacentFanOut ? 72 : 44;
|
||||
return value + Math.abs(verticalDelta) * 0.02;
|
||||
}
|
||||
if (side === "left") value += span <= 1 ? 0 : 24;
|
||||
if (side === "top") value += verticalDelta < -36 ? -18 : 42;
|
||||
if (side === "bottom") value += verticalDelta > 36 ? -18 : 42;
|
||||
@@ -363,12 +558,18 @@ function pipelineFlowElements(pipeline: any, latestRun: any): { nodes: Node[]; e
|
||||
if (span > 1 && side !== "left") value -= 10;
|
||||
return value;
|
||||
};
|
||||
return score(left) - score(right);
|
||||
const rawLaneHint = sourceLayout.y - targetLayout.y;
|
||||
const laneHint = rawLaneHint !== 0 ? rawLaneHint : (edge.index % 2 === 0 ? -1 : 1);
|
||||
const totalScore = (item: AnyRecord): number => {
|
||||
const load = loads.get(item.id) || 0;
|
||||
return score(item) + load * 64 + pipelineEndpointScore(item, loads, laneHint);
|
||||
};
|
||||
return totalScore(left) - totalScore(right) || String(left.id).localeCompare(String(right.id));
|
||||
})[0];
|
||||
loads.set(port.id, (loads.get(port.id) || 0) + 1);
|
||||
targetHandleByEdge.set(pipelineFlowEdgeKey(edge), port);
|
||||
});
|
||||
const edges: Edge[] = graphEdges.map((edge: AnyRecord) => {
|
||||
const edgeDrafts: AnyRecord[] = graphEdges.map((edge: AnyRecord) => {
|
||||
const targetStatus = pipelineRunNodeStatus(latestRun, edge.target).toLowerCase();
|
||||
const pairKey = `${edge.source}->${edge.target}`;
|
||||
const sourceSlot = sourceSeen.get(edge.source) || 0;
|
||||
@@ -382,16 +583,35 @@ function pipelineFlowElements(pipeline: any, latestRun: any): { nodes: Node[]; e
|
||||
const pairLane = pairSlot - ((pairTotals.get(pairKey) || 1) - 1) / 2;
|
||||
const sourceLayout = nodeLayout.get(edge.source);
|
||||
const targetLayout = nodeLayout.get(edge.target);
|
||||
const span = Math.max(1, (targetLayout?.column || 0) - (sourceLayout?.column || 0));
|
||||
const rawSpan = (targetLayout?.column || 0) - (sourceLayout?.column || 0);
|
||||
const span = Math.max(1, Math.abs(rawSpan));
|
||||
const backward = rawSpan <= 0 || String(edge.edgeType || "").toLowerCase() === "rework";
|
||||
const verticalDelta = Math.abs((targetLayout?.y || 0) - (sourceLayout?.y || 0));
|
||||
const lane = pairLane * 2 + sourceLane + targetLane * 0.45;
|
||||
const fanOutMeta = adjacentForwardFanOutByEdge.get(pipelineFlowEdgeKey(edge));
|
||||
const adjacentFanIn = !backward && rawSpan === 1 && (targetTotals.get(edge.target) || 0) > 1;
|
||||
const lane = fanOutMeta ? fanOutMeta.slot : pairLane * 2 + sourceLane + targetLane * 0.45;
|
||||
const railDirection = lane === 0 ? (edge.index % 2 === 0 ? -1 : 1) : Math.sign(lane);
|
||||
const targetPort = targetHandleByEdge.get(pipelineFlowEdgeKey(edge)) || pipelineInputPorts[1];
|
||||
const shouldScatter = span > 1 || verticalDelta > 96 || Math.abs(lane) > 0.2 || targetPort.side !== "left";
|
||||
const laneOffset = shouldScatter ? Math.max(-220, Math.min(220, lane * 34 + railDirection * Math.min(92, 22 + span * 16))) : 0;
|
||||
const portDirection = targetPort.side === "top" ? -1 : targetPort.side === "bottom" ? 1 : railDirection;
|
||||
const shouldScatter = backward || span > 1 || verticalDelta > 96 || Math.abs(lane) > 0.2 || targetPort.side !== "left";
|
||||
const baseScatter = backward ? 118 + span * 18 : 22 + span * 16;
|
||||
const sideScatter = targetPort.side === "left" ? 0 : 28;
|
||||
const laneOffset = shouldScatter ? Math.max(-280, Math.min(280, portDirection * Math.min(180, baseScatter + sideScatter + verticalDelta * 0.22) + lane * 28)) : 0;
|
||||
const sourceHandleIndex = Math.max(0, Math.min(pipelineOutputPorts.length - 1, Math.round(sourceLane + (pipelineOutputPorts.length - 1) / 2)));
|
||||
const sourcePort = pipelineOutputPorts[sourceHandleIndex] || pipelineOutputPorts[1];
|
||||
const edgeColor = targetStatus === "succeeded" ? "var(--accent-2)" : targetStatus === "running" ? "var(--accent)" : targetStatus === "failed" ? "var(--danger)" : "rgba(129, 147, 159, 0.78)";
|
||||
const baseEdgeColor = targetStatus === "succeeded" ? "var(--accent-2)" : targetStatus === "running" ? "var(--accent)" : targetStatus === "failed" ? "var(--danger)" : "rgba(129, 147, 159, 0.78)";
|
||||
const sourceColumn = sourceLayout?.column || 0;
|
||||
const targetColumn = targetLayout?.column || 0;
|
||||
const railSign = laneOffset === 0 ? 0 : Math.sign(laneOffset);
|
||||
const overlapGroup = backward
|
||||
? `feedback:${sourceColumn}->${targetColumn}:${railSign}`
|
||||
: fanOutMeta
|
||||
? `fanout:${sourceColumn}->${targetColumn}:${edge.source}`
|
||||
: adjacentFanIn
|
||||
? `fanin:${sourceColumn}->${targetColumn}:${edge.target}`
|
||||
: targetPort.side !== "left" || span > 1
|
||||
? `corridor:${sourceColumn}->${targetColumn}:${targetPort.side}:${railSign}:${Math.round(Math.abs(laneOffset) / 56)}`
|
||||
: "";
|
||||
return {
|
||||
id: `${edge.source}->${edge.target}-${edge.index}`,
|
||||
source: edge.source,
|
||||
@@ -401,18 +621,178 @@ function pipelineFlowElements(pipeline: any, latestRun: any): { nodes: Node[]; e
|
||||
type: "pipelineCurve",
|
||||
zIndex: 12,
|
||||
animated: targetStatus === "running",
|
||||
data: { laneOffset, targetSide: targetPort.side },
|
||||
style: { stroke: edgeColor },
|
||||
markerEnd: { type: MarkerType.ArrowClosed, color: edgeColor },
|
||||
className: `pipeline-flow-edge ${targetStatus}`,
|
||||
data: { baseEdgeColor, laneOffset, routeMode: fanOutMeta && targetPort.side === "left" ? "direct-forward-left" : "", targetSide: targetPort.side, isFeedback: backward, overlapGroup },
|
||||
targetStatus,
|
||||
};
|
||||
});
|
||||
const overlapGroupCounts = edgeDrafts.reduce((memo: Map<string, number>, edge: AnyRecord) => {
|
||||
const key = String(edge.data?.overlapGroup || "");
|
||||
return key ? memo.set(key, (memo.get(key) || 0) + 1) : memo;
|
||||
}, new Map<string, number>());
|
||||
const overlapGroupSeen = new Map<string, number>();
|
||||
const edges: Edge[] = edgeDrafts.map((edge: AnyRecord) => {
|
||||
const draftTargetStatus = String(edge.targetStatus || "pending");
|
||||
const flowEdge: AnyRecord = { ...edge };
|
||||
delete flowEdge.targetStatus;
|
||||
const groupKey = String(edge.data?.overlapGroup || "");
|
||||
const groupCount = groupKey ? overlapGroupCounts.get(groupKey) || 0 : 0;
|
||||
const overlapSlot = groupCount > 1 ? overlapGroupSeen.get(groupKey) || 0 : -1;
|
||||
if (groupCount > 1) overlapGroupSeen.set(groupKey, overlapSlot + 1);
|
||||
const edgeColor = overlapSlot >= 0 ? pipelineOverlapEdgePalette[overlapSlot % pipelineOverlapEdgePalette.length] : String(edge.data.baseEdgeColor);
|
||||
const style: AnyRecord = { stroke: edgeColor };
|
||||
if (edge.data.isFeedback) style.strokeDasharray = "9 7";
|
||||
return {
|
||||
...flowEdge,
|
||||
data: { ...edge.data, edgeColor, overlapSlot, overlapCount: groupCount },
|
||||
style,
|
||||
markerEnd: { type: MarkerType.ArrowClosed, color: edgeColor },
|
||||
className: `pipeline-flow-edge ${draftTargetStatus} ${edge.data.isFeedback ? "feedback" : ""} ${overlapSlot >= 0 ? "overlap-colored" : ""}`,
|
||||
} as Edge;
|
||||
});
|
||||
return { nodes: flowNodes, edges };
|
||||
}
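The overlap coloring at the end of `pipelineFlowElements` can be sketched in isolation: count the edges per overlap group, then give edges in groups of two or more consecutive slots that index into a palette. This is a minimal sketch, not the committed code. `DraftEdge`, `colorOverlappingEdges`, and the palette values are assumptions for illustration; the real `pipelineOverlapEdgePalette` is not shown in this diff.

```typescript
// Hypothetical palette; the real pipelineOverlapEdgePalette values are not shown in the diff.
const palette: string[] = ["#4eb7a8", "#d7a13a", "#cf6a54"];

// Hypothetical, reduced edge shape used only for this sketch.
interface DraftEdge {
  id: string;
  baseColor: string;
  overlapGroup: string; // "" means the edge shares no corridor with other edges
}

function colorOverlappingEdges(drafts: DraftEdge[]): Array<{ id: string; color: string; overlapSlot: number }> {
  // First pass: count how many edges fall into each non-empty group.
  const counts = new Map<string, number>();
  for (const draft of drafts) {
    if (draft.overlapGroup) counts.set(draft.overlapGroup, (counts.get(draft.overlapGroup) || 0) + 1);
  }
  // Second pass: only groups with two or more edges receive slots and palette colors.
  const seen = new Map<string, number>();
  return drafts.map((draft) => {
    const count = draft.overlapGroup ? counts.get(draft.overlapGroup) || 0 : 0;
    const overlapSlot = count > 1 ? seen.get(draft.overlapGroup) || 0 : -1;
    if (count > 1) seen.set(draft.overlapGroup, overlapSlot + 1);
    const color = overlapSlot >= 0 ? palette[overlapSlot % palette.length] : draft.baseColor;
    return { id: draft.id, color, overlapSlot };
  });
}
```

An edge that is alone in its group keeps its status-derived base color, so palette colors appear only where edges would otherwise be hard to tell apart.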
|
||||
|
||||
function escapeSvg(value: any): string {
|
||||
return String(value ?? "")
|
||||
.replace(/&/g, "&amp;")
.replace(/</g, "&lt;")
.replace(/>/g, "&gt;")
.replace(/"/g, "&quot;");
|
||||
}
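For reference, a standalone sketch of the escaping that an SVG text or attribute helper like `escapeSvg` needs. The name `escapeXmlText` is hypothetical. The ampersand has to be replaced first, otherwise the `&` introduced by the later replacements would be escaped a second time.

```typescript
// Hypothetical helper name; mirrors the shape of escapeSvg in the diff.
function escapeXmlText(value: unknown): string {
  return String(value ?? "")
    .replace(/&/g, "&amp;") // must run first so later entities are not double-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}
```

Single quotes are left alone here, which is only safe as long as every generated attribute is delimited with double quotes, as in the SVG strings built by `pipelineGraphSvg`.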
|
||||
|
||||
function exportColor(value: any): string {
|
||||
const text = String(value || "");
|
||||
if (text.includes("--accent-2")) return "#4eb7a8";
|
||||
if (text.includes("--accent")) return "#d7a13a";
|
||||
if (text.includes("--danger")) return "#cf6a54";
|
||||
return text.startsWith("#") ? text : "#81939f";
|
||||
}
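The check order in `exportColor` matters because `"var(--accent-2)"` also contains the substring `--accent`. A table-driven sketch makes that ordering explicit. The hex values are copied from the diff; `cssVarToHex` and `resolveExportColor` are hypothetical names.

```typescript
// Ordered lookup: more specific variable names must come before their prefixes.
const cssVarToHex: Array<[string, string]> = [
  ["--accent-2", "#4eb7a8"], // must precede "--accent", which is a substring of it
  ["--accent", "#d7a13a"],
  ["--danger", "#cf6a54"],
];

function resolveExportColor(value: unknown): string {
  const text = String(value || "");
  for (const [needle, hex] of cssVarToHex) {
    if (text.includes(needle)) return hex;
  }
  // Literal hex colors pass through; anything else (e.g. rgba(...)) falls back to neutral grey.
  return text.startsWith("#") ? text : "#81939f";
}
```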
|
||||
|
||||
function exportMarkerId(color: string): string {
|
||||
return `arrow-${color.replace(/[^a-zA-Z0-9_-]+/g, "")}`;
|
||||
}
|
||||
|
||||
function targetPortPosition(node: Node, handle: string): { x: number; y: number; position: Position } {
|
||||
const x = node.position.x;
|
||||
const y = node.position.y;
|
||||
const port = pipelineInputPorts.find((item) => item.id === handle);
|
||||
if (port?.side === "top") return { x: x + pipelineNodeWidth * pipelinePercent(port.style?.left, 0.5), y, position: Position.Top };
|
||||
if (port?.side === "bottom") return { x: x + pipelineNodeWidth * pipelinePercent(port.style?.left, 0.5), y: y + pipelineNodeHeight, position: Position.Bottom };
|
||||
return { x, y: y + pipelineNodeHeight / 2, position: Position.Left };
|
||||
}
|
||||
|
||||
function sourcePortPosition(node: Node): { x: number; y: number } {
|
||||
return { x: node.position.x + pipelineNodeWidth, y: node.position.y + pipelineNodeHeight / 2 };
|
||||
}
|
||||
|
||||
function pipelineGraphSvg(flow: { nodes: Node[]; edges: Edge[] }, title: string): { svg: string; width: number; height: number } {
|
||||
const minX = Math.min(...flow.nodes.map((node) => node.position.x), 0) - 220;
|
||||
const minY = Math.min(...flow.nodes.map((node) => node.position.y), 0) - 220;
|
||||
const maxX = Math.max(...flow.nodes.map((node) => node.position.x + pipelineNodeWidth), 1) + 220;
|
||||
const maxY = Math.max(...flow.nodes.map((node) => node.position.y + pipelineNodeHeight), 1) + 220;
|
||||
const width = Math.ceil(maxX - minX);
|
||||
const height = Math.ceil(maxY - minY);
|
||||
const nodeById = new Map(flow.nodes.map((node) => [node.id, node]));
|
||||
const edgeColors = flow.edges.map((edge) => exportColor((edge.data as AnyRecord)?.edgeColor || (edge.style as AnyRecord)?.stroke));
|
||||
const markerColors = Array.from(new Set(["#4eb7a8", "#d7a13a", "#cf6a54", "#81939f", ...edgeColors]));
|
||||
const markers = markerColors.map((color) =>
|
||||
`<marker id="${exportMarkerId(color)}" viewBox="0 0 10 10" refX="9" refY="5" markerWidth="7" markerHeight="7" orient="auto-start-reverse"><path d="M 0 0 L 10 5 L 0 10 z" fill="${color}"/></marker>`,
|
||||
).join("");
|
||||
const edgeSvg = flow.edges.flatMap((edge) => {
|
||||
const source = nodeById.get(edge.source);
|
||||
const target = nodeById.get(edge.target);
|
||||
if (!source || !target) return [];
|
||||
const sourcePoint = sourcePortPosition(source);
|
||||
const targetPoint = targetPortPosition(target, String(edge.targetHandle || "in-left"));
|
||||
const path = pipelineCurvePath(sourcePoint.x, sourcePoint.y, targetPoint.x, targetPoint.y, targetPoint.position, Number((edge.data as AnyRecord)?.laneOffset || 0), String((edge.data as AnyRecord)?.routeMode || ""));
|
||||
const color = exportColor((edge.data as AnyRecord)?.edgeColor || (edge.style as AnyRecord)?.stroke);
|
||||
const dash = (edge.data as AnyRecord)?.isFeedback ? ` stroke-dasharray="9 7"` : "";
|
||||
return `<path d="${escapeSvg(path)}" fill="none" stroke="${color}" stroke-width="2.35" stroke-linecap="round" stroke-linejoin="round" opacity="0.82"${dash} marker-end="url(#${exportMarkerId(color)})"/>`;
|
||||
}).join("\n");
|
||||
const nodeSvg = flow.nodes.map((node) => {
|
||||
const label = ((node.data as AnyRecord)?.exportLabel || {}) as AnyRecord;
|
||||
const status = String(label.status || "pending").toLowerCase();
|
||||
const stroke = status === "succeeded" ? "#4eb7a8" : status === "running" ? "#d7a13a" : status === "failed" ? "#cf6a54" : "#81939f";
|
||||
const x = node.position.x;
|
||||
const y = node.position.y;
|
||||
const inputHandles = pipelineInputPorts.map((port) => {
|
||||
const point = targetPortPosition(node, port.id);
|
||||
if (port.side === "top" || port.side === "bottom") return `<rect x="${point.x - 9}" y="${point.y - 5}" width="18" height="10" rx="2" fill="#071016" stroke="#4eb7a8"/>`;
|
||||
return `<rect x="${point.x - 5}" y="${point.y - 9}" width="10" height="18" rx="2" fill="#071016" stroke="#4eb7a8"/>`;
|
||||
}).join("\n");
|
||||
return `<g>
|
||||
<rect x="${x}" y="${y}" width="${pipelineNodeWidth}" height="${pipelineNodeHeight}" rx="2" fill="#0d171f" stroke="${stroke}" stroke-width="1.2"/>
|
||||
${inputHandles}
|
||||
<rect x="${x + pipelineNodeWidth - 5}" y="${y + pipelineNodeHeight / 2 - 9}" width="10" height="18" rx="2" fill="#071016" stroke="#d7a13a"/>
|
||||
<text x="${x + 12}" y="${y + 22}" fill="#e5edf2" font-size="12" font-family="monospace" font-weight="700">${escapeSvg(label.id || node.id)}</text>
|
||||
<text x="${x + 12}" y="${y + 42}" fill="#81939f" font-size="10" font-family="monospace">${escapeSvg(label.kind || "node")}</text>
|
||||
<text x="${x + 12}" y="${y + 60}" fill="#d7a13a" font-size="10" font-family="monospace">${escapeSvg(label.componentRef || "--")}</text>
|
||||
<text x="${x + 12}" y="${y + 78}" fill="${stroke}" font-size="10" font-family="monospace">${escapeSvg(status)}</text>
|
||||
</g>`;
|
||||
}).join("\n");
|
||||
const svg = `<svg xmlns="http://www.w3.org/2000/svg" width="${width}" height="${height}" viewBox="0 0 ${width} ${height}">
|
||||
<defs>${markers}<pattern id="grid" width="22" height="22" patternUnits="userSpaceOnUse"><path d="M 22 0 L 0 0 0 22" fill="none" stroke="#18303a" stroke-width="0.6"/></pattern></defs>
|
||||
<rect width="100%" height="100%" fill="#081118"/>
|
||||
<rect width="100%" height="100%" fill="url(#grid)" opacity="0.55"/>
|
||||
<text x="24" y="34" fill="#d7a13a" font-size="13" font-family="monospace" letter-spacing="2">${escapeSvg(title)}</text>
|
||||
<g transform="translate(${-minX} ${-minY})">${nodeSvg}<g>${edgeSvg}</g></g>
|
||||
</svg>`;
|
||||
return { svg, width, height };
|
||||
}
|
||||
|
||||
function downloadBlob(blob: Blob, filename: string): void {
|
||||
const url = URL.createObjectURL(blob);
|
||||
const link = document.createElement("a");
|
||||
link.href = url;
|
||||
link.download = filename;
|
||||
link.click();
|
||||
setTimeout(() => URL.revokeObjectURL(url), 1000);
|
||||
}
|
||||
|
||||
async function exportPipelineGraph(flow: { nodes: Node[]; edges: Edge[] }, title: string): Promise<void> {
|
||||
const safeTitle = String(title || "pipeline").replace(/[^a-zA-Z0-9_-]+/g, "-").replace(/^-|-$/g, "") || "pipeline";
|
||||
const { svg, width, height } = pipelineGraphSvg(flow, title);
|
||||
const svgBlob = new Blob([svg], { type: "image/svg+xml;charset=utf-8" });
|
||||
const url = URL.createObjectURL(svgBlob);
|
||||
try {
|
||||
const image = new Image();
|
||||
await new Promise<void>((resolve, reject) => {
|
||||
image.onload = () => resolve();
|
||||
image.onerror = () => reject(new Error("svg image load failed"));
|
||||
image.src = url;
|
||||
});
|
||||
const canvas = document.createElement("canvas");
|
||||
canvas.width = width;
|
||||
canvas.height = height;
|
||||
const ctx = canvas.getContext("2d");
|
||||
if (!ctx) throw new Error("canvas unavailable");
|
||||
ctx.drawImage(image, 0, 0);
|
||||
const pngBlob = await new Promise<Blob | null>((resolve) => canvas.toBlob(resolve, "image/png"));
|
||||
if (!pngBlob) throw new Error("png export failed");
|
||||
downloadBlob(pngBlob, `${safeTitle}.png`);
|
||||
} catch {
|
||||
downloadBlob(svgBlob, `${safeTitle}.svg`);
|
||||
} finally {
|
||||
URL.revokeObjectURL(url);
|
||||
}
|
||||
}
|
||||
|
||||
async function exportPipelineGraphs(items: Array<{ flow: { nodes: Node[]; edges: Edge[] }; title: string }>): Promise<void> {
|
||||
for (const item of items) {
|
||||
if (item.flow.nodes.length === 0) continue;
|
||||
await exportPipelineGraph(item.flow, item.title);
|
||||
await new Promise((resolve) => setTimeout(resolve, 750));
|
||||
}
|
||||
}
|
||||
|
||||
function pipelineLatestRun(runs: any[], pipelineId: string): any {
|
||||
return runs.find((run) => String(run?.pipelineId || "") === pipelineId) || null;
|
||||
}
|
||||
|
||||
export function PipelinePage({ microservices, onRaw, apiBaseUrl = "/api" }: AnyRecord) {
|
||||
const service = microservices.find((item: any) => item.id === "pipeline") || null;
|
||||
const [state, setState] = useState({ loading: false, error: "", health: null, snapshot: null, refreshedAt: null });
|
||||
const [selectedPipelineId, setSelectedPipelineId] = useState("");
|
||||
|
||||
async function load(): Promise<void> {
|
||||
if (!service) return;
|
||||
@@ -420,7 +800,7 @@ export function PipelinePage({ microservices, onRaw, apiBaseUrl = "/api" }: AnyR
|
||||
try {
|
||||
const [health, snapshot] = await Promise.all([
|
||||
requestJson(`${apiBaseUrl}/microservices/pipeline/health`),
|
||||
requestJson(`${apiBaseUrl}/microservices/pipeline/proxy/api/snapshot?__unideskArrayLimit=registry.components:8,runs:3`),
|
||||
requestJson(`${apiBaseUrl}/microservices/pipeline/proxy/api/snapshot?__unideskArrayLimit=registry.components:80,runs:3`),
|
||||
]);
|
||||
setState({ loading: false, error: "", health, snapshot, refreshedAt: new Date() });
|
||||
} catch (err) {
|
||||
@@ -439,15 +819,24 @@ export function PipelinePage({ microservices, onRaw, apiBaseUrl = "/api" }: AnyR
|
||||
const backend = microserviceBackend(service);
|
||||
const snapshot = state.snapshot || {};
|
||||
const { components, pipelines, runs } = pipelineSnapshotArrays(snapshot);
|
||||
const activePipeline = pipelines[0] || {};
|
||||
const pipelineNodes = Array.isArray(activePipeline.nodes) ? activePipeline.nodes : [];
|
||||
const pipelineEdges = Array.isArray(activePipeline.edges) ? activePipeline.edges : [];
|
||||
const latestRun = runs[0] || null;
|
||||
const activePipeline = pipelines.find((pipeline: any) => String(pipeline.id || "") === selectedPipelineId) || pipelines[0] || {};
|
||||
const activePipelineId = String(activePipeline.id || "");
|
||||
const pipelineNodes = pipelineConfigNodes(activePipeline);
|
||||
const pipelineEdges = pipelineConfigEdges(activePipeline);
|
||||
const latestRun = pipelineLatestRun(runs, activePipelineId);
|
||||
const statusCounts = pipelineStatusCounts(runs);
|
||||
const componentClasses = pipelineComponentClassCounts(components);
|
||||
const componentCount = Number(state.health?.components) || pipelineArrayCount(snapshot, "registry.components", components.length);
|
||||
const runCount = pipelineArrayCount(snapshot, "runs", runs.length);
|
||||
const flow = pipelineFlowElements(activePipeline, latestRun);
|
||||
const flow = pipelineFlowElements(activePipeline, latestRun, components);
|
||||
const exportItems = pipelines.map((pipeline: any) => {
|
||||
const pipelineId = String(pipeline.id || "pipeline");
|
||||
const run = pipelineLatestRun(runs, pipelineId);
|
||||
return {
|
||||
title: `${pipelineId}-${run?.runId || "snapshot"}`,
|
||||
flow: pipelineFlowElements(pipeline, run, components),
|
||||
};
|
||||
});
|
||||
return h("div", { className: "pipeline-page", "data-testid": "pipeline-page" },
|
||||
h(Panel, {
|
||||
title: "Pipeline v2 工作台",
|
||||
@@ -501,8 +890,33 @@ export function PipelinePage({ microservices, onRaw, apiBaseUrl = "/api" }: AnyR
|
||||
components.slice(0, 12).map((component: any) => h("span", { key: component.key, className: "data-chip" }, h("b", null, component.componentClass || "--"), h("span", null, component.id || component.key || "--"))),
|
||||
),
|
||||
),
|
||||
h(Panel, { title: "控制图", eyebrow: `${activePipeline.id || "pipeline"} / latest run ${latestRun?.status || "--"}` },
|
||||
pipelineNodes.length === 0 ? h(EmptyState, { title: "暂无控制图", text: "等待 D601 pipeline backend 返回 pipeline.nodes" }) :
|
||||
h(Panel, {
|
||||
title: "控制图",
|
||||
eyebrow: `${activePipeline.id || "pipeline"} / latest run ${latestRun?.status || "--"}`,
|
||||
actions: h("div", { className: "pipeline-toolbar" },
|
||||
h("select", {
|
||||
value: activePipelineId,
|
||||
disabled: pipelines.length === 0,
|
||||
onChange: (event: any) => setSelectedPipelineId(event.target.value),
|
||||
"data-testid": "pipeline-select",
|
||||
}, pipelines.map((pipeline: any) => h("option", { key: pipeline.id, value: pipeline.id }, pipeline.id || pipeline.key))),
|
||||
h("button", {
|
||||
type: "button",
|
||||
className: "ghost-btn",
|
||||
disabled: flow.nodes.length === 0,
|
||||
onClick: () => exportPipelineGraph(flow, `${activePipeline.id || "pipeline"}-${latestRun?.runId || "snapshot"}`),
|
||||
"data-testid": "pipeline-export-graph",
|
||||
}, "导出渲染图"),
|
||||
h("button", {
|
||||
type: "button",
|
||||
className: "ghost-btn",
|
||||
disabled: exportItems.every((item) => item.flow.nodes.length === 0),
|
||||
onClick: () => exportPipelineGraphs(exportItems),
|
||||
"data-testid": "pipeline-export-all-graphs",
|
||||
}, "批量导出"),
|
||||
),
|
||||
},
|
||||
pipelineNodes.length === 0 ? h(EmptyState, { title: "暂无控制图", text: "等待 D601 pipeline backend 返回 config.nodes / config.edges" }) :
|
||||
h("div", { className: "pipeline-flow-frame", "data-testid": "pipeline-react-flow" },
|
||||
h(ReactFlow, {
|
||||
nodes: flow.nodes,
|
||||
@@ -526,6 +940,8 @@ export function PipelinePage({ microservices, onRaw, apiBaseUrl = "/api" }: AnyR
|
||||
h("div", { className: "pipeline-flow-summary" },
|
||||
h("span", null, `${flow.nodes.length} nodes`),
|
||||
h("span", null, `${flow.edges.length} edges`),
|
||||
h("span", null, `${pipelines.length} pipelines`),
|
||||
h("span", null, `source config+components(${components.length})`),
|
||||
h("span", null, `latest ${latestRun?.runId || "--"}`),
|
||||
),
|
||||
),
|
||||
|
||||
@@ -1,6 +1,6 @@
|
||||
{
|
||||
"name": "@unidesk/provider-gateway",
|
||||
"version": "0.2.1",
|
||||
"version": "0.2.5",
|
||||
"private": true,
|
||||
"type": "module",
|
||||
"scripts": {
|
||||
|
||||
@@ -58,6 +58,8 @@ let systemStatusRunning = false;
|
||||
let previousCpuSample: { idle: number; total: number } | null = null;
|
||||
let reconnectAttempt = 0;
|
||||
let stopping = false;
|
||||
let upgradeSleepUntil = 0;
|
||||
let upgradeSleepTimer: ReturnType<typeof setTimeout> | null = null;
|
||||
|
||||
interface HostSshSession {
|
||||
proc: ReturnType<typeof Bun.spawn>;
|
||||
@@ -72,19 +74,29 @@ interface HostSshStdin {
|
||||
const hostSshSessions = new Map<string, HostSshSession>();
|
||||
const gatewayMetadata = readGatewayMetadata();
|
||||
|
||||
function readGatewayMetadata(): { name: string; version: string } {
|
||||
function readGatewayMetadataFile(path: string): { name: string; version: string } | null {
|
||||
try {
|
||||
const raw = readFileSync(new URL("../package.json", import.meta.url), "utf8");
|
||||
const raw = readFileSync(path, "utf8");
|
||||
const parsed = JSON.parse(raw) as { name?: unknown; version?: unknown };
|
||||
return {
|
||||
name: typeof parsed.name === "string" && parsed.name.length > 0 ? parsed.name : "@unidesk/provider-gateway",
|
||||
version: typeof parsed.version === "string" && parsed.version.length > 0 ? parsed.version : "0.0.0-unknown",
|
||||
};
|
||||
} catch {
|
||||
return { name: "@unidesk/provider-gateway", version: "0.0.0-unknown" };
|
||||
return null;
|
||||
}
|
||||
}
|
||||
|
||||
function readGatewayMetadata(): { name: string; version: string } {
|
||||
return readGatewayMetadataFile(new URL("../package.json", import.meta.url).pathname)
|
||||
?? { name: "@unidesk/provider-gateway", version: "0.0.0-unknown" };
|
||||
}
|
||||
|
||||
function readTargetGatewayMetadata(workspace: string): { name: string; version: string } {
|
||||
const root = workspace.replace(/\/+$/, "");
|
||||
return readGatewayMetadataFile(`${root}/src/components/provider-gateway/package.json`) ?? gatewayMetadata;
|
||||
}
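The fallback chain above (target workspace metadata first, then the running gateway's metadata) can be sketched without file access by separating parsing from reading. `parseGatewayMetadata` and `resolveMetadata` are hypothetical names for illustration; the committed code reads files via `readGatewayMetadataFile`.

```typescript
interface GatewayMetadata {
  name: string;
  version: string;
}

// Returns null when the text is not usable JSON, so callers can choose their own fallback.
function parseGatewayMetadata(raw: string): GatewayMetadata | null {
  try {
    const parsed = JSON.parse(raw) as { name?: unknown; version?: unknown };
    return {
      name: typeof parsed.name === "string" && parsed.name.length > 0 ? parsed.name : "@unidesk/provider-gateway",
      version: typeof parsed.version === "string" && parsed.version.length > 0 ? parsed.version : "0.0.0-unknown",
    };
  } catch {
    return null;
  }
}

// Prefer the upgrade target's metadata; fall back to what the running gateway reports.
function resolveMetadata(targetRaw: string, running: GatewayMetadata): GatewayMetadata {
  return parseGatewayMetadata(targetRaw) ?? running;
}
```

Returning `null` instead of a placeholder version lets the caller distinguish "file missing or broken" from "file present but without a version field".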
|
||||
|
||||
function requiredEnv(name: string): string {
|
||||
const value = process.env[name];
|
||||
if (value === undefined || value.length === 0) {
|
||||
@@ -885,8 +897,14 @@ function safeDockerName(value: string): string {
|
||||
return value.replace(/[^a-zA-Z0-9_.-]/g, "-").slice(0, 80);
|
||||
}
|
||||
|
||||
function candidateGatewayName(taskId: string): string {
|
||||
return `unidesk-provider-gateway-${safeDockerName(config.providerId)}-candidate-${safeDockerName(taskId)}`.slice(0, 120);
|
||||
}
|
||||
|
||||
function upgradePlan(taskId: string): Record<string, JsonValue> {
|
||||
const workspace = config.upgradeWorkspacePath;
|
||||
const sleepMs = 300_000;
|
||||
const validationTimeoutMs = 180_000;
|
||||
const composeBaseCommand = [
|
||||
"docker",
|
||||
"compose",
|
||||
@@ -910,6 +928,8 @@ function upgradePlan(taskId: string): Record<string, JsonValue> {
|
||||
`label=com.docker.compose.project=${config.upgradeComposeProject}`,
|
||||
"--filter",
|
||||
`label=com.docker.compose.service=${config.upgradeService}`,
|
||||
"--filter",
|
||||
"label=com.docker.compose.oneoff=False",
|
||||
];
|
||||
const composeUpCommand = [
|
||||
...composeBaseCommand,
|
||||
@@ -920,22 +940,73 @@ function upgradePlan(taskId: string): Record<string, JsonValue> {
|
||||
config.upgradeService,
|
||||
];
|
||||
const updaterName = `unidesk-provider-upgrader-${safeDockerName(taskId)}`;
|
||||
const candidateName = candidateGatewayName(taskId);
|
||||
const validationNeedleOpen = `"message":"connect_open"`;
|
||||
const validationNeedleAck = `"requestId":"register"`;
|
||||
const validationNeedleOk = `"ok":true`;
|
||||
const validationAttempts = Math.max(1, Math.ceil(validationTimeoutMs / 2000));
|
||||
const targetGatewayMetadata = readTargetGatewayMetadata(workspace);
|
||||
const script = [
|
||||
"set -eu",
|
||||
"sleep 2",
|
||||
`cd ${shellQuote(workspace)}`,
|
||||
`old_ids=$(${listServiceContainersCommand.map(shellQuote).join(" ")})`,
|
||||
`first_old=""`,
|
||||
`for old_id in $old_ids; do first_old="$old_id"; break; done`,
|
||||
`old_name=""`,
|
||||
`mount_args=""`,
|
||||
`extra_host_args=""`,
|
||||
`candidate_env_file=${shellQuote(`/tmp/${candidateName}.env`)}`,
|
||||
`network_names=""`,
|
||||
`first_network=""`,
|
||||
`network_arg=""`,
|
||||
`if [ -n "$old_ids" ]; then docker update --restart always $old_ids >/dev/null 2>&1 || true; fi`,
|
||||
`if [ -z "$first_old" ]; then echo "no existing provider-gateway compose container found; cannot perform safe in-place upgrade" >&2; exit 1; fi`,
|
||||
`old_name=$(docker inspect --format '{{.Name}}' "$first_old")`,
|
||||
`old_name="\${old_name#/}"`,
|
||||
`docker inspect --format '{{range .Config.Env}}{{println .}}{{end}}' "$first_old" > "$candidate_env_file"`,
|
||||
`mount_args=$(docker inspect --format '{{range .Mounts}}{{printf "-v %s:%s" .Source .Destination}}{{if not .RW}}{{printf ":ro"}}{{end}}{{printf "\\n"}}{{end}}' "$first_old")`,
|
||||
`extra_host_args=$(docker inspect --format '{{range .HostConfig.ExtraHosts}}{{printf "--add-host %s\\n" .}}{{end}}' "$first_old")`,
|
||||
`network_names=$(docker inspect --format '{{range $name, $_ := .NetworkSettings.Networks}}{{printf "%s\\n" $name}}{{end}}' "$first_old")`,
|
||||
`for net in $network_names; do first_network="$net"; break; done`,
|
||||
`if [ -n "$first_network" ]; then network_arg="--network $first_network"; fi`,
|
||||
composeBuildCommand.map(shellQuote).join(" "),
|
||||
`ids=$(${listServiceContainersCommand.map(shellQuote).join(" ")})`,
|
||||
`if [ -n "$ids" ]; then docker rm -f $ids; fi`,
|
||||
composeUpCommand.map(shellQuote).join(" "),
|
||||
].join("; ");
|
||||
`docker rm -f ${shellQuote(candidateName)} >/dev/null 2>&1 || true`,
|
||||
[
|
||||
"docker run -d",
|
||||
`--name ${shellQuote(candidateName)}`,
|
||||
"--restart no",
|
||||
"$network_arg",
|
||||
`--env-file "$candidate_env_file"`,
|
||||
`--label ${shellQuote(`com.docker.compose.project=${config.upgradeComposeProject}`)}`,
|
||||
`--label ${shellQuote(`com.docker.compose.service=${config.upgradeService}`)}`,
|
||||
`--label ${shellQuote("com.docker.compose.oneoff=False")}`,
|
||||
"$mount_args",
|
||||
"$extra_host_args",
|
||||
shellQuote(config.upgradeRunnerImage),
|
||||
].join(" "),
|
||||
`if [ -n "$network_names" ]; then for net in $network_names; do if [ "$net" != "$first_network" ]; then docker network connect "$net" ${shellQuote(candidateName)} >/dev/null 2>&1 || true; fi; done; fi`,
|
||||
`attempt=0`,
|
||||
`validated=0`,
|
||||
`while [ "$attempt" -lt "${validationAttempts}" ]; do logs=$(docker logs ${shellQuote(candidateName)} 2>&1 || true); has_open=0; has_ack=0; has_ok=0; case "$logs" in *${shellQuote(validationNeedleOpen)}*) has_open=1;; esac; case "$logs" in *${shellQuote(validationNeedleAck)}*) has_ack=1;; esac; case "$logs" in *${shellQuote(validationNeedleOk)}*) has_ok=1;; esac; if [ "$has_open" = "1" ] && [ "$has_ack" = "1" ] && [ "$has_ok" = "1" ]; then validated=1; break; fi; candidate_running=$(docker inspect --format '{{.State.Running}}' ${shellQuote(candidateName)} 2>/dev/null || true); if [ "$candidate_running" != "true" ]; then break; fi; attempt=$((attempt + 1)); sleep 2; done`,
|
||||
`if [ "$validated" != "1" ]; then echo "candidate validation failed; old gateway will leave upgrade sleep automatically" >&2; docker logs ${shellQuote(candidateName)} >&2 || true; docker rm -f ${shellQuote(candidateName)} >/dev/null 2>&1 || true; rm -f "$candidate_env_file"; exit 1; fi`,
|
||||
`docker update --restart always ${shellQuote(candidateName)} >/dev/null`,
|
||||
`if [ -n "$old_ids" ]; then docker rm -f $old_ids; fi`,
|
||||
`if [ -n "$old_name" ] && [ "$old_name" != ${shellQuote(candidateName)} ]; then docker rename ${shellQuote(candidateName)} "$old_name" || true; fi`,
|
||||
`rm -f "$candidate_env_file"`,
|
||||
`echo "candidate provider-gateway validated and promoted"`,
|
||||
].join("\n");
|
||||
const dockerRunCommand = [
|
||||
"docker",
|
||||
"run",
|
||||
"-d",
|
||||
"--rm",
|
||||
"--name",
|
||||
updaterName,
|
||||
"--restart",
|
||||
"no",
|
||||
"--label",
|
||||
"com.docker.compose.oneoff=True",
|
||||
"--label",
|
||||
`unidesk.upgrade.updater=${taskId}`,
|
||||
"-v",
|
||||
"/var/run/docker.sock:/var/run/docker.sock",
|
||||
"-v",
|
||||
@@ -950,7 +1021,14 @@ function upgradePlan(taskId: string): Record<string, JsonValue> {
|
||||
return {
|
||||
policy: "always-enabled",
|
||||
taskId,
|
||||
providerId: config.providerId,
|
||||
providerName: config.providerName,
|
||||
providerGatewayName: gatewayMetadata.name,
|
||||
providerGatewayVersion: gatewayMetadata.version,
|
||||
targetProviderGatewayName: targetGatewayMetadata.name,
|
||||
targetProviderGatewayVersion: targetGatewayMetadata.version,
|
||||
updaterName,
|
||||
candidateName,
|
||||
runnerImage: config.upgradeRunnerImage,
|
||||
hostProjectRoot: config.upgradeHostProjectRoot,
|
||||
workspace,
|
||||
@@ -965,7 +1043,16 @@ function upgradePlan(taskId: string): Record<string, JsonValue> {
|
||||
listServiceContainersCommand,
|
||||
composeUpCommand,
|
||||
replacementStrategy: {
|
||||
buildBeforeRemove: true,
|
||||
buildBeforeCandidate: true,
|
||||
oldGatewaySleepMs: sleepMs,
|
||||
validationTimeoutMs,
|
||||
oldGatewayRestartPolicyBeforeSleep: "always",
|
||||
promoteOnlyAfterCandidateValidation: true,
|
||||
candidateRestartPolicyAfterPromotion: "always",
|
||||
candidateUsesOldContainerMounts: true,
|
||||
candidateUsesOldContainerNetworks: true,
|
||||
candidateUsesOldContainerExtraHosts: true,
|
||||
candidateUsesOldContainerEnvironment: true,
|
||||
removeScope: {
|
||||
projectLabel: config.upgradeComposeProject,
|
||||
serviceLabel: config.upgradeService,
|
||||
@@ -986,9 +1073,15 @@ async function runProviderUpgrade(taskId: string, payload: Record<string, JsonVa
|
||||
if (!result.ok) {
|
||||
throw new Error(`provider upgrade scheduler failed with exit ${result.exitCode}: ${result.stderr}`);
|
||||
}
|
||||
const sleepMs = Number((plan.replacementStrategy as Record<string, JsonValue>).oldGatewaySleepMs ?? 300_000);
|
||||
setTimeout(() => enterUpgradeSleep(sleepMs), 250).unref?.();
|
||||
return {
|
||||
mode,
|
||||
message: "provider gateway upgrade scheduled by detached updater container",
|
||||
message: "provider gateway upgrade scheduled with sleep-and-validate rollback guard",
|
||||
providerId: config.providerId,
|
||||
providerName: config.providerName,
|
||||
providerGatewayVersion: gatewayMetadata.version,
|
||||
targetProviderGatewayVersion: (plan.targetProviderGatewayVersion as string | undefined) ?? gatewayMetadata.version,
|
||||
updaterContainerId: result.stdout.trim(),
|
||||
plan,
|
||||
stderr: result.stderr.slice(0, 500),
|
||||
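The candidate validation step in the upgrade script above polls `docker logs` and looks for three substrings. The decision logic can be sketched in TypeScript as follows; `validationAttempts` and `isCandidateValidated` are hypothetical helper names, and the real check runs as shell `case` patterns inside the updater container rather than in the gateway process.

```typescript
// Needles taken from the diff; all three must appear somewhere in the candidate's logs.
const NEEDLES: string[] = [`"message":"connect_open"`, `"requestId":"register"`, `"ok":true`];

// Number of polls that fit in the timeout, never fewer than one.
function validationAttempts(timeoutMs: number, pollIntervalMs: number): number {
  return Math.max(1, Math.ceil(timeoutMs / pollIntervalMs));
}

// True only when every needle occurs in the accumulated log text.
function isCandidateValidated(logs: string): boolean {
  return NEEDLES.every((needle) => logs.includes(needle));
}

// With the values from the plan: 180_000 ms timeout, 2_000 ms between polls.
console.log(validationAttempts(180_000, 2_000));
```

The three substrings are matched independently of each other, so they do not have to come from the same log line. If a stricter guarantee is wanted, one option would be to match a single line that contains both the register request ID and the `ok` flag.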
@@ -1197,13 +1290,65 @@ function handleMessage(raw: MessageEvent<string>): void {
|
||||
|
||||
function scheduleReconnect(): void {
  if (stopping) return;
  const sleepRemainingMs = upgradeSleepUntil - Date.now();
  if (sleepRemainingMs > 0) {
    logger("warn", "reconnect_deferred_for_upgrade_sleep", { sleepRemainingMs });
    if (upgradeSleepTimer === null) {
      upgradeSleepTimer = setTimeout(() => {
        upgradeSleepTimer = null;
        logger("warn", "upgrade_sleep_expired_reconnecting", { providerId: config.providerId });
        connect();
      }, sleepRemainingMs);
    }
    return;
  }
  reconnectAttempt += 1;
  const delayMs = Math.min(config.reconnectMaxMs, config.reconnectBaseMs * 2 ** Math.min(reconnectAttempt, 8));
  logger("warn", "reconnect_scheduled", { reconnectAttempt, delayMs });
  setTimeout(connect, delayMs);
}
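The reconnect delay in `scheduleReconnect` is a capped exponential backoff. A minimal sketch of just that formula, assuming a base of 1000 ms and a maximum of 30000 ms (the actual `config.reconnectBaseMs` and `config.reconnectMaxMs` values are not shown in this diff):

```typescript
// Delay doubles per attempt, the exponent stops growing at 8, and the result is capped at maxMs.
function reconnectDelayMs(attempt: number, baseMs: number, maxMs: number): number {
  return Math.min(maxMs, baseMs * 2 ** Math.min(attempt, 8));
}

// Assumed values for illustration only.
const baseMs = 1_000;
const maxMs = 30_000;
for (let attempt = 1; attempt <= 6; attempt += 1) {
  console.log(attempt, reconnectDelayMs(attempt, baseMs, maxMs));
}
```

Because `reconnectAttempt` is incremented before the delay is computed, the first retry already waits twice the base delay. During an upgrade sleep this formula is bypassed entirely and the gateway waits for the remaining sleep time instead.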
|
||||
|
||||
function clearConnectionTimers(): void {
  if (heartbeatTimer !== null) clearInterval(heartbeatTimer);
  if (systemStatusTimer !== null) clearInterval(systemStatusTimer);
  if (dockerStatusTimer !== null) clearInterval(dockerStatusTimer);
  heartbeatTimer = null;
  systemStatusTimer = null;
  dockerStatusTimer = null;
}
|
||||
|
||||
function enterUpgradeSleep(durationMs: number): void {
  const sleepMs = Math.max(30_000, Math.min(300_000, Math.floor(durationMs)));
  upgradeSleepUntil = Date.now() + sleepMs;
  logger("warn", "upgrade_sleep_enter", { providerId: config.providerId, sleepMs, wakeAt: new Date(upgradeSleepUntil).toISOString() });
  clearConnectionTimers();
  try {
    socket?.close(4000, "provider upgrade sleep");
  } catch (error) {
    logger("warn", "upgrade_sleep_socket_close_failed", { error: String(error) });
  }
  socket = null;
  if (upgradeSleepTimer !== null) clearTimeout(upgradeSleepTimer);
  upgradeSleepTimer = setTimeout(() => {
    upgradeSleepTimer = null;
    logger("warn", "upgrade_sleep_expired_reconnecting", { providerId: config.providerId });
    connect();
  }, sleepMs);
}
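`enterUpgradeSleep` clamps the requested duration to the range from 30 seconds to 5 minutes before arming the wake-up timer. The clamp on its own can be sketched like this; `clampUpgradeSleepMs` is a hypothetical name used only for illustration:

```typescript
// Same expression as in enterUpgradeSleep: floor, then clamp to [30_000, 300_000] ms.
function clampUpgradeSleepMs(durationMs: number): number {
  return Math.max(30_000, Math.min(300_000, Math.floor(durationMs)));
}

console.log(clampUpgradeSleepMs(1_000));
console.log(clampUpgradeSleepMs(120_500.7));
console.log(clampUpgradeSleepMs(1_000_000_000));
```

One caveat of this expression: a `NaN` input propagates through `Math.floor`, `Math.min`, and `Math.max`, so the result is `NaN` rather than a bounded value. In the diff the caller passes `Number(...)` of a plan field with a `300_000` default, so that case would only arise if the plan carried a non-numeric value.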
|
||||
|
||||
function connect(): void {
  if (stopping) return;
  const sleepRemainingMs = upgradeSleepUntil - Date.now();
  if (sleepRemainingMs > 0) {
    logger("warn", "connect_deferred_for_upgrade_sleep", { sleepRemainingMs });
    if (upgradeSleepTimer === null) {
      upgradeSleepTimer = setTimeout(() => {
        upgradeSleepTimer = null;
        connect();
      }, sleepRemainingMs);
    }
    return;
  }
  const url = withToken(config.serverUrl, config.token);
  logger("info", "connect_start", { serverUrl: config.serverUrl, providerId: config.providerId });
  socket = new WebSocket(url);
|
||||
@@ -1228,12 +1373,7 @@ function connect(): void {
|
||||
socket.addEventListener("message", (event) => handleMessage(event as MessageEvent<string>));
|
||||
socket.addEventListener("close", (event) => {
|
||||
logger("warn", "connect_close", { code: event.code, reason: event.reason });
|
||||
if (heartbeatTimer !== null) clearInterval(heartbeatTimer);
|
||||
if (systemStatusTimer !== null) clearInterval(systemStatusTimer);
|
||||
if (dockerStatusTimer !== null) clearInterval(dockerStatusTimer);
|
||||
heartbeatTimer = null;
|
||||
systemStatusTimer = null;
|
||||
dockerStatusTimer = null;
|
||||
clearConnectionTimers();
|
||||
scheduleReconnect();
|
||||
});
|
||||
socket.addEventListener("error", () => {
|
||||
|
||||