fix: use PK01 public PostgreSQL endpoint

Codex
2026-06-12 05:37:16 +00:00
parent 23803d96f7
commit 9e8197e2cc
8 changed files with 130 additions and 8 deletions
+33 -1
@@ -1,6 +1,6 @@
---
name: unidesk-ops
-description: UniDesk manual ops CLI: the `server` and `gc` subcommands, covering main server start/stop, health checks, swap, logs, Docker image cleanup, disk GC, and service rebuild. Use when the user mentions server start, server status, server swap, server rebuild, gc, disk cleanup, or ops.
+description: UniDesk manual ops CLI: the `server`, `gc`, and PK01 `platform-db postgres` subcommands, covering main server start/stop, health checks, swap, logs, Docker image cleanup, disk GC, service rebuild, and PK01 host PostgreSQL operations. Use when the user mentions server start, server status, server swap, server rebuild, gc, disk cleanup, platform-db, PK01 PostgreSQL, or ops.
---
# UniDesk Manual Ops CLI
@@ -128,6 +128,38 @@ docker exec unidesk-backend-core sh -lc 'backend-core --fetch-json http://127.0.
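---
## PK01 Host PostgreSQL
PK01 host-native PostgreSQL is the reference template for an externally hosted platform state database. Its declaration file is `config/platform-db/postgres-pk01.yaml`, and the controlled entry points are: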
```bash
bun scripts/cli.ts platform-db postgres plan --config config/platform-db/postgres-pk01.yaml
bun scripts/cli.ts platform-db postgres status --config config/platform-db/postgres-pk01.yaml
bun scripts/cli.ts platform-db postgres apply --config config/platform-db/postgres-pk01.yaml --confirm
bun scripts/cli.ts platform-db postgres apply --config config/platform-db/postgres-pk01.yaml --confirm --wait
```
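- `plan` / `status` are read-only; `apply --confirm` creates a local asynchronous job by default; `apply --confirm --wait` starts a root-owned job on the PK01 side and polls it at short intervals.
- Output shows only Secret key names, presence, fingerprint, connection host, SSL status, and a status summary; printing passwords or the full `DATABASE_URL` is forbidden.
- Cross-node consumers must connect directly to the YAML `postgres.network.connectionHost`, which is currently the PK01 public endpoint; do not route D601/G14/Sub2API/HWLAB/AgentRun PostgreSQL traffic through the master server.
- The current TLS posture is PostgreSQL native TLS + `sslmode=require`; `publicDns` is only an optional alias; as long as `connectionHost` is a reachable IP, an unresolved DNS name is not a cutover blocker.
- After the remote PostgreSQL configuration or the `pg_hba` source CIDRs change, run `apply --confirm --wait` first, then `status`; if a consumer's public egress IP changes, update the YAML `allowSources` and the matching `pg_hba` rules first.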
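The output rule in the list above (connection host, SSL status, presence, and fingerprint only, never the password or the full `DATABASE_URL`) can be illustrated with a minimal TypeScript sketch. `redactDatabaseUrl`, the `RedactedDbUrl` shape, and the sample URL are hypothetical names introduced here for illustration; they are not taken from `scripts/cli.ts`, and the real CLI may compute its fingerprints differently.

```typescript
import { createHash } from "node:crypto";

// Hypothetical shape: everything a status summary is allowed to show.
interface RedactedDbUrl {
  host: string;
  port: string;
  database: string;
  sslmode: string | null;
  passwordPresent: boolean;
  fingerprint: string; // first 12 hex chars of SHA-256 over the full URL
}

// Hypothetical helper: summarize a connection string without ever
// returning the password or the full URL.
function redactDatabaseUrl(databaseUrl: string): RedactedDbUrl {
  const url = new URL(databaseUrl);
  return {
    host: url.hostname,
    port: url.port || "5432", // assumed default when the URL omits a port
    database: url.pathname.replace(/^\//, ""),
    sslmode: url.searchParams.get("sslmode"),
    passwordPresent: url.password.length > 0,
    fingerprint: createHash("sha256").update(databaseUrl).digest("hex").slice(0, 12),
  };
}

// Sample values are placeholders, not real credentials.
const sample = "postgresql://app_user:example-secret@82.156.23.220:5432/app_db?sslmode=require";
console.log(JSON.stringify(redactDatabaseUrl(sample)));
```

The summary carries enough to confirm which host and TLS mode a consumer would use, while the secret itself only contributes to a truncated hash.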
Suggested routine re-verification:
```bash
bun scripts/cli.ts platform-db postgres status --config config/platform-db/postgres-pk01.yaml
trans PK01 script <<'SCRIPT'
systemctl is-active postgresql
systemctl is-enabled postgresql
systemctl is-active unidesk-pk01-sub2api-pgdump.timer
SCRIPT
```
Long-term boundaries are in `docs/reference/pk01.md`; Sub2API consumer-side boundaries are in `docs/reference/platform-infra.md`.
---
## Moon Bridge Management
Moon Bridge is the bridging service between Codex and the upstream provider, managed through a profile-level wrapper:
+1
@@ -20,6 +20,7 @@ target/
/output/
/debug.evidence
/sync-keycloak-admin-password.tmp.sh
/.tencnet-ssh
/nul
/=
/\\
+1
@@ -226,6 +226,7 @@ UniDesk 是一个以主 server 为统一入口的分布式工作平台;本文
- `bun scripts/cli.ts gc plan|run|db-trace|policy|remote`: the entry point for one-off disk high-watermark relief and low-risk growth prevention on the main server or a controlled provider, covering logs, journald, Docker BuildKit cache, allowlisted `/tmp` diagnostic directories, explicitly opted-in stale direct children of `/tmp`, restricted core dumps, explicit trace telemetry retention, and systemd timer policy; rules are in `docs/reference/gc.md`.
- `bun scripts/cli.ts server rebuild <backend-core|frontend|dev-frontend-proxy|provider-gateway|todo-note|code-queue-mgr|project-manager|baidu-netdisk|oa-event-flow>`: rebuilds a single service inside the main server Compose project through an asynchronous job with build-first, Compose lock, no-deps force-recreate, and post-up validation; for database, File Browser, the Code Queue execution plane, k3sctl-adapter, or unknown targets it returns a structured `unsupported-server-rebuild`; rules are in `docs/reference/deployment.md` and `docs/reference/cicd-standardization.md`.
- `bun scripts/cli.ts provider attach <providerId> [--master-server URL] [--up] [--force]` / `bun scripts/cli.ts provider triage <providerId> [--observed-error text] [--observed-scope scope] [--microservice id ...] [--full|--raw]`: the former generates a two-setting provider-gateway attach bundle on a newly added compute node; the latter is a read-only multi-signal health adjudication entry point that by default emits low-noise `decision`, `healthyScopes`, `failedScopes`, `retryable`, and an abnormal-signal summary, used to classify a single-path `provider is not online`, SSH timeout, registry failure, or proxy failure as `retryable-transient`, `service-degraded`, or `global-offline`; full evidence requires an explicit `--full|--raw`; rules are in `docs/reference/provider-gateway.md` and `docs/reference/code-queue-supervision.md`.
- `bun scripts/cli.ts platform-db postgres plan|status|apply --config config/platform-db/postgres-pk01.yaml`: manages the PK01 host-native PostgreSQL 16 external platform database, TLS, Secret export, and backups; cross-node consumers connect directly to the YAML-declared PK01 public endpoint without relaying through the master server; rules are in `docs/reference/pk01.md`.
- `trans <route> [operation args...]` / `tran <route> [operation args...]`: enters a provider, host workspace, Windows cmd route, k3s control plane, or pod workspace through the provider-gateway Host SSH / WSL SSH maintenance bridge, and provides `upload`/`download` file transfer with SHA-256 verification; manual and Codex distributed operations on the main server must prefer the local `trans` wrapper; `tran` is only a compatibility entry point; details are in `docs/reference/cli.md`, `docs/reference/windows-passthrough.md`, and `docs/reference/provider-gateway.md`.
- `bun scripts/cli.ts microservice list/status/health/diagnostics/tunnel-self-test/proxy`: manages and verifies user services attached to the main server, compute-node Docker, or the k3s control plane; `status/health/diagnostics` default to a compact summary and expand the full body with `--full|--raw`; `proxy` supports a controlled JSON body; rules for OA Event Flow/Todo Note/Baidu Netdisk/Code Queue Manager on main-server and for k3s Control/Code Queue execution plane/MDTODO/Decision Center/FindJob/Pipeline/MET Nonlinear on D601 are in `docs/reference/microservices.md`.
- `bun scripts/cli.ts microservice health/diagnostics/proxy code-agent-sandbox`: verifies the standalone Code Agent Sandbox's health, read-only diagnostics, trace, and adapter/mode/credential boundary contracts; rules are in `docs/reference/code-agent-sandbox.md`.
+2 -2
@@ -58,7 +58,7 @@ postgres:
- 127.0.0.1
- 10.0.8.3
- 0.0.0.0
-connectionHost: db.pikapython.com
+connectionHost: 82.156.23.220
publicDns: db.pikapython.com
transport: postgres-native-tls
sslmode: require
@@ -166,7 +166,7 @@ exports:
envKey: DATABASE_URL
format: postgresql://$(SUB2API_DB_USER):$(SUB2API_DB_PASSWORD)@$(PGHOST):5432/$(SUB2API_DB_NAME)?sslmode=require
variables:
-PGHOST: db.pikapython.com
+PGHOST: 82.156.23.220
writeToSecretSource:
sourceRef: platform-infra/sub2api.env
key: DATABASE_URL
+1
@@ -36,6 +36,7 @@ CI/CD、GitOps、rollout、artifact 发布、PR 合并后的 runtime lane 滚动
- `gc plan|run --confirm|db-trace|policy|remote` is the entry point for one-off disk high-watermark relief and long-term growth prevention on the main server and controlled providers. `plan` is read-only and outputs candidates, risk, estimated gain, and protected objects; `run` requires an explicit `--confirm`; `gc remote <providerId> ...` runs remote GC through UniDesk SSH passthrough; `--target-use-percent N` reports, in `summary.target`, the amount that must be freed to reach the target watermark, the candidate estimate, the projected watermark, the shortfall, and the safe-stop decision. By default only allowlisted `/tmp` diagnostic directories are included; non-allowlisted stale direct children of `/tmp` require an explicit `--include-stale-tmp`, and only first-level children of `/tmp` may be deleted, avoiding system socket/session prefixes. The authoritative rules for G14/HWLAB registry retention, restricted core dumps, protected objects, the safe-stop line, and the long-term gain table are in `docs/reference/gc.md`.
- `server rebuild <backend-core|frontend|dev-frontend-proxy|provider-gateway|todo-note|code-queue-mgr|project-manager|baidu-netdisk|oa-event-flow>` creates an asynchronous job that first builds the target service image, then, serialized under `.state/locks/server-compose.lock`, replaces the target service with `--no-deps --force-recreate` and waits for the container to reach `healthy/running`. The command replaces the fallback procedure of deleting containers by hand: `dev-frontend-proxy` only updates the thin proxy for the main server dev entry point, and `todo-note`, `code-queue-mgr`, `project-manager`, `baidu-netdisk`, and `oa-event-flow` only rebuild the corresponding backends hosted on the main server, without rebuilding or deleting the database named volume. The D601 Code Queue execution plane is not managed by `server rebuild`; routine Rust backend-core iteration must not use this command to compile on the master server, and only an explicit backend-core main server release exception may run it, subject to rate limiting, asynchronous polling, and health evidence; rules are in `docs/reference/dev-environment.md`.
- `provider attach <providerId> [--master-server URL] [--up] [--force]` generates a two-setting provider-gateway attach bundle on a new compute node: `.state/provider-<ID>.env` contains only `UNIDESK_MASTER_SERVER` and `PROVIDER_ID` by default; `provider-<ID>.yml` pins the Docker socket, `pid: "host"`, `restart: always`, a read-only `/workspace`, and the SSH maintenance private key mount; `--up` immediately runs the generated `docker compose up -d --build`. `provider triage <providerId> [--observed-error text] [--observed-scope scope] [--microservice id ...] [--full|--raw]` is a read-only multi-signal health adjudication entry point that classifies a single-path `provider is not online`, SSH timeout, registry failure, and service proxy failure as `runner-local-observation-gap`, `service-degraded`, `provider-degraded`, or `global-blocker`. Default output returns only the decision, scope, failed/degraded/unknown signals, and a bounded evidence summary; full evidence requires an explicit `--full` or `--raw`; the recommended cross-check commands still include `debug health`, `debug dispatch <providerId> host.ssh --wait-ms 15000`, `trans <providerId> argv true`, `artifact-registry health --provider-id <providerId>`, `microservice health k3sctl-adapter`, `microservice health code-queue`, and `codex tasks --view supervisor --limit 20`.
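- `platform-db postgres plan|status|apply --config config/platform-db/postgres-pk01.yaml` manages the YAML-declared PK01 host-native PostgreSQL. `plan` and `status` collect, read-only, the state of the remote host, PostgreSQL, TLS, DNS alias, and Secrets; `apply --confirm` creates a local asynchronous job by default; `apply --confirm --wait` uses a remote root-owned job to converge systemd PostgreSQL, TLS, `pg_hba`, role/database, Secret export, and the backup timer. Output must not print passwords or the full `DATABASE_URL`. Cross-node consumers use the `connectionHost` from the YAML to connect directly to the PK01 public endpoint; the DNS alias is not a cutover blocker under `sslmode=require`; PK01 rules are in `docs/reference/pk01.md`.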
- `trans <route> [operation args...]` / `tran <route> [operation args...]` connects to the target node through the backend-core intranet WebSocket broker and the provider-gateway Host SSH / WSL SSH maintenance bridge. The basic form of `route` is a provider id, such as `D601` or `G14`, and it can be extended to a pure locator path `provider:plane[:namespace:resource[:container]]`, such as `D601:win`, `D601:win/c/test`, `G14:k3s`, `D601:k3s`, or `G14:k3s:<namespace>:<workload>`. The Windows plane of a WSL provider is always `win`, never `win32`. Windows operations must be distinguished explicitly: `ps` runs a Windows PowerShell heredoc or a one-line PowerShell command, `cmd` runs cmd.exe/batch, and `skills` discovers Windows skill directories. When a Windows cwd is needed, use `trans D601:win/c/test ps` or `trans D601:win/c/test cmd cd`; the CLI sets UTF-8/Python encoding defaults automatically, and `cmd` additionally sets `chcp 65001`. Non-interactive remote commands should prefer `trans <providerId> argv ...`; when a POSIX shell script, pipes, variables, or loops are needed, prefer a single-step quoted heredoc, such as `trans G14 script <<'SCRIPT'`, `trans G14:k3s script <<'SCRIPT'`, or `trans G14:k3s:<namespace>:<workload> script <<'SCRIPT'`, which sends the script over stdin. `script` means only the host/k3s POSIX shell, not Windows PowerShell; Windows PowerShell must be written as `trans <provider>:win ps <<'PS'`. `script -- '<single string>'` is a remote POSIX shell one-liner that needs no stdin, for example `trans G14:/root/hwlab script -- 'cd /root/hwlab && git status --short --branch'`; only `script -- <multiple argv>` is direct argv, suited to single-process commands with dashes such as `trans D601:/path script -- sed -n '1,20p' file`. The top-level remote option parser must preserve any `--` that appears after the command has started and must not swallow it as the end-of-global-options marker. To modify a remote text file, prefer `<route> apply-patch < patch.diff` by default; to transfer non-text or whole files reliably, use `<route> upload <local-file> <remote-file>` or `<route> download <remote-file> <local-file>`; the CLI verifies byte count and SHA-256 automatically and switches the client chunking strategy under the provider-gateway stdin/argv limits. When the old helper is needed, use `<provider>:k3s:<namespace>:<workload> apply-patch-v1` or `<providerId> apply-patch-v1` explicitly. When an ssh-like command fails with a timeout/kex/255-type error, the CLI appends one line of `UNIDESK_SSH_HINT` JSON to stderr, suggesting a stdin script/argv retry and a provider triage cross-check.
- `trans <route> apply-patch < patch.diff` is the default recommended remote patch entry point: a local TypeScript line-based engine parses the patch and computes the new file contents, and the remote route only reads and writes files. It supports host workspaces, k3s pod workspaces, Windows workspace routes (for example `D601:win/c/test`), and the frontend transport, and prioritizes the cases where the old helper tends to fail, such as long Chinese/Unicode text, low-context insertions, and `@@` positioning of repeated blocks. `apply-patch` output follows the Codex standard text convention and is not wrapped in the UniDesk JSON constraints: on success stdout is `Success. Updated the following files:`; on failure stdout is empty and stderr carries the failure reason. When a multi-file patch fails midway, stderr lists only the hunks that succeeded before the first failure plus the failing hunk, then stops per Codex semantics without attempting later hunks. v2 accepts common non-standard MiniMax/MXCX patch input, such as repeated nested `*** Begin Patch` / `*** End Patch` envelopes, unified-diff hunk headers, a stray `@@` on Add/Delete, and Update context missing its leading space, and prints a hint with the canonical form on stderr. On parser or context failure it still sticks to the single v2 engine, only prompting a fix to the patch text or hunk context, and does not retry automatically or switch to `apply-patch-v1`. When a large-block or function replacement fails because the context is stale, the correct action is to re-read the current target block, shrink or split the Update File hunk, and continue with `apply-patch`; do not fall back to `download`/`upload`, a remote Python/Perl/sed heredoc, or a whole-file rewrite. Windows routes reuse the same v2 core algorithm and only swap the underlying reads and writes for PowerShell filesystem interfaces. `trans <providerId> apply-patch-v1 [tool args...] < patch.diff` is kept as an explicit legacy entry point that directly calls the remotely injected `apply_patch` sh/perl helper; the default `apply-patch` does not use v1 as a fallback.
- Every `apply-patch` v2 run appends one line `UNIDESK_APPLY_PATCH_TIMING {json}` to stderr when it finishes, with fields `durationMs`, `patchBytes`, `fileCount`, `hunkCount`, `changedCount`, `remoteOperationCount`, `remoteOperationCounts`, `remoteElapsedMs`, `remoteFailureCount`, `providerId`, `route`, and `transport` (when available). Multi-file `Update File` patches against ordinary POSIX host/k3s and Windows workspace remotes are preferentially merged into bulk read/write, avoiding the SSH round trips of a separate stat/read/write per file; complex patches such as Add/Delete/Move keep the original step-by-step semantics. The timing summary is only for locating whether the slowness is in patch parsing, remote stat/read/write or bulk read/write, the provider session, or the transport layer; it does not replace the Codex standard stdout/stderr success/failure text, and it is not a gate or an automatic verdict.
+10 -1
@@ -58,6 +58,16 @@ PK01 currently hosts existing Docker workloads:
`pikanode` mounts `/home/ubuntu/pikanode` read-write into the container. Static/generated download artifacts under `html/download/` and repository data under `files/` may be user-visible or needed by the service. They are not generic GC candidates.
## Host PostgreSQL
PK01 host-native PostgreSQL is declared by `config/platform-db/postgres-pk01.yaml` and managed through `bun scripts/cli.ts platform-db postgres plan|status|apply`; daily operation commands live in `$unidesk-ops` at `.agents/skills/unidesk-ops/SKILL.md`. It is a host systemd service, not a Docker container or k3s workload. The YAML is the source of truth for PostgreSQL version, TLS mode, listening addresses, `pg_hba` source CIDRs, generated Secret source files, exported `DATABASE_URL`, and backup timer settings.
Cross-node platform consumers must connect directly to the YAML-declared `postgres.network.connectionHost`. For consumers outside the PK01 private VPC, that value must be PK01's public endpoint, not the private `10.0.8.3` address and not a master-server tunnel. The master server may run control-plane CLI operations and secret sync, but it must not become the data-plane relay for D601, G14, Sub2API, HWLAB, AgentRun, or other PostgreSQL clients.
`postgres.network.publicDns` is an optional alias for operator readability and future `sslmode=verify-full` work. With the current PostgreSQL native TLS posture, clients use `sslmode=require`; DNS resolution is therefore not a cutover blocker when `connectionHost` is a reachable IP endpoint. If `publicDns` later becomes the connection host or `verify-full` is enabled, certificate common name/SAN and DNS must be promoted back into the cutover criteria.
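The rule in the paragraph above (DNS only matters once the connection host is a name or `verify-full` is enabled) can be restated as a small TypeScript sketch. `dnsIsCutoverCriterion` and the `SslMode` type are hypothetical names introduced for illustration and are not part of the platform-db CLI; only `isIP` from `node:net` is a real API.

```typescript
import { isIP } from "node:net";

// Assumed subset of libpq sslmode values relevant to this rule.
type SslMode = "require" | "verify-ca" | "verify-full";

// Hypothetical helper: should DNS resolution (and certificate CN/SAN)
// be treated as a cutover criterion for this endpoint?
function dnsIsCutoverCriterion(connectionHost: string, sslmode: SslMode): boolean {
  // isIP returns 4 or 6 for IP literals and 0 for anything else.
  const hostIsIp = isIP(connectionHost) !== 0;
  return !hostIsIp || sslmode === "verify-full";
}

console.log(dnsIsCutoverCriterion("82.156.23.220", "require"));     // false
console.log(dnsIsCutoverCriterion("db.pikapython.com", "require")); // true
console.log(dnsIsCutoverCriterion("82.156.23.220", "verify-full")); // true
```

With the values declared in this commit (an IP `connectionHost` and `sslmode=require`), the sketch reports that DNS is not a criterion, matching the prose.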
The exported Sub2API connection string is written under the configured ignored Secret root and must never be committed or printed in full. CLI output may show key names, presence, fingerprints, selected host, SSL status, and source/target Secret references only. If a consumer's public egress IP changes, update the YAML `allowSources` and matching `pg_hba` rules, then use the `$unidesk-ops` PK01 Host PostgreSQL workflow to apply and recheck status.
## Disk GC Policy
PK01 follows the same safe-stop principle as G14: first produce a bounded attribution, then clean only classified candidates, and stop when remaining pressure is in protected runtime data.
@@ -128,4 +138,3 @@ Interpretation guide:
| `/home/ubuntu/.vscode-server` | VS Code remote server, extensions, and cache | Do not delete installed servers/extensions by default; cached VSIX cleanup needs an explicit policy. |
| `/var/lib/docker` | Docker overlay/image/container state for PK01 workloads | Do not prune generically; inspect running containers first. |
| `/var/log/journal` | systemd journal | Managed by journald cap; use sudo when vacuuming manually. |
+1
@@ -19,6 +19,7 @@
- Sub2API currently has no resource limits by design. Do not add CPU or memory limits unless a later explicit decision changes that policy and stores the new policy in YAML.
- Master server is a consumer/control host, not the runtime location. Do not deploy Sub2API, PostgreSQL, Redis, or heavy validation loops on master server.
- D601 Sub2API is a predeployment target, not a second active singleton. While the platform database handoff is pending, it must render without a local PostgreSQL StatefulSet, keep the Sub2API app and local Redis cache scaled to zero, and use only ephemeral Redis storage when Redis is later activated. After the external platform DB endpoint, Secret, and runtime images are ready, activation must be expressed by YAML and applied through the same `platform-infra sub2api --target D601` CLI path.
- External platform PostgreSQL endpoints for Sub2API are produced by the platform DB YAML and its `platform-db postgres` CLI. Cross-node Sub2API consumers connect directly to that endpoint; the master server is not a PostgreSQL data-plane relay. DNS aliases are optional when the exported `DATABASE_URL` uses a reachable IP with `sslmode=require`; current PK01-specific rules live in `docs/reference/pk01.md`.
- Sub2API account sentinel and public FRP exposure remain singleton concerns. Do not create a second sentinel or public management surface for D601 unless a later YAML-controlled decision explicitly moves or splits that responsibility.
## Codex Pool Routing
+81 -4
@@ -1,4 +1,5 @@
import { createHash, randomBytes } from "node:crypto";
import { spawnSync } from "node:child_process";
import { chmodSync, existsSync, mkdirSync, readFileSync, writeFileSync } from "node:fs";
import { dirname, isAbsolute, join } from "node:path";
import type { UniDeskConfig } from "./config";
@@ -248,10 +249,23 @@ interface RemoteFacts {
appConnectionOk: boolean | null;
appConnectionSsl: boolean | null;
appConnectionHost: string | null;
appConnectionError: string | null;
};
raw?: Record<string, unknown>;
}
interface ControllerConnectionProbe {
attempted: boolean;
ok: boolean | null;
ssl: boolean | null;
host: string;
port: number;
clientAddr: string | null;
serverAddr: string | null;
psqlVersion: string | null;
error: string | null;
}
interface RemoteJobStart {
ok: boolean;
remoteJobId: string | null;
@@ -367,6 +381,8 @@ async function status(config: UniDeskConfig, options: PlatformDbOptions): Promis
const secrets = inspectSecrets(pg, false);
const remote = await remoteFacts(config, pg, secrets);
const facts = remote.parsed;
const controllerConnection = controllerConnectionProbe(pg, secrets);
const endpointHealthy = controllerConnection.ok === true && controllerConnection.ssl === true;
const deploymentHealthy = facts !== null
&& facts.postgres.packageInstalled
&& facts.postgres.serviceActive
@@ -376,7 +392,14 @@ async function status(config: UniDeskConfig, options: PlatformDbOptions): Promis
&& facts.network.port5432Listening
&& facts.postgres.appConnectionOk === true
&& facts.postgres.appConnectionSsl === true;
-const cutoverReady = deploymentHealthy && facts.network.dns.resolves;
+const cutoverReady = deploymentHealthy && secrets.ok && endpointHealthy;
const cutoverBlockers = cutoverReady
? []
: [
...(deploymentHealthy ? [] : ["deployment-unhealthy"]),
...(secrets.ok ? [] : ["secrets-unhealthy"]),
...(endpointHealthy ? [] : [controllerConnection.attempted ? "connection-host-probe-failed" : "connection-host-probe-unavailable"]),
];
return {
ok: remote.capture.exitCode === 0 && facts !== null && deploymentHealthy && secrets.ok,
action: "platform-db-postgres-status",
@@ -386,7 +409,9 @@ async function status(config: UniDeskConfig, options: PlatformDbOptions): Promis
healthy: deploymentHealthy,
deploymentHealthy,
cutoverReady,
-cutoverBlockers: cutoverReady ? [] : ["dns-db-pikapython-unresolved"],
+cutoverBlockers,
connectionHost: pg.postgres.network.connectionHost,
publicDns: pg.postgres.network.publicDns,
pgVersion: pg.postgres.package.version,
packageInstalled: facts.postgres.packageInstalled,
serviceActive: facts.postgres.serviceActive,
@@ -399,10 +424,13 @@ async function status(config: UniDeskConfig, options: PlatformDbOptions): Promis
appConnectionOk: facts.postgres.appConnectionOk,
appConnectionSsl: facts.postgres.appConnectionSsl,
appConnectionHost: facts.postgres.appConnectionHost,
appConnectionError: facts.postgres.appConnectionError,
connectionHostProbe: controllerConnection,
port5432Listening: facts.network.port5432Listening,
serverVersion: facts.postgres.serverVersion,
secretsOk: secrets.ok,
-dnsRequiredBeforeSub2ApiCutover: true,
+dnsRequiredBeforeSub2ApiCutover: false,
dnsDisposition: facts.network.dns.resolves ? "optional-alias-resolves" : "optional-alias-unresolved",
},
remoteFacts: facts,
secrets: secretSummary(secrets),
@@ -1042,6 +1070,10 @@ function quoteEnv(value: string): string {
return `'${value.replaceAll("'", "'\"'\"'")}'`;
}
function compactText(value: string): string {
return value.replace(/\s+/gu, " ").trim().slice(0, 500);
}
function fingerprintValues(values: Record<string, string>, keys: string[]): string {
const material = keys
.slice()
@@ -1127,7 +1159,7 @@ function desiredChecks(pg: PostgresHostConfig, facts: RemoteFacts, secrets: Secr
     { name: "pgdg-repo-release", ok: facts.repo.reachable, releaseUrl: facts.repo.releaseUrl, firstLine: facts.repo.firstLine },
     { name: "secrets", ok: secrets.ok, entries: secrets.entries.map((entry) => ({ sourceRef: entry.sourceRef, action: entry.action, missingKeys: entry.missingKeys })) },
     { name: "pg16-required", ok: pg.postgres.package.version === "16", detail: "Sub2API deployment requires PostgreSQL 16 per issue #281 update." },
-    { name: "dns-db-pikapython", ok: facts.network.dns.resolves, host: facts.network.dns.hostname, addresses: facts.network.dns.addresses, requiredBeforeSub2ApiCutover: true },
+    { name: "dns-db-pikapython", ok: true, host: facts.network.dns.hostname, addresses: facts.network.dns.addresses, requiredBeforeSub2ApiCutover: false, disposition: facts.network.dns.resolves ? "optional-alias-resolves" : "optional-alias-unresolved" },
     { name: "postgres-tls-required", ok: pg.postgres.network.tls.enabled && pg.postgres.network.sslmode === "require", transport: pg.postgres.network.transport, sslmode: pg.postgres.network.sslmode },
     { name: "remote-pg-hba-hostssl", ok: pg.postgres.auth.pgHba.filter((rule) => rule.type !== "local" && rule.address !== "127.0.0.1/32").every((rule) => rule.type === "hostssl"), detail: "remote PostgreSQL access must use hostssl so plaintext remote connections are refused" },
     { name: "postgres-ssl-runtime", ok: !facts.postgres.packageInstalled || facts.postgres.sslOn, observed: facts.postgres.sslOn, disposition: facts.postgres.packageInstalled ? "must-be-on" : "will-enable-on-apply" },
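Review note: the `remote-pg-hba-hostssl` predicate is dense when read inline, so here is a minimal sketch of the same filter-then-every logic. The `PgHbaRule` interface and the function name `remoteRulesRequireTls` are hypothetical; the real rule type in `PostgresHostConfig` presumably carries more fields.

```typescript
// Hypothetical minimal rule shape, for illustration only.
interface PgHbaRule {
  type: "local" | "host" | "hostssl" | "hostnossl";
  address?: string;
}

// Every rule that is neither a unix-socket rule nor the IPv4 loopback rule
// must be `hostssl`, so plaintext remote connections have no matching entry.
function remoteRulesRequireTls(rules: PgHbaRule[]): boolean {
  return rules
    .filter((rule) => rule.type !== "local" && rule.address !== "127.0.0.1/32")
    .every((rule) => rule.type === "hostssl");
}
```

Two properties follow from this shape: an empty remote rule set passes vacuously, and only `127.0.0.1/32` is exempt, so a `host` rule for the IPv6 loopback `::1/128` would fail the check as written.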
@@ -1151,6 +1183,50 @@ function appProbe(pg: PostgresHostConfig, secrets: SecretInspection | null): { u
   return { user: role.name, password, database: database.name };
 }
 
+function controllerConnectionProbe(pg: PostgresHostConfig, secrets: SecretInspection): ControllerConnectionProbe {
+  const host = pg.postgres.network.connectionHost;
+  const port = pg.postgres.network.port;
+  const base = { host, port, clientAddr: null, serverAddr: null };
+  if (!secrets.ok) {
+    return { ...base, attempted: false, ok: null, ssl: null, psqlVersion: null, error: "secrets-unhealthy" };
+  }
+  const probe = appProbe(pg, secrets);
+  if (probe === null) {
+    return { ...base, attempted: false, ok: null, ssl: null, psqlVersion: null, error: "probe-secret-material-unavailable" };
+  }
+  // Confirm a local psql client exists before attempting the real connection.
+  const version = spawnSync("psql", ["--version"], { encoding: "utf8", timeout: 5_000 });
+  if (version.error !== undefined || version.status !== 0) {
+    return {
+      ...base,
+      attempted: false,
+      ok: null,
+      ssl: null,
+      psqlVersion: null,
+      // `||` rather than `??`: with encoding "utf8" stderr is an empty string, not undefined, when psql prints nothing.
+      error: compactText(version.error?.message || version.stderr || "psql-unavailable"),
+    };
+  }
+  // The password travels via PGPASSWORD, so it never appears in the connection string or argv.
+  const conn = `host=${host} port=${port} user=${probe.user} dbname=${probe.database} sslmode=require connect_timeout=8`;
+  const result = spawnSync("psql", ["-Atq", "-F", "\t", conn, "-c", "SELECT inet_client_addr()::text, inet_server_addr()::text, ssl::text FROM pg_stat_ssl WHERE pid = pg_backend_pid();"], {
+    encoding: "utf8",
+    timeout: 12_000,
+    env: { ...process.env, PGPASSWORD: probe.password },
+  });
+  // Expect one tab-separated row: client address, server address, ssl flag.
+  const line = result.stdout.trim().split(/\r?\n/u).find((item) => item.trim().length > 0);
+  const fields = line?.split("\t") ?? [];
+  const sslText = (fields[2] ?? "").toLowerCase();
+  const ssl = ["t", "true", "1", "on"].includes(sslText) ? true : ["f", "false", "0", "off"].includes(sslText) ? false : null;
+  return {
+    ...base,
+    attempted: true,
+    ok: result.status === 0 && ssl === true,
+    ssl,
+    clientAddr: fields[0] ?? null,
+    serverAddr: fields[1] ?? null,
+    psqlVersion: version.stdout.trim() || null,
+    error: result.status === 0 ? null : compactText(result.stderr || result.error?.message || "psql-connection-failed"),
+  };
+}
+
 function factsScript(pg: PostgresHostConfig, probe: { user: string; password: string; database: string } | null): string {
   const release = releaseUrl(pg);
   const dataDir = pg.postgres.paths.dataDir;
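Review note: the output parsing in `controllerConnectionProbe` could be unit-tested without a live database if it were split out. A minimal sketch, assuming hypothetical helper names `parsePgBool` and `parseProbeRow` (the patch inlines this logic) and a hypothetical `ProbeRow` shape:

```typescript
// Hypothetical helpers; the patch performs the same steps inline.

// Tri-state parse: recognised truthy or falsy spellings map to a boolean,
// anything else (including empty output) maps to null, meaning "unknown".
function parsePgBool(text: string | undefined): boolean | null {
  const value = (text ?? "").trim().toLowerCase();
  if (["t", "true", "1", "on"].includes(value)) return true;
  if (["f", "false", "0", "off"].includes(value)) return false;
  return null;
}

interface ProbeRow {
  clientAddr: string | null;
  serverAddr: string | null;
  ssl: boolean | null;
}

// Take the first non-empty line of `psql -At -F '\t'` output and split it
// into the three columns selected by the probe query.
function parseProbeRow(stdout: string): ProbeRow {
  const line = stdout.trim().split(/\r?\n/).find((item) => item.trim().length > 0);
  const fields = line?.split("\t") ?? [];
  return {
    clientAddr: fields[0] ?? null,
    serverAddr: fields[1] ?? null,
    ssl: parsePgBool(fields[2]),
  };
}
```

Keeping `null` distinct from `false` lets `status` tell "connected without TLS" apart from "no row came back". The Python side of `factsScript` is slightly different: there, any non-empty text outside the truthy set becomes `False` rather than unknown.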
@@ -1280,6 +1356,7 @@ payload = {
         "appConnectionOk": None if text("appConnRc") == "" else text("appConnRc") == "0",
         "appConnectionSsl": None if text("appSsl") == "" else text("appSsl").lower() in {"t", "true", "1", "on"},
         "appConnectionHost": text("appConnHost") or None,
+        "appConnectionError": text("appConnErr") or None,
     },
 }
 print(json.dumps(payload, ensure_ascii=False, indent=2))