From c324200ecfa47b07d229cf078fb17f1ddcba06b7 Mon Sep 17 00:00:00 2001
From: Codex
Date: Sat, 13 Jun 2026 14:26:15 +0000
Subject: [PATCH] docs: document shared PK01 Caddy blocks

---
 .agents/skills/unidesk-sub2api/SKILL.md | 3 +++
 docs/reference/pk01.md                  | 8 ++++++--
 docs/reference/platform-infra.md        | 4 ++++
 docs/reference/yaml-first-ops.md        | 5 +++++
 4 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/.agents/skills/unidesk-sub2api/SKILL.md b/.agents/skills/unidesk-sub2api/SKILL.md
index 97613df3..d4e6cf0c 100644
--- a/.agents/skills/unidesk-sub2api/SKILL.md
+++ b/.agents/skills/unidesk-sub2api/SKILL.md
@@ -174,6 +174,7 @@ bun scripts/cli.ts platform-infra sub2api codex-pool expose --confirm
 - `expose --confirm` only adds the master `frps` allow port for the YAML-specified `remotePort`, and creates or updates `sub2api-frpc` on G14.
 - The master Caddy site is also rendered from `publicExposure.masterCaddy`; `responseHeaderTimeoutSeconds` must be long enough to cover long Codex `/responses/compact` requests, so that Caddy does not return 504 first while the Sub2API backend actually succeeds later. Change the value only in `config/platform-infra/sub2api-codex-pool.yaml`, run `codex-pool expose --confirm` afterwards, then check the `response_header_timeout` rendered in the Caddyfile.
 - The master Caddy short-window edge retry is rendered from `publicExposure.masterCaddy.edgeRetry`; it absorbs 502s where the request never reliably reached Sub2API, such as a brief FRP remotePort closure, `connect: connection refused`, EOF or connection reset. Set the retry duration, interval and `retryMatch` scope only in YAML, run `codex-pool expose --confirm` afterwards, then check the `lb_try_duration`, `lb_try_interval` and `lb_retry_match` rendered in the Caddyfile. Do not hand-patch `/etc/caddy/Caddyfile`.
+- PK01 `/etc/caddy/Caddyfile` is an edge artifact shared by multiple YAML sources such as Sub2API, LangBot, n8n and HWLAB. Sub2API apply/expose may update only its own `# BEGIN unidesk managed sub2api` block and must preserve the other managed blocks; if the apply output shows an abnormal managed block count, stop the closeout first and check the PK01 Caddy merge and validation results. Do not overwrite the whole file by hand.
 - Round-trip retry of non-idempotent POSTs must be narrowed to the safe paths declared in YAML `retryMatch`; upstream account errors on ordinary `/responses` are still handled by Sub2API failover / temp-unschedulable / sentinel, and Caddy must not replay whole inference requests to mask account pool problems.
 - The same FRP TCP entry exposes both the OpenAI-compatible API and the Sub2API management UI `/login`. Do not open a second management port unless YAML explicitly declares a new exposure decision.
 - The Sub2API Kubernetes Service stays ClusterIP.
@@ -220,6 +221,7 @@ bun scripts/cli.ts platform-infra sub2api codex-pool configure-local --confirm
 - Leftover verification probes from past runs: clean up `unidesk-probe-*` temporary resources only with `codex-pool cleanup-probes --confirm`; do not treat deleting a real managed account as probe cleanup or availability recovery.
 - FRP unreachable: first check `masterFrps`, `masterCaddy`, `sub2api-frpc` and the public 401 probe in the `codex-pool expose --confirm` output; when lower-level evidence is needed, use only `trans G14:k3s` for bounded queries.
 - D601 external-active `api.pikapython.com` unreachable: first separate DNS/TLS/Caddy/FRP/Sub2API. While DNS does not resolve to the YAML-declared PK01 address, Caddy ACME fails and `https://api.pikapython.com` cannot count as done; the PK01 loopback FRP port and the PK01 public remotePort can prove the D601 FRP data path, but HTTPS health, `/v1/models` and `/v1/responses` must still be rerun once DNS takes effect.
+- Other PK01 HTTPS services disappear after a D601 external-active apply: first suspect a failed shared Caddy managed block merge or a regression to the old full-file write path. Gather evidence from the controlled Sub2API apply output and the PK01 Caddy managed block markers, then restore through each service's own YAML apply/public-exposure entry; do not hand-copy some Caddyfile as a long-term fix.
 - Caddy download slow or failing: first confirm that `config/platform-infra/sub2api.yaml` sets `publicExposure.pk01.caddyDownloadProxyUrl`, then rerun `sub2api apply --target D601 --confirm` and look for `downloadProxy.mode=curl-proxy` in the PK01 apply summary. Do not keep retrying direct connections to GitHub releases.
 - When `/responses/compact` returns 504 after a fixed duration close to the master Caddy `response_header_timeout`, or the Sub2API log later records `codex.remote_compact.succeeded`, first check whether the master Caddy `response_header_timeout` is rendered from YAML `publicExposure.masterCaddy.responseHeaderTimeoutSeconds`, fix it, then run `codex-pool expose --confirm`; this kind of edge proxy timeout does not trigger Sub2API account-level temporary unscheduling. Compact requests already in flight before the reload may still end under the old timeout, so judge whether the fix took effect only from requests started after the reload.
 - When `/responses/compact` or an ordinary public URL returns 502 within a window of a few seconds, the Caddy log shows `dial tcp 127.0.0.1:: connect: connection refused`, `EOF` or `connection reset by peer`, and the frps log shows `platform-infra-sub2api proxy closing` / `listener is closed` / `new proxy ... success`, the failure is in the edge layer between master Caddy and the FRP remotePort, and Sub2API and sentinel may not see it at all. First confirm that `publicExposure.masterCaddy.edgeRetry` is rendered from YAML and applied via `codex-pool expose --confirm`; if it still happens often, go on to check the stability of the control connection from G14 `sub2api-frpc` to master `frps`. Do not misread this kind of edge 502 as an upstream account pool error, and do not recover by disabling accounts.
@@ -240,6 +242,7 @@ bun scripts/cli.ts platform-infra sub2api codex-pool configure-local --confirm
 - Do not use raw `kubectl apply/delete/patch` as the official operation entry.
 - Do not deploy or run Sub2API/PostgreSQL/Redis on the master server.
 - Do not add Ingress, NodePort, LoadBalancer, hostPort, hostNetwork or wide FRP port ranges.
+- Do not overwrite the shared PK01 Caddyfile wholesale with Sub2API's YAML render output; update only Sub2API's own block through the managed block merge.
 - Do not add CPU/memory limits to the Sub2API manifest unless there is a new explicit decision recorded in YAML.
 - Do not print complete API keys, admin passwords or Secret plaintext.
 - Do not turn ordinary upstream additions or removals into code changes, CI/CD, feature flags or compatibility dual paths.
diff --git a/docs/reference/pk01.md b/docs/reference/pk01.md
index 7fdb8aeb..91d2d5e2 100644
--- a/docs/reference/pk01.md
+++ b/docs/reference/pk01.md
@@ -58,9 +58,13 @@ PK01 currently hosts existing Docker workloads:
 
 `pikanode` mounts `/home/ubuntu/pikanode` read-write into the container. Static/generated download artifacts under `html/download/` and repository data under `files/` may be user-visible or needed by the service. They are not generic GC candidates.
 
-## Sub2API Caddy Edge
+## Shared Caddy Edge
 
-PK01 may act as the public Caddy edge for a YAML-declared D601 Sub2API target. The durable source of truth is `config/platform-infra/sub2api.yaml`; do not hand-edit PK01 Caddy or FRP state as a separate routing truth.
+PK01 may act as the public Caddy edge for several YAML-declared UniDesk services. The durable source of truth stays in the owning YAML, such as `config/platform-infra/sub2api.yaml`, `config/platform-infra/langbot.yaml`, `config/platform-infra/n8n.yaml`, and the HWLAB node/lane YAML. Do not hand-edit PK01 Caddy or FRP state as a separate routing truth.
+
+`/etc/caddy/Caddyfile` is a shared artifact. UniDesk-managed writers must update only their own `# BEGIN unidesk managed ` block, preserve other managed blocks, validate the merged file with Caddy before install, and reload Caddy only after validation succeeds. The shared helper is `scripts/src/pk01-caddy.ts`; service CLIs should call it through their owning public-exposure wrapper instead of carrying a private full-file Caddy writer.
+
+If one public service fails while other services still work, restore the missing route through that service's YAML-controlled apply/public-exposure command, not by replacing the whole Caddyfile. A diagnostic check may list managed block markers with `grep '^# BEGIN unidesk managed ' /etc/caddy/Caddyfile` and run `sudo caddy validate --config /etc/caddy/Caddyfile`, but those checks are evidence only; durable repair belongs to the owning CLI.
 
 In this mode, public ports `80` and `443` belong to Caddy. The existing `pikanode` container must be bound to a loopback HTTP port and used only as the apex PikaPython/PikaNode upstream. The `api.pikapython.com` site must reverse proxy directly to the YAML-declared FRP remote port, so API traffic follows `client -> PK01 Caddy -> PK01 frps remote port -> D601 frpc -> D601 Sub2API`. It must not pass through pikanode or a master-server reverse proxy.
 
diff --git a/docs/reference/platform-infra.md b/docs/reference/platform-infra.md
index 2777accb..c299d456 100644
--- a/docs/reference/platform-infra.md
+++ b/docs/reference/platform-infra.md
@@ -173,6 +173,10 @@ The public management UI is an operations endpoint. Keep Sub2API itself in `plat
 
 The public bridge has two separate failure classes. Sub2API upstream/account failures are visible in Sub2API logs and currently belong to sentinel quarantine plus normal Sub2API routing among schedulable accounts. Edge failures between Caddy and the FRP remote port are not visible to Sub2API; symptoms include Caddy `connect: connection refused`, EOF, connection reset, TLS/certificate failures, DNS NXDOMAIN, or short 502 bursts while frps closes and reopens the configured remote port. Those failures must be diagnosed from DNS, Caddy, and frps/frpc evidence and mitigated through YAML-controlled Caddy edge retry, DNS correction, or FRP stability fixes, not by disabling accounts or changing pool membership.
 
+PK01 `/etc/caddy/Caddyfile` is a shared edge artifact for multiple YAML owners, including platform-infra services and HWLAB node public exposure. Every platform-infra writer must use the shared managed-block helper in `scripts/src/pk01-caddy.ts` or the platform public-service wrapper around it. The helper preserves existing UniDesk managed blocks, updates only the caller's marker block, validates the merged Caddyfile before install, and reloads Caddy only after validation succeeds.
+
+Do not render and install a whole PK01 Caddyfile from a single service YAML. Sub2API, LangBot, n8n, HWLAB and future public services must coexist by distinct `# BEGIN unidesk managed ` blocks. A public exposure closeout should verify the service's own public URL and, when the operation touched PK01 Caddy, confirm that unrelated managed blocks are still present or that the apply output reports they were preserved.
+
 ## Availability And Probes
 
 Kubernetes readiness is not the same as pool availability:
diff --git a/docs/reference/yaml-first-ops.md b/docs/reference/yaml-first-ops.md
index b5bb1ac8..3640edc9 100644
--- a/docs/reference/yaml-first-ops.md
+++ b/docs/reference/yaml-first-ops.md
@@ -91,6 +91,10 @@ App-specific transforms are allowed only as isolated named transform functions.
 
 Public exposure must be declared as an edge topology, including DNS expectation, public base URL, bridge settings, edge host route and target service. The existing FRP/Caddy path is a reusable public-service primitive. New public exposure code should extend that primitive instead of adding per-service Caddy or FRP scripts.
 
+When several YAML owners render into the same Caddyfile, each owner must write only its own managed site block and merge it with the existing file. A shared writer must preserve other `# BEGIN unidesk managed ` blocks, remove only legacy unmanaged blocks for the domains owned by the current operation, validate the merged Caddyfile before install, and then reload Caddy. A domain CLI must not replace a shared Caddyfile with a file rendered from its own YAML alone.
+
+Shared Caddyfile operations belong in a common helper module under `scripts/src/`. Service-specific CLIs should pass YAML-resolved domains, upstreams and marker names to that helper, then report the managed-block counts and validation result. Full-file Caddy installs are allowed only for a host or file that the command exclusively owns and whose exclusivity is documented in the owning reference.
+
 ### Database Blocks
 
 External database consumers must reference the YAML-owned platform database source and exported Secret shape. A consumer should not deploy a new database, copy connection strings by hand, or derive credentials from live runtime objects unless the owning database YAML declares that export.
@@ -122,6 +126,7 @@ Avoid these patterns:
 - hard-coding node ids, service ids, namespaces, ports, URLs, Secret names or workload names in code
 - deriving live state by string conventions when YAML can declare the object directly
 - keeping repeated `kubectl apply`, Caddy edits, FRP edits or rollout restarts as runbook shell snippets
+- replacing a shared Caddyfile from one YAML owner without preserving other managed blocks
 - printing secret values, complete env files, full `DATABASE_URL` values or reusable API keys
 - writing long-term docs that duplicate current YAML values as prose
 - using contract tests or hidden guards to freeze policy values that should remain YAML-controlled
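
The managed-block merge rule documented in this patch can be illustrated with a minimal TypeScript sketch. It is not the contents of `scripts/src/pk01-caddy.ts`, which this patch does not show. The function name `mergeManagedBlock`, the `MergeResult` shape and the `# END unidesk managed` closing marker are assumptions made for illustration; only the `# BEGIN unidesk managed` prefix appears in the docs. Validation with `caddy validate --config` and the Caddy reload are left to the caller.

```typescript
// Only the BEGIN prefix is taken from the docs; the END marker is an assumed convention.
const BEGIN_PREFIX = "# BEGIN unidesk managed ";
const END_PREFIX = "# END unidesk managed ";

interface MergeResult {
  merged: string;       // full Caddyfile text after the merge
  blockNames: string[]; // managed block owners present in the merged text
}

// Replace (or append) the managed block for `owner`, leaving every other line untouched.
function mergeManagedBlock(existing: string, owner: string, body: string): MergeResult {
  const begin = BEGIN_PREFIX + owner;
  const end = END_PREFIX + owner;
  const block = [begin, body.trim(), end];

  const out: string[] = [];
  let skipping = false; // true while inside the caller's old block
  let replaced = false;

  for (const line of existing.split("\n")) {
    const trimmed = line.trim();
    if (!skipping && trimmed === begin) {
      if (replaced) throw new Error(`duplicate managed block: ${owner}`);
      out.push(...block); // new block goes where the old one was
      skipping = true;
      replaced = true;
      continue;
    }
    if (skipping) {
      if (trimmed === end) skipping = false;
      continue; // drop the old block's lines
    }
    out.push(line); // other owners' blocks and unmanaged lines are preserved
  }

  // Refuse to guess where an unterminated block ends rather than drop unrelated content.
  if (skipping) throw new Error(`unterminated managed block: ${owner}`);

  let merged = out.join("\n").replace(/\s+$/, "");
  if (!replaced) {
    merged = (merged === "" ? "" : merged + "\n\n") + block.join("\n");
  }
  merged += "\n";

  const blockNames = merged
    .split("\n")
    .filter((l) => l.startsWith(BEGIN_PREFIX))
    .map((l) => l.slice(BEGIN_PREFIX.length).trim());

  return { merged, blockNames };
}
```

A caller in this sketch would write `merged` to a temporary file, validate it, install it only on success, and report `blockNames` so a closeout can confirm that unrelated managed blocks survived.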