feat: deploy n8n as platform infra service

2026-06-12 17:20:49 +00:00
parent cd025b6c84
commit 91b6c10d3d
8 changed files with 1768 additions and 7 deletions
@@ -73,7 +73,7 @@ CI/CD, GitOps, rollout, artifact publishing, runtime lane rolling after PR merge
 - `hwlab g14 git-mirror status|apply|sync|flush [--dry-run|--confirm]` is the controlled maintenance entrypoint for the `devops-infra` git mirror/relay: `apply` renders and server-side applies `devops-infra/git-mirror.yaml` and also deletes the legacy `git-mirror-hwlab-sync` CronJob; `sync` creates a one-off manual Job that pulls the GitHub allowlist refs into the local mirror; `flush` creates a one-off manual Job that fast-forward pushes the local `v0.2-gitops` back to GitHub.
  `status` returns the read/write URLs, last sync/write/flush, local refs, the GitHub staging ref and the pending flush state, and reports `localV02`, `localGitops`, `githubGitops`, `pendingFlush`, `flushNeeded`, `githubInSync` and the next controlled `flushCommand` in `cache.summary`. Confirmed `sync` and `flush` create an asynchronous job under `.state/jobs/` by default and immediately return a queryable status; add `--wait` explicitly only for on-site synchronous debugging. The mirror does not configure a CronJob.
  If the `gitops-promote` stage of a PipelineRun reports git mirror control-plane drift or inconsistent refs, first run `hwlab g14 git-mirror apply --confirm` to reapply the current `devops-infra/git-mirror.yaml` hook/ConfigMap, then run `hwlab g14 git-mirror sync --confirm --wait` to recheck the refs. A failed PipelineRun of the same name may be retried only after controlled cleanup through `hwlab g14 control-plane cleanup-runs --lane <lane> --pipeline-run <name> --confirm`; do not use native `kubectl delete` or edit the mirror hook by hand. After the fix, you must still confirm runtime closeout and the GitHub flush with `control-plane status --pipeline-run <name>` and `git-mirror status` respectively.
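- `platform-infra sub2api plan|apply|status|validate|codex-pool` is the controlled entrypoint for Sub2API in the `platform-infra` namespace; `--target` selects the runtime target, with `G14` as the default active runtime and `D601` as the standby predeploy target controlled by the same YAML. The image version and target boundary are controlled by `config/platform-infra/sub2api.yaml`; the Codex upstream pool, the unified API key Secret, the FRP public port and the master `~/.codex` consumer are controlled by `config/platform-infra/sub2api-codex-pool.yaml`. The complete steps for routine deployment, adding and removing upstreams, FRP exposure, local Codex configuration, acceptance and troubleshooting are consolidated in `$unidesk-sub2api` (`.agents/skills/unidesk-sub2api/SKILL.md`). `docs/reference/platform-infra.md` keeps only the namespace, YAML-first, routing, Secret redaction and probe development boundaries.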
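+- `platform-infra sub2api|langbot|n8n ...` is the controlled entrypoint for shared platform services in the `platform-infra` namespace; `sub2api` supports `plan|apply|status|validate|codex-pool`, while `langbot` and `n8n` each support their own YAML-controlled public-service operations such as `plan|apply|status|logs|validate`. `--target` selects the runtime target, with `G14` as the default active runtime and `D601` as the standby predeploy target controlled by the same YAML. The image version and target boundary are controlled by `config/platform-infra/*.yaml`; the Codex upstream pool, the unified API key Secret, the FRP public port and the master `~/.codex` consumer are controlled by `config/platform-infra/sub2api-codex-pool.yaml`. The complete Sub2API steps for routine deployment, adding and removing upstreams, FRP exposure, local Codex configuration, acceptance and troubleshooting are consolidated in `$unidesk-sub2api` (`.agents/skills/unidesk-sub2api/SKILL.md`). `docs/reference/platform-infra.md` keeps the namespace, YAML-first, routing, Secret redaction, PK01 Caddy+FRP and probe development boundaries.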
 - `hwlab g14 observability status|apply|query|targets|boundary|closeout [--lane v02] [--promql <expr>] [--expect-count N] [--expect-value V] [--dry-run|--confirm]` is the controlled entrypoint for the G14 `devops-infra` shared monitoring infrastructure and the HWLAB v0.2 monitoring closeout. `apply` installs the pinned Prometheus Operator `v0.91.0` and Prometheus `v3.12.0`, the Prometheus discovery RBAC, the Prometheus instance inside `devops-infra` and a ClusterIP query Service, and applies a low-risk label to the workload namespaces that are allowed to be discovered; it does not deploy Prometheus, Grafana or Alertmanager into `hwlab-v02`, nor does it take over the HWLAB runtime Deployment/Service. `status` is a read-only summary of the CRDs, the operator Deployment, the Prometheus CR/pod/service, the `hwlab-v02` ServiceMonitor/PrometheusRule and a bounded `up` query; `query` reaches Prometheus only through the Kubernetes service proxy and supports `--expect-count` / `--expect-value`, outputting `assertion`, bad values and missing/extra series; `targets` summarizes the ServiceMonitor/PrometheusRule, metrics sidecar readiness/restarts, the three-layer metric values and the current CPU/memory resource snapshot from `metrics.k8s.io`; `boundary` verifies that the workload namespace contains no Prometheus/Alertmanager and runs a negative check against the public `/metrics` on `19666/19667`; `closeout` aggregates the semantic conclusions for platform ready, scrape reachable, sidecar serving, business health probe, resource snapshot, namespace boundary and public metrics exposure. The long-term boundary is described in `docs/reference/g14-observability-infra.md`.
 - `hwlab g14 tools-image status|build --name ci-node-tools --tag <tag> [--dockerfile deploy/ci/hwlab-ci-node-tools.Dockerfile] [--dry-run|--confirm]` is the controlled host build/push entrypoint for the pinned HWLAB CI tools image on G14; build and push happen only on the G14 host and the local registry, never on the master server, and `apk add`/runtime installs are not pushed into the Tekton PipelineRun.
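 - `trans gh:/owner/repo ...` maps GitHub issues/PRs to a virtual text directory with read-only/controlled-write access, suited to small-patch maintenance of daily reports, PR bodies and issue bodies: `trans gh:/pikasTech/HWLAB ls` shows `pr/` and `issue/`; `trans gh:/pikasTech/HWLAB/pr ls [--limit N] [--full]` and `trans gh:/pikasTech/HWLAB/issue ls [--limit N] [--full]` show each entry's state, post count, body length and title; `trans gh:/pikasTech/HWLAB/pr/507 ls` shows the first-post body file of a single PR; `trans gh:/pikasTech/HWLAB/505/1 cat|rg|patch-apply` stays compatible with the legacy issue/PR number route. `patch-apply` uses the virtual-file executor of UniDesk's default apply-patch v2 and maps the first-post body to `body.md`; write-back still goes through the guard/concurrency rules of `bun scripts/cli.ts gh issue/pr update`; `rm` returns a structured rejection for the first-post body so that an issue/PR body is not deleted by mistake. Reading a large body must expand the UniDesk gh dump file, otherwise `cat/rg/patch-apply` would misread it as empty; this is the P0 visibility contract of the `gh:` virtual file interface.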
@@ -36,6 +36,15 @@
 - LangBot Box is disabled by default for the public service because the official Box deployment needs Docker socket access. Enabling Box requires a separate explicit platform decision and YAML-controlled security boundary.
 - Official WeChat support is through LangBot's official platform adapters such as `officialaccount`, `wecom`, and `wecomcs`; real AppID, token, EncodingAESKey and channel credentials are bound in LangBot after deployment. Personal WeChat or OpenClaw-style adapters are not part of the default public-service boundary.

+## n8n Workflow Boundary
+
+- n8n is the UniDesk-operated workflow/automation layer for LangBot and platform service integration. It is a workflow bridge for webhook orchestration, service calls, manual approval flows and external integrations; it does not replace LangBot or become the chat runtime.
+- The canonical entrypoint is `bun scripts/cli.ts platform-infra n8n plan|apply|status|logs|validate`; G14 is the default runtime target and `config/platform-infra/n8n.yaml` is the YAML source of truth.
+- n8n uses the existing Pika01/PK01 host-native PostgreSQL instance through `config/platform-db/postgres-pk01.yaml` and `platform-db postgres`. Adding n8n state means adding a dedicated `n8n` database and role inside that single external PostgreSQL instance; do not deploy an in-cluster PostgreSQL StatefulSet, a second PostgreSQL instance, or long-term SQLite state for n8n.
+- Public exposure uses PK01 Caddy plus FRP to the G14 ClusterIP service at `https://n8n.pikapython.com`. Do not add Kubernetes Ingress, NodePort, LoadBalancer, host networking, or host ports for n8n unless a later YAML-controlled platform decision changes the exposure model.
+- n8n reverse-proxy and webhook settings such as public base URL, `WEBHOOK_URL`, proxy hop trust and PostgreSQL connection fields must be rendered from YAML. Secret output may show key names, presence and fingerprints only; it must not print the database password, `N8N_ENCRYPTION_KEY`, or full `DATABASE_URL`.
+- Closeout for public n8n changes requires `platform-infra n8n status` and `platform-infra n8n validate --full`, proving both in-cluster HTTP and public HTTPS. Actual LangBot workflows, credentials and business automations are separate follow-up scope after the base n8n service is healthy.
+
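The Secret redaction rule in the n8n boundary above (key names, presence and fingerprints only) could be sketched roughly as follows. This is a minimal illustration, not the repository's implementation: `SecretKeySummary`, `fingerprint`, `summarizeSecret`, the 12-character SHA-256 prefix and the example key list are all assumptions made for this sketch.

```typescript
import { createHash } from "node:crypto";

// Hypothetical shape of one redacted entry; not the repository's actual type.
interface SecretKeySummary {
  key: string;
  present: boolean;
  fingerprint: string | null;
}

// Short SHA-256 prefix (assumed length 12): lets two runs be compared
// without the secret value itself ever being printed.
function fingerprint(value: string): string {
  return createHash("sha256").update(value, "utf8").digest("hex").slice(0, 12);
}

// Builds a status-safe summary: key name, presence flag and fingerprint only.
function summarizeSecret(
  expectedKeys: string[],
  data: Record<string, string | undefined>,
): SecretKeySummary[] {
  return expectedKeys.map((key) => {
    const value = data[key];
    const present = typeof value === "string" && value.length > 0;
    return {
      key,
      present,
      fingerprint: present ? fingerprint(value as string) : null,
    };
  });
}

// Illustrative key names and placeholder values only.
const summary = summarizeSecret(
  ["N8N_ENCRYPTION_KEY", "DB_POSTGRESDB_PASSWORD", "DATABASE_URL"],
  {
    N8N_ENCRYPTION_KEY: "example-only-key",
    DB_POSTGRESDB_PASSWORD: "example-only-password",
  },
);

console.log(JSON.stringify(summary, null, 2));
```

A summary of this kind can be logged or returned from a `status` command because it never carries the raw value; a missing key shows up as `present: false` with a `null` fingerprint.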
 ## Codex Pool Routing

 `config/platform-infra/sub2api-codex-pool.yaml` controls the Codex-facing OpenAI-compatible pool: