feat(platform-infra): add d601 sub2api standby target

Codex
2026-06-12 04:51:16 +00:00
parent 309e943bd5
commit b774e8d278
5 changed files with 1015 additions and 114 deletions
+1 -1
@@ -68,7 +68,7 @@ CI/CD, GitOps, rollout, artifact publishing, post-PR-merge runtime lane rolling
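- `hwlab g14 git-mirror status|apply|sync|flush [--dry-run|--confirm]` is the controlled maintenance entrypoint for the `devops-infra` git mirror/relay: `apply` renders and server-side applies `devops-infra/git-mirror.yaml` and also deletes the legacy `git-mirror-hwlab-sync` CronJob; `sync` creates a one-off manual Job that pulls the GitHub allowlist refs into the local mirror; `flush` creates a one-off manual Job that fast-forward pushes the local `v0.2-gitops` back to GitHub.
`status` returns the read/write URLs, last sync/write/flush, local refs, the GitHub staging ref, and the pending flush state, and reports `localV02`, `localGitops`, `githubGitops`, `pendingFlush`, `flushNeeded`, `githubInSync`, and the next controlled `flushCommand` under `cache.summary`. A confirmed `sync`/`flush` by default creates an async job under `.state/jobs/` and immediately returns a queryable status; add `--wait` explicitly only for on-site synchronous debugging. The mirror does not set up a CronJob.
If the `gitops-promote` stage of a PipelineRun reports git mirror control-plane drift or inconsistent refs, first run `hwlab g14 git-mirror apply --confirm` to re-apply the current `devops-infra/git-mirror.yaml` hook/ConfigMap, then run `hwlab g14 git-mirror sync --confirm --wait` to re-check the refs. A failed PipelineRun of the same name may be retried only after controlled cleanup via `hwlab g14 control-plane cleanup-runs --lane <lane> --pipeline-run <name> --confirm`; do not use raw `kubectl delete` or hand-edit the mirror hook. After the fix, `control-plane status --pipeline-run <name>` and `git-mirror status` must still be used to confirm runtime closeout and GitHub flush respectively.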
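-- `platform-infra sub2api plan|apply|status|validate|codex-pool` is the controlled entrypoint for Sub2API in the G14 `platform-infra` namespace. The image version is controlled by `config/platform-infra/sub2api.yaml`; the Codex upstream pool, unified API key Secret, FRP public port, and master `~/.codex` consumer are controlled by `config/platform-infra/sub2api-codex-pool.yaml`; the complete procedures for daily deployment, adding/removing upstreams, FRP exposure, local Codex configuration, acceptance, and troubleshooting are all in `$unidesk-sub2api` (`.agents/skills/unidesk-sub2api/SKILL.md`). `docs/reference/platform-infra.md` keeps only the namespace, YAML-first, routing, Secret redaction, and probe development boundaries.
+- `platform-infra sub2api plan|apply|status|validate|codex-pool` is the controlled entrypoint for Sub2API in the `platform-infra` namespace. `--target` selects the runtime target: the default `G14` is the active runtime, and `D601` is a standby predeploy target controlled by the same YAML. The image version and target boundaries are controlled by `config/platform-infra/sub2api.yaml`; the Codex upstream pool, unified API key Secret, FRP public port, and master `~/.codex` consumer are controlled by `config/platform-infra/sub2api-codex-pool.yaml`; the complete procedures for daily deployment, adding/removing upstreams, FRP exposure, local Codex configuration, acceptance, and troubleshooting are all in `$unidesk-sub2api` (`.agents/skills/unidesk-sub2api/SKILL.md`). `docs/reference/platform-infra.md` keeps only the namespace, YAML-first, routing, Secret redaction, and probe development boundaries.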
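- `hwlab g14 observability status|apply|query|targets|boundary|closeout [--lane v02] [--promql <expr>] [--expect-count N] [--expect-value V] [--dry-run|--confirm]` is the controlled entrypoint for the G14 `devops-infra` shared monitoring infrastructure and the HWLAB v0.2 monitoring closeout. `apply` installs the pinned Prometheus Operator `v0.91.0`, Prometheus `v3.12.0`, the Prometheus discovery RBAC, the Prometheus instance inside `devops-infra`, and a ClusterIP query Service, and applies a low-risk label to the workload namespaces that are allowed to be discovered; it does not deploy Prometheus, Grafana, or Alertmanager into `hwlab-v02`, and it does not take over the HWLAB runtime Deployment/Service. `status` gives a read-only summary of the CRDs, the operator Deployment, the Prometheus CR/pod/service, the `hwlab-v02` ServiceMonitor/PrometheusRule, and a bounded `up` query; `query` queries Prometheus only through the Kubernetes service proxy and supports `--expect-count` / `--expect-value`, outputting `assertion`, bad values, and missing/extra series; `targets` summarizes ServiceMonitor/PrometheusRule, metrics sidecar readiness/restarts, the three-layer metric values, and the current CPU/memory resource snapshot from `metrics.k8s.io`; `boundary` verifies that the workload namespace contains no Prometheus/Alertmanager and runs a negative check against the public `/metrics` on `19666/19667`; `closeout` aggregates the semantic conclusions for platform ready, scrape reachable, sidecar serving, business health probe, resource snapshot, namespace boundary, and public metrics exposure. For the long-term boundaries, see `docs/reference/g14-observability-infra.md`.
- `hwlab g14 tools-image status|build --name ci-node-tools --tag <tag> [--dockerfile deploy/ci/hwlab-ci-node-tools.Dockerfile] [--dry-run|--confirm]` is the controlled host build/push entrypoint for the pinned HWLAB CI tools image on G14; build and push happen only on the G14 host and the local registry, never on the master server, and `apk add`/runtime installs are not pushed into the Tekton PipelineRun.
- `trans gh:/owner/repo ...` maps GitHub issues/PRs into a virtual text directory with read-only/controlled-write access, suited to small-patch maintenance of daily reports, PR bodies, and issue bodies: `trans gh:/pikasTech/HWLAB ls` shows `pr/` and `issue/`; `trans gh:/pikasTech/HWLAB/pr ls [--limit N] [--full]` and `trans gh:/pikasTech/HWLAB/issue ls [--limit N] [--full]` show each entry's state, post count, body length, and title; `trans gh:/pikasTech/HWLAB/pr/507 ls` shows the first-post body file of a single PR; `trans gh:/pikasTech/HWLAB/505/1 cat|rg|patch-apply` stays compatible with the legacy issue/PR number route. `patch-apply` uses the virtual-file executor of UniDesk's default apply-patch v2 and maps the first-post body to `body.md`; write-back still goes through the guard/concurrency rules of `bun scripts/cli.ts gh issue/pr update`; `rm` on the first-post body is rejected with a structured error to avoid accidentally deleting an issue/PR body. Reading a large body must expand the UniDesk gh dump file, otherwise `cat/rg/patch-apply` would misread it as empty; this is the P0 visibility contract of the `gh:` virtual file interface.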
+8 -6
@@ -1,6 +1,6 @@
-# G14 Platform Infra
+# Platform Infra
-`platform-infra` is the G14 k3s namespace for UniDesk-operated shared platform services. It is separate from HWLAB runtime lanes, AgentRun lanes, D601 user services, and legacy `devops-infra` control-plane helpers. New shared infra should land here first; old `devops-infra` resources migrate gradually only when a concrete owner and validation path exist.
+`platform-infra` is the k3s namespace for UniDesk-operated shared platform services. G14 is the active default runtime for this namespace; D601 may host explicitly declared standby platform targets when the service needs node-local preparation or cutover capacity. It is separate from HWLAB runtime lanes, AgentRun lanes, D601 user services, and legacy `devops-infra` control-plane helpers. New shared infra should land here first; old `devops-infra` resources migrate gradually only when a concrete owner and validation path exist.
## Source Of Truth
@@ -11,13 +11,15 @@
## Sub2API Deployment Boundary
-- Sub2API is a G14 platform service operated by UniDesk in namespace `platform-infra`. It is not a HWLAB lane workload, AgentRun workload, D601 service, or master server daemon.
-- The canonical deployment entrypoint is `bun scripts/cli.ts platform-infra sub2api plan|apply|status|validate|codex-pool`; daily operation procedures live in `$unidesk-sub2api` at `.agents/skills/unidesk-sub2api/SKILL.md`. This reference keeps only development boundaries and project-specific source-of-truth rules.
-- Raw `kubectl` through `trans G14:k3s` is only for bounded diagnosis and evidence, not a formal mutate path.
+- Sub2API is a platform service operated by UniDesk in namespace `platform-infra`. It is not a HWLAB lane workload, AgentRun workload, D601 user service, or master server daemon.
+- The canonical deployment entrypoint is `bun scripts/cli.ts platform-infra sub2api plan|apply|status|validate|codex-pool`. Runtime targets are selected with `--target`; `G14` is the active default target and `D601` is a standby target controlled by the same YAML. Daily operation procedures live in `$unidesk-sub2api` at `.agents/skills/unidesk-sub2api/SKILL.md`. This reference keeps only development boundaries and project-specific source-of-truth rules.
+- Raw `kubectl` through `trans <target>:k3s` is only for bounded diagnosis and evidence, not a formal mutate path.
- The image version is controlled by `config/platform-infra/sub2api.yaml`. Image update procedures are daily operations owned by `$unidesk-sub2api`; the development boundary is that image choices remain YAML-controlled.
- Sub2API should stay ClusterIP-only by default. Do not add Ingress, NodePort, LoadBalancer, or broad FRP exposure unless a YAML-controlled public exposure decision exists.
- Sub2API currently has no resource limits by design. Do not add CPU or memory limits unless a later explicit decision changes that policy and stores the new policy in YAML.
- Master server is a consumer/control host, not the runtime location. Do not deploy Sub2API, PostgreSQL, Redis, or heavy validation loops on master server.
+- D601 Sub2API is a predeployment target, not a second active singleton. While the platform database handoff is pending, it must render without a local PostgreSQL StatefulSet, keep the Sub2API app and local Redis cache scaled to zero, and use only ephemeral Redis storage when Redis is later activated. After the external platform DB endpoint, Secret, and runtime images are ready, activation must be expressed by YAML and applied through the same `platform-infra sub2api --target D601` CLI path.
+- Sub2API account sentinel and public FRP exposure remain singleton concerns. Do not create a second sentinel or public management surface for D601 unless a later YAML-controlled decision explicitly moves or splits that responsibility.
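
The target selection and the pending-standby rules in the bullets above can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual CLI code: `RenderedShape`, `resolveTarget`, and `standbyViolations` are hypothetical names, the real renderer's output shape is not shown in this reference, and case-sensitive matching of `--target` values is assumed.

```typescript
// Targets named in this reference; G14 is the active default, D601 the standby.
type Target = "G14" | "D601";

// Hypothetical summary of a rendered manifest set (assumption: the real
// renderer output is not shown in this document).
interface RenderedShape {
  hasLocalPostgres: boolean;
  appReplicas: number;
  redisReplicas: number;
  redisStorage: "ephemeral" | "persistent";
  serviceTypes: string[];
}

// Resolve the --target flag, defaulting to the active G14 runtime.
// Assumption: values are matched case-sensitively.
function resolveTarget(flag?: string): Target {
  if (flag === undefined) return "G14";
  if (flag === "G14" || flag === "D601") return flag;
  throw new Error(`unknown --target: ${flag}`);
}

// Collect violations of the standby shape while the platform DB handoff is pending.
function standbyViolations(shape: RenderedShape): string[] {
  const problems: string[] = [];
  if (shape.hasLocalPostgres) problems.push("local PostgreSQL StatefulSet must not be rendered");
  if (shape.appReplicas !== 0) problems.push("Sub2API app must be scaled to zero");
  if (shape.redisReplicas !== 0) problems.push("local Redis cache must be scaled to zero");
  if (shape.redisStorage !== "ephemeral") problems.push("Redis storage must be ephemeral");
  if (shape.serviceTypes.some((t) => t !== "ClusterIP")) problems.push("services must be ClusterIP");
  return problems;
}
```

Returning a list of violations rather than a single boolean lets a `plan` or `validate` style report name every rule that a rendered standby target breaks.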
## Codex Pool Routing
@@ -166,4 +168,4 @@ spec:
This policy must be included in the `sub2api plan` / `apply` manifest rendering so that it is created as part of the normal deployment flow, not maintained as a manual one-off.
-`platform-infra sub2api status` must report whether `NetworkPolicy/allow-all` exists and still has `podSelector: {}`, `policyTypes: [Ingress, Egress]`, `ingress: [{}]`, and `egress: [{}]`. `platform-infra sub2api validate` must also run temporary in-namespace probe pods that connect to `sub2api-postgres:5432` and `sub2api-redis:6379`; local `pg_isready` inside the PostgreSQL pod alone is insufficient because it does not exercise kube-router cross-pod policy evaluation.
+`platform-infra sub2api status` must report whether `NetworkPolicy/allow-all` exists and still has `podSelector: {}`, `policyTypes: [Ingress, Egress]`, `ingress: [{}]`, and `egress: [{}]`. For active bundled targets, `platform-infra sub2api validate` must also run temporary in-namespace probe pods that connect to `sub2api-postgres:5432` and `sub2api-redis:6379`; local `pg_isready` inside the PostgreSQL pod alone is insufficient because it does not exercise kube-router cross-pod policy evaluation. For external-DB pending standby targets, `validate --target` checks the predeployment shape instead: no local PostgreSQL, app replicas zero, ClusterIP services, allow-all NetworkPolicy, and local Redis declared as ephemeral cache with readiness required only when Redis replicas are above zero.
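
The allow-all shape that `status` has to report can be checked with a small predicate over the policy's `spec`. The sketch below is illustrative only: `NetworkPolicySpec` is a trimmed, hypothetical view of the object as it might be parsed from `kubectl get networkpolicy allow-all -o json`, and `isAllowAll` is not a function taken from the UniDesk codebase.

```typescript
// Trimmed, hypothetical view of a NetworkPolicy spec (assumption: only the
// fields named in this reference are modeled).
interface NetworkPolicySpec {
  podSelector?: Record<string, unknown>;
  policyTypes?: string[];
  ingress?: Array<Record<string, unknown>>;
  egress?: Array<Record<string, unknown>>;
}

// True only for a plain object with no keys, i.e. `{}`.
function isEmptyObject(v: unknown): boolean {
  return typeof v === "object" && v !== null && !Array.isArray(v) && Object.keys(v).length === 0;
}

// True only for a rule list of the form `[{}]`.
function isSingleEmptyRule(rules: unknown): boolean {
  return Array.isArray(rules) && rules.length === 1 && isEmptyObject(rules[0]);
}

// Checks podSelector: {}, policyTypes: [Ingress, Egress] (in any order),
// ingress: [{}], and egress: [{}].
function isAllowAll(spec: NetworkPolicySpec | undefined): boolean {
  if (!spec) return false;
  const types = [...(spec.policyTypes ?? [])].sort();
  return (
    isEmptyObject(spec.podSelector) &&
    types.length === 2 &&
    types[0] === "Egress" &&
    types[1] === "Ingress" &&
    isSingleEmptyRule(spec.ingress) &&
    isSingleEmptyRule(spec.egress)
  );
}
```

A check like this covers only the declared shape of the policy. It does not replace the in-namespace probe pods, which are what actually exercise cross-pod policy evaluation on active bundled targets.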