docs: reference sub2api operations skill
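@@ -9,7 +9,7 @@ UniDesk is a distributed work platform with the main server as its unified entry point; this document

## P0 highest priority: G14 platform-infra rules

- P0: `platform-infra` is the platform-infrastructure namespace for UniDesk operations on G14 k3s; the long-term boundaries, routing, and probe conventions for Sub2API, the Codex pool, FRP exposure, the unified consumer API key, and future platform-infrastructure migrations are documented in `docs/reference/platform-infra.md`.
- P0: `platform-infra` is the platform-infrastructure namespace for UniDesk operations on G14 k3s; the long-term boundaries, routing, and probe conventions for Sub2API, the Codex pool, FRP exposure, the unified consumer API key, and future platform-infrastructure migrations are documented in `docs/reference/platform-infra.md`, and Sub2API daily operations are documented in `$unidesk-sub2api` (`~/.agents/skills/unidesk-sub2api/SKILL.md`).
- P0: `devops-infra` serves only as the source from which existing control-plane infrastructure is gradually migrated, and is no longer the default namespace for new platform services; additions and migrations must land in `platform-infra` first and be controlled through `config/platform-infra/*.yaml` and `bun scripts/cli.ts platform-infra ...`.

## P0 highest priority: CaseRun no-service and single-step debugging rules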
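@@ -267,7 +267,7 @@ UniDesk is a distributed work platform with the main server as its unified entry point; this document
- `docs/reference/code-queue-supervision.md`: rules for Code Queue centralized scheduling, concurrent queue splitting, in-flight monitoring, infrastructure-defect triage, and acceptance closeout.
- `docs/reference/hwlab.md`: the HWLAB command-side fixed workspace, the G14 primary runtime plane, the D601 legacy/hardware-bridge boundary, the minimal device-agent/gateway bridge model, and the controlled release boundary.
- `docs/reference/g14.md`: the G14 provider node, the k3s control bridge, the legacy DEV/PROD retirement boundary, the current HWLAB runtime lane, the device-agent manual-experiment boundary, Code Queue/CI candidate targets, and the node-local VPN proxy bootstrap boundary.
- `docs/reference/platform-infra.md`: the G14 `platform-infra` namespace, YAML-first shared service configuration, Sub2API/Codex pool, FRP exposure, and on-demand availability probe boundaries.
- `docs/reference/platform-infra.md`: the G14 `platform-infra` namespace, YAML-first shared service configuration, Sub2API/Codex pool, FRP exposure, and on-demand availability probe development boundaries; Sub2API daily operations are documented in `$unidesk-sub2api` (`~/.agents/skills/unidesk-sub2api/SKILL.md`).
- `docs/reference/g14-observability-infra.md`: Prometheus Operator on G14 native k3s, the `devops-infra` monitoring infrastructure, cross-namespace scrape declarations, and security boundaries.
- `docs/reference/gc.md`: disk GC for the UniDesk main server and providers, G14/HWLAB registry retention, the safe-stop line, and the long-term anti-bloat benefit rules.
- `docs/reference/observability.md`: observability rules for service logs, task liveness, the general performance-metrics API, and performance dashboards.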
@@ -68,7 +68,7 @@ CI/CD, GitOps, rollout, artifact release, post-PR-merge runtime lane rolling
- `hwlab g14 git-mirror status|apply|sync|flush [--dry-run|--confirm]` is the controlled maintenance entrypoint for the `devops-infra` git mirror/relay: `apply` renders and server-side applies `devops-infra/git-mirror.yaml` and also deletes the legacy `git-mirror-hwlab-sync` CronJob; `sync` creates a one-off manual Job that pulls the GitHub allowlist refs into the local mirror; `flush` creates a one-off manual Job that fast-forward pushes the local `v0.2-gitops` back to GitHub.
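`status` returns the read/write URLs, last sync/write/flush, local refs, the GitHub staging ref, and the pending flush state, and reports `localV02`, `localGitops`, `githubGitops`, `pendingFlush`, `flushNeeded`, `githubInSync`, and the next controlled `flushCommand` in `cache.summary`. Confirmed `sync` and `flush` create an asynchronous job under `.state/jobs/` by default and immediately return a queryable status; add `--wait` explicitly only for on-site synchronous debugging. The mirror does not set up a CronJob.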
If the `gitops-promote` stage of a PipelineRun reports git mirror control-plane drift or inconsistent refs, first run `hwlab g14 git-mirror apply --confirm` to reapply the current `devops-infra/git-mirror.yaml` hook/ConfigMap, then run `hwlab g14 git-mirror sync --confirm --wait` to recheck the refs. A failed PipelineRun of the same name may be retried only after controlled cleanup through `hwlab g14 control-plane cleanup-runs --lane <lane> --pipeline-run <name> --confirm`; do not use native `kubectl delete` or hand-edit the mirror hook. After the fix, you must still confirm runtime closeout and the GitHub flush with `control-plane status --pipeline-run <name>` and `git-mirror status` respectively.
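- `platform-infra sub2api plan|apply|status|validate|codex-pool` is the controlled entrypoint for Sub2API in the G14 `platform-infra` namespace. The image version is controlled by `config/platform-infra/sub2api.yaml`; the Codex upstream pool, the unified API key Secret, the FRP public port, and the master `~/.codex` consumer are controlled by `config/platform-infra/sub2api-codex-pool.yaml`. `codex-pool sync --confirm` imports the upstream accounts declared in YAML from the YAML-specified `~/.codex/config.toml*`/`auth.json*` files into a single Sub2API group, and the unified consumer key is always stored at `platform-infra/sub2api-codex-pool-api-key.API_KEY`. `profiles.entries[].openaiResponsesWebSocketsV2Mode` is for OpenAI-compatible upstreams that require Responses WebSocket v2, and `profiles.entries[].upstreamUserAgent` is for the few upstreams that require the Codex CLI User-Agent. To add, adjust, or remove an upstream, prefer editing the YAML and then running `codex-pool sync --confirm`; sync deletes `unidesk-codex-*` accounts that have been removed from YAML and carry `unidesk_managed=true`, and this path does not go through CI/CD. `codex-pool expose --confirm` adds only the explicit FRPS allow port and `platform-infra/sub2api-frpc` and must not widen the port range; `codex-pool configure-local --confirm` first backs up the current `~/.codex/config.toml` and `auth.json` as `*.pre-sub2api`, then points the default provider at the master-local FRP entrypoint. `validate`/`codex-pool validate` are on-demand acceptance checks, not continuous availability probes; the full namespace boundaries, routing, and probe conventions are in `docs/reference/platform-infra.md`. All output may print only the key preview/fingerprint, byte count, and Secret path, never the full API key.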
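- `platform-infra sub2api plan|apply|status|validate|codex-pool` is the controlled entrypoint for Sub2API in the G14 `platform-infra` namespace. The image version is controlled by `config/platform-infra/sub2api.yaml`; the Codex upstream pool, the unified API key Secret, the FRP public port, and the master `~/.codex` consumer are controlled by `config/platform-infra/sub2api-codex-pool.yaml`. The complete procedures for daily deployment, adding and removing upstreams, FRP exposure, local Codex configuration, acceptance, and troubleshooting are documented in `$unidesk-sub2api` (`~/.agents/skills/unidesk-sub2api/SKILL.md`). `docs/reference/platform-infra.md` keeps only the namespace, YAML-first, routing, Secret redaction, and probe development boundaries.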
- `hwlab g14 observability status|apply|query|targets|boundary|closeout [--lane v02] [--promql <expr>] [--expect-count N] [--expect-value V] [--dry-run|--confirm]` is the controlled entrypoint for the G14 `devops-infra` shared monitoring infrastructure and for HWLAB v0.2 monitoring closeout. `apply` installs the pinned Prometheus Operator `v0.91.0` and Prometheus `v3.12.0`, the Prometheus discovery RBAC, the Prometheus instance inside `devops-infra`, and a ClusterIP query Service, and applies a low-risk label to the workload namespaces that are allowed to be discovered; it does not deploy Prometheus, Grafana, or Alertmanager into `hwlab-v02`, nor does it take over the HWLAB runtime Deployment/Service. `status` is a read-only summary of the CRDs, the operator Deployment, the Prometheus CR/pod/service, the `hwlab-v02` ServiceMonitor/PrometheusRule, and a bounded `up` query; `query` reaches Prometheus only through the Kubernetes service proxy and supports `--expect-count` / `--expect-value`, emitting `assertion`, bad values, and missing/extra series; `targets` summarizes the ServiceMonitor/PrometheusRule, metrics sidecar readiness/restarts, the three tiers of metric values, and the current CPU/memory resource snapshot from `metrics.k8s.io`; `boundary` verifies that workload namespaces contain no Prometheus/Alertmanager and performs negative verification of the public `/metrics` on `19666/19667`; `closeout` aggregates the semantic conclusions for platform ready, scrape reachable, sidecar serving, business health probe, resource snapshot, namespace boundary, and public metrics exposure. The long-term boundaries are in `docs/reference/g14-observability-infra.md`.
- `hwlab g14 tools-image status|build --name ci-node-tools --tag <tag> [--dockerfile deploy/ci/hwlab-ci-node-tools.Dockerfile] [--dry-run|--confirm]` is the controlled host build/push entrypoint for the pinned HWLAB CI tools image on G14; build and push happen only on the G14 host and the local registry, never on the master server, and `apk add`/runtime installs must not be pushed into a Tekton PipelineRun.
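- `trans gh:/owner/repo ...` maps GitHub issues/PRs to a read-only/controlled-write virtual text directory, suited to small-patch maintenance of daily reports, PR bodies, and issue bodies: `trans gh:/pikasTech/HWLAB ls` shows `pr/` and `issue/`; `trans gh:/pikasTech/HWLAB/pr ls [--limit N] [--full]` and `trans gh:/pikasTech/HWLAB/issue ls [--limit N] [--full]` show each entry's state, post count, body length, and title; `trans gh:/pikasTech/HWLAB/pr/507 ls` shows the first-post body file of a single PR; and `trans gh:/pikasTech/HWLAB/505/1 cat|rg|patch-apply` stays compatible with the legacy issue/PR number route. `patch-apply` uses the virtual-file executor of UniDesk's default apply-patch v2, maps the first-post body to `body.md`, and writes back through the guard/concurrency rules of `bun scripts/cli.ts gh issue/pr update`; `rm` on the first-post body is rejected with a structured error to avoid accidentally deleting an issue/PR body. Reading a large body must expand the UniDesk gh dump file, otherwise `cat/rg/patch-apply` would misread it as empty; this is the P0 visibility contract of the `gh:` virtual file interface.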
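The Secret redaction boundary mentioned in this hunk (output limited to a key preview/fingerprint, byte count, and Secret path) can be illustrated with a minimal sketch. The `RedactedKey` shape, the `redactApiKey` name, the four-character preview, and the truncated SHA-256 fingerprint are all assumptions made for illustration; the diff does not show how the real CLI formats its output.

```typescript
import { createHash } from "node:crypto";

// Hypothetical output shape: the real CLI's field names are not shown in this diff.
interface RedactedKey {
  preview: string;     // short prefix only, never the full key
  fingerprint: string; // truncated SHA-256 hex digest (an assumed choice)
  bytes: number;       // UTF-8 byte length of the key
  secretPath: string;  // where the key lives, e.g. namespace/secret.KEY
}

export function redactApiKey(key: string, secretPath: string): RedactedKey {
  const bytes = Buffer.byteLength(key, "utf8");
  // Very short keys get no preview at all, so the prefix cannot reveal most of the key.
  const preview = key.length > 12 ? `${key.slice(0, 4)}...` : "(hidden)";
  const fingerprint = createHash("sha256")
    .update(key, "utf8")
    .digest("hex")
    .slice(0, 12);
  return { preview, fingerprint, bytes, secretPath };
}
```

Logging the returned object instead of the raw key keeps two runs comparable by fingerprint without exposing the key itself.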
@@ -12,8 +12,9 @@
## Sub2API Deployment Boundary

- Sub2API is a G14 platform service operated by UniDesk in namespace `platform-infra`. It is not a HWLAB lane workload, AgentRun workload, D601 service, or master server daemon.
- The canonical deployment entrypoint is `bun scripts/cli.ts platform-infra sub2api plan|apply|status|validate|codex-pool`; raw `kubectl` through `trans G14:k3s` is only for bounded diagnosis and evidence.
- The image version is controlled by `config/platform-infra/sub2api.yaml`. Updating the image must be a YAML change plus `platform-infra sub2api apply --confirm` and follow-up runtime validation.
- The canonical deployment entrypoint is `bun scripts/cli.ts platform-infra sub2api plan|apply|status|validate|codex-pool`; daily operation procedures live in `$unidesk-sub2api` at `~/.agents/skills/unidesk-sub2api/SKILL.md`. This reference keeps only development boundaries and project-specific source-of-truth rules.
- Raw `kubectl` through `trans G14:k3s` is only for bounded diagnosis and evidence, not a formal mutate path.
- The image version is controlled by `config/platform-infra/sub2api.yaml`. Image update procedures are daily operations owned by `$unidesk-sub2api`; the development boundary is that image choices remain YAML-controlled.
- Sub2API should stay ClusterIP-only by default. Do not add Ingress, NodePort, LoadBalancer, or broad FRP exposure unless a YAML-controlled public exposure decision exists.
- Sub2API currently has no resource limits by design. Do not add CPU or memory limits unless a later explicit decision changes that policy and stores the new policy in YAML.
- Master server is a consumer/control host, not the runtime location. Do not deploy Sub2API, PostgreSQL, Redis, or heavy validation loops on master server.
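Two of the boundaries above (ClusterIP-only exposure and no CPU/memory limits) are mechanical enough to check against rendered manifests. The following is a rough sketch of such a check, not the project's implementation: the `Manifest` type is a trimmed-down stand-in for Kubernetes objects and `findBoundaryViolations` is a hypothetical helper name.

```typescript
// Trimmed-down stand-ins for Kubernetes objects; only the fields this check reads.
type Container = { name: string; resources?: { limits?: Record<string, string> } };

interface Manifest {
  kind: string;
  metadata: { name: string };
  spec?: {
    type?: string; // Service type
    template?: { spec?: { containers?: Container[] } }; // Deployment / StatefulSet pods
  };
}

// Hypothetical helper: returns one message per boundary violation, empty when clean.
export function findBoundaryViolations(manifests: Manifest[]): string[] {
  const violations: string[] = [];
  for (const m of manifests) {
    if (m.kind === "Ingress") {
      violations.push(`${m.metadata.name}: Ingress is not allowed by default`);
    }
    if (m.kind === "Service") {
      // Kubernetes treats an omitted Service type as ClusterIP.
      const type = m.spec?.type ?? "ClusterIP";
      if (type !== "ClusterIP") {
        violations.push(`${m.metadata.name}: Service type ${type} is not ClusterIP`);
      }
    }
    for (const c of m.spec?.template?.spec?.containers ?? []) {
      const limits = c.resources?.limits ?? {};
      if ("cpu" in limits || "memory" in limits) {
        violations.push(`${m.metadata.name}/${c.name}: CPU or memory limits are set`);
      }
    }
  }
  return violations;
}
```

A check like this would only cover the default policy; a YAML-controlled exposure or limits decision, as the bullets allow, would need an explicit exception path.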
@@ -38,7 +39,7 @@ The request path is:
4. Sub2API validates the unified key and resolves its `group_id`.
5. Accounts listed in `profiles.entries` are bound to the same group via `group_ids`, so Sub2API dispatches through that group using its own account selection semantics.

Adding an upstream should be a fast YAML operation: create the corresponding local `~/.codex/config.toml.<profile>` and `auth.json.<profile>` inputs, add one `profiles.entries` item, then run `platform-infra sub2api codex-pool plan|sync --confirm`. Removing an upstream is also a YAML operation: remove the `profiles.entries` item, then run `sync --confirm`; the sync path deletes only matching `unidesk-codex-*` accounts that are marked `unidesk_managed=true` and absent from YAML. Do not add code or CI/CD for ordinary pool membership changes. Code changes are only appropriate when the YAML schema needs a new reusable capability such as account-level WebSocket mode or per-account upstream User-Agent.
Adding, removing, exposing, validating, and configuring local Codex consumers are daily operations covered by `$unidesk-sub2api`. The development rule is that ordinary pool membership changes stay YAML-only and do not add code or CI/CD. Code changes are only appropriate when the YAML schema needs a new reusable capability such as account-level WebSocket mode or per-account upstream User-Agent.

After `codex-pool configure-local --confirm`, the default upstream profile must not recursively import the just-created Sub2API consumer endpoint as an upstream account. Keep the default source profile pointed at `config.toml.<backupSuffix>` and `auth.json.<backupSuffix>`; fallback to the current default files is only for first bootstrap before backups exist.
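The removal rule quoted in this hunk (sync deletes only `unidesk-codex-*` accounts that are marked `unidesk_managed=true` and absent from YAML) amounts to a three-condition filter. A minimal sketch follows, assuming a hypothetical `PoolAccount` shape in which the managed marker is exposed as a boolean; how Sub2API actually stores that marker is not shown here.

```typescript
// Hypothetical account shape; the real Sub2API account model is not shown in this diff.
interface PoolAccount {
  name: string;
  unideskManaged: boolean; // assumed projection of the `unidesk_managed=true` marker
}

// Returns the names that a sync would delete: managed, prefixed, and no longer declared.
export function selectAccountsToDelete(
  existing: PoolAccount[],
  yamlAccountNames: string[],
): string[] {
  const declared = new Set(yamlAccountNames);
  return existing
    .filter((a) => a.name.startsWith("unidesk-codex-")) // only the managed naming scheme
    .filter((a) => a.unideskManaged)                     // never touch unmarked accounts
    .filter((a) => !declared.has(a.name))                // keep everything still in YAML
    .map((a) => a.name);
}
```

Requiring all three conditions means a manually created account survives a sync even if its name happens to share the prefix.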
@@ -55,7 +56,7 @@ Kubernetes readiness is not the same as pool availability:
- The Sub2API app, PostgreSQL, and Redis manifests include container-level health probes. These only prove the pods and local dependencies are healthy enough for Kubernetes scheduling.
- The FRP client deployment is currently a simple connector deployment and does not itself prove that master-local traffic reaches Sub2API.
- No scheduled `CronJob`, `ServiceMonitor`, or `PodMonitor` currently proves the full unified Codex API path.
- `platform-infra sub2api validate` and `platform-infra sub2api codex-pool validate` are on-demand checks. They are acceptable for deployment closeout, but they are not continuous monitoring.
- `platform-infra sub2api validate` and `platform-infra sub2api codex-pool validate` are on-demand checks. Operational usage is documented in `$unidesk-sub2api`; they are acceptable for deployment closeout, but they are not continuous monitoring.

When an automatic availability probe is added, it should be YAML-controlled and cover these layers without printing secrets: