Files
pikasTech-unidesk/docs/reference/platform-infra.md
T

82 lines
12 KiB
Markdown

# G14 Platform Infra
`platform-infra` is the G14 k3s namespace for UniDesk-operated shared platform services. It is separate from HWLAB runtime lanes, AgentRun lanes, D601 user services, and legacy `devops-infra` control-plane helpers. New shared infra should land here first; old `devops-infra` resources migrate gradually only when a concrete owner and validation path exist.
## Source Of Truth
- UniDesk-owned platform configuration must be YAML-first. `config/platform-infra/*.yaml` is the durable source for images, versions, endpoints, FRP exposure, account profile selection, and local consumer configuration.
- Runtime Secrets and local `~/.codex/config.toml*` / `auth.json*` files are inputs or generated local state, not committed truth. CLI output may show Secret paths, byte counts, fingerprints, and short previews only; it must not print complete API keys.
- Code that reads platform YAML must validate object shape, field types, required fields, Kubernetes names, image strings, and ports before mutating G14 k3s or local consumer files.
- Do not hide image versions, namespace names, endpoint URLs, FRP ports, or profile lists in Python/TOML/JSON helper constants when they are UniDesk-owned choices. External tools may still require their own TOML/JSON/env file formats at the edge.
## Sub2API Deployment Boundary
- Sub2API is a G14 platform service operated by UniDesk in namespace `platform-infra`. It is not a HWLAB lane workload, AgentRun workload, D601 service, or master server daemon.
- The canonical deployment entrypoint is `bun scripts/cli.ts platform-infra sub2api plan|apply|status|validate|codex-pool`; daily operation procedures live in `$unidesk-sub2api` at `.agents/skills/unidesk-sub2api/SKILL.md`. This reference keeps only development boundaries and project-specific source-of-truth rules.
- Raw `kubectl` through `trans G14:k3s` is only for bounded diagnosis and evidence, not a formal mutate path.
- The image version is controlled by `config/platform-infra/sub2api.yaml`. Image update procedures are daily operations owned by `$unidesk-sub2api`; the development boundary is that image choices remain YAML-controlled.
- Sub2API should stay ClusterIP-only by default. Do not add Ingress, NodePort, LoadBalancer, or broad FRP exposure unless a YAML-controlled public exposure decision exists.
- Sub2API currently has no resource limits by design. Do not add CPU or memory limits unless a later explicit decision changes that policy and stores the new policy in YAML.
- Master server is a consumer/control host, not the runtime location. Do not deploy Sub2API, PostgreSQL, Redis, or heavy validation loops on master server.
## Codex Pool Routing
`config/platform-infra/sub2api-codex-pool.yaml` controls the Codex-facing OpenAI-compatible pool:
- `pool.groupName` names the Sub2API group that represents the pool.
- `pool.apiKeySecretName` and `pool.apiKeySecretKey` name the k3s Secret that stores the single consumer API key.
- `pool.minOwnerConcurrency` declares the minimum concurrency for the Sub2API user that owns the unified consumer API key. Keep it high enough to cover the declared account capacity set, so the shared key does not fail WS sessions at the user-concurrency layer. Do not compensate for owner-concurrency 1013 errors by pinning capacity to one provider.
- `pool.defaultTempUnschedulable` declares Sub2API account-level temporary unschedulable rules. Keep 429/overload/capacity failures in this YAML policy so the scheduler can cool down a failing account and choose another candidate instead of hard-pinning one provider.
- `profiles.entries` selects local Codex profile files from `~/.codex/` and maps them to Sub2API account names.
- `profiles.entries[].capacity` optionally overrides `pool.defaultAccountCapacity` for one account. Capacity is a YAML-controlled routing input; concrete current values belong only in `config/platform-infra/sub2api-codex-pool.yaml` and runtime validation output, not in long-term reference prose. Code constants, Secrets, ad-hoc runtime patches, or stale tests must not override YAML source of truth.
- Do not change account membership, priority, capacity, WebSocket mode, or other routing policy from inference alone. Unless the user explicitly asks for a configuration change, first preserve the current YAML, collect provenance and runtime evidence, and write the finding to the relevant issue or runbook before proposing a change.
- `profiles.entries[].tempUnschedulable` may override the pool default for one account. The CLI renders it into Sub2API credentials as `temp_unschedulable_enabled` and `temp_unschedulable_rules`; rules match HTTP status plus response-body keywords and place only that account into a temporary unschedulable cooldown.
- `profiles.entries[].openaiResponsesWebSocketsV2Mode` is the account-level Responses WebSocket v2 switch for OpenAI-compatible upstreams that require WebSocket transport. Allowed values are `off`, `ctx_pool`, and `passthrough`; omit the field unless that upstream needs it.
- `profiles.entries[].upstreamUserAgent` is an optional account-level upstream request User-Agent override. Use it only for upstreams that require a Codex CLI compatible User-Agent; keep the value YAML-controlled and newline-free.
- `publicExposure` controls the optional FRP bridge from master server to the G14 ClusterIP service.
- `localCodex` controls how the master server's current `~/.codex` consumer files are backed up and rewritten. Codex consumers using Sub2API must keep `supportsWebSockets` and `responsesWebSocketsV2` enabled so compacted long sessions can continue through the Responses WebSocket v2 response chain instead of falling back to HTTP-only summary context. `localCodex.responsesSmokeModel` is the YAML-declared model used by `codex-pool validate` for the lightweight `POST /v1/responses` smoke.
Enable account-level WebSocket v2 only for upstream profiles that have passed a direct Codex WSv2 probe. Treat this as a YAML-declared capability set, not a hard scheduling pin to one profile; `codex-pool validate` must show at least one current `webSocketsV2.schedulableEnabled` account, and runtime smoke remains the availability proof. The same validation reports each managed account's runtime WebSocket v2 mode and whether it matches YAML, so stale `ctx_pool` settings cannot silently keep routing Codex WS sessions to an upstream that closes with `no available account`, WS handshake 5xx, or before `response.completed`.
When Codex startup repeatedly reports WebSocket reconnects or HTTPS fallback, preserve membership, priority, capacity, and other routing policy until runtime logs identify the failing account and transport. If bounded Sub2API logs show repeated `openai.websocket_proxy_failed` or upstream WS handshake 5xx for one account, remove only that account from the WSv2 capability set in YAML, run `codex-pool sync --confirm`, and prove the result with Codex smoke plus `codex-pool validate`.
Do not encode current availability assumptions in long-term reference prose. If an account needs a higher concurrency than `pool.defaultAccountCapacity`, make that a deliberate YAML override and verify it with `codex-pool validate`; the reference document should describe the rule, not repeat the current numeric value.
Do not enable Sub2API `pool_mode` for UniDesk-managed Codex accounts. `pool_mode` retries the same selected account path, while UniDesk's desired failover behavior is to mark the failing account temporarily unschedulable and let Sub2API choose another account from the group. `codex-pool validate` reports each managed account's temporary-unschedulable runtime alignment and should be used after `codex-pool sync --confirm`. Generic 502 bodies such as `Bad Gateway` and Codex-facing `Upstream request failed` must stay in the YAML cooldown policy so an intermittently bad account is cooled down instead of repeatedly adding latency at the next compact or Responses request. The Codex pool default error cooldown is severity-tiered: temporary signals can start at ten minutes, gateway/service/overload failures should cool down longer, and credential, permission, quota, or account-state failures should use the longest cooldown. Exact current values belong in YAML and runtime validation output.
The request path is:
1. A client sends an OpenAI-compatible request to the configured consumer base URL, normally master-local `http://127.0.0.1:<frp-port>/v1/...`, with the unified API key.
2. master `frps` forwards the TCP connection to `platform-infra/sub2api-frpc` when `publicExposure.enabled` is true.
3. `sub2api-frpc` forwards to `sub2api.platform-infra.svc.cluster.local:8080`.
4. Sub2API validates the unified key and resolves its `group_id`.
5. Accounts listed in `profiles.entries` are bound to the same group via `group_ids`, so Sub2API dispatches through that group using its own account selection semantics.
Adding, removing, exposing, validating, and configuring local Codex consumers are daily operations covered by `$unidesk-sub2api`. The development rule is that ordinary pool membership changes stay YAML-only and do not add code or CI/CD. Code changes are only appropriate when the YAML schema needs a new reusable capability such as account-level WebSocket mode or per-account upstream User-Agent.
After `codex-pool configure-local --confirm`, the default upstream profile must not recursively import the just-created Sub2API consumer endpoint as an upstream account. Keep the default source profile pointed at `config.toml.<backupSuffix>` and `auth.json.<backupSuffix>`; fallback to the current default files is only for first bootstrap before backups exist.
## Public FRP Boundary
When `publicExposure.enabled` is true, the same FRP TCP bridge exposes both OpenAI-compatible API paths and the built-in Sub2API management frontend. The management UI is reachable at the configured `publicExposure.publicBaseUrl` and its `/login` route; do not allocate a second public port unless a separate YAML-controlled exposure decision exists.
The public management UI is an operations endpoint. Keep Sub2API itself in `platform-infra`, keep the Kubernetes Service as ClusterIP, and treat FRP as the only public bridge unless a later decision explicitly changes the exposure model.
## Availability And Probes
Kubernetes readiness is not the same as pool availability:
- The Sub2API app, PostgreSQL, and Redis manifests include container-level health probes. These only prove the pods and local dependencies are healthy enough for Kubernetes scheduling.
- The FRP client deployment is currently a simple connector deployment and does not itself prove that master-local traffic reaches Sub2API.
- No scheduled `CronJob`, `ServiceMonitor`, or `PodMonitor` currently proves the full unified Codex API path.
- `platform-infra sub2api validate` and `platform-infra sub2api codex-pool validate` are on-demand checks. Operational usage is documented in `$unidesk-sub2api`; they are acceptable for deployment closeout, but they are not continuous monitoring. `codex-pool validate` must test both `GET /v1/models` and a small `POST /v1/responses` request, and the Responses smoke should report request id, selected/final account evidence, upstream failover count, and whether the validation succeeded only after failover.
When an automatic availability probe is added, it should be YAML-controlled and cover these layers without printing secrets:
1. G14 in-cluster `GET /v1/models` through `sub2api.platform-infra.svc.cluster.local:8080` with the unified key.
2. master-local `GET /v1/models` through the configured FRP endpoint when public exposure is enabled.
3. A tiny `POST /v1/responses` call through the same consumer URL for true OpenAI-compatible request validation.
4. Optional per-upstream account probes if Sub2API exposes a safe account selection or admin-health mechanism; otherwise document that group-level success does not prove every upstream account is healthy.
Until continuous probing exists, closeout comments must state that validation was on-demand and include the exact CLI/API entrypoints used.