YAML-First Heterogeneous Distributed Ops
This document defines the UniDesk architecture for YAML-first heterogeneous distributed operations. It is the long-term reference for turning node, lane, service, Secret, exposure, database, rollout and probe decisions into declared configuration plus reusable CLI execution. Concrete values belong in YAML under config/; this document defines ownership and architecture only.
Scope
YAML-first ops applies to UniDesk-owned distributed runtime management across heterogeneous targets: host services, Windows-hosted GUI bridge and collector pairs, k3s namespaces, public exposure bridges, external databases, app runtime Secrets, CI/CD control-plane bootstrap, workflow services and managed service probes.
It is not a new global orchestrator. Existing domain ownership stays intact:
- Platform shared services keep their truth in the existing platform infra YAML family.
- Platform database state keeps its truth in platform database YAML.
- Runtime lane services keep their truth in their existing node/lane YAML.
- Agent execution infrastructure keeps its truth in its own infrastructure YAML.
Add a new top-level YAML registry only after multiple existing domains share the same lifecycle, owner and command model, and after the common blocks have already proven reusable. The default path is to extend the owning domain YAML and shared ops helpers, not to create another parallel control plane.
Source Of Truth
UniDesk-owned distributed ops choices must enter through YAML:
- target route and execution plane
- namespace, workload, service, Secret and ConfigMap identifiers
- image references, versions and pull policy
- public URL, DNS expectation, FRP/Caddy edge settings and probe endpoints
- database host, role/database declarations, Secret exports and connection mode
- Secret source references, key mappings, transforms and rollout triggers
- readiness, validation and smoke probe shape
- retention, cadence, timeout and policy values when they are UniDesk-owned choices
Code may validate that YAML is present, typed, syntactically valid and renderable. Code must not become the hidden source for node names, service names, namespaces, ports, image tags, Secret names, URLs, account lists, capacities, cooldowns or retry windows. These values must be read from YAML or from explicit external tool/runtime APIs.
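The validate-but-never-default rule can be sketched as follows. This is a minimal illustration, assuming the YAML has already been parsed into a plain object by whatever YAML library the project uses; the helper name requireString and the field paths are hypothetical.

```typescript
// Hypothetical helper: read a required string from parsed YAML.
// It reports the missing path instead of falling back to a code default.
type YamlObject = { [key: string]: unknown };

function requireString(doc: YamlObject, path: string): string {
  let current: unknown = doc;
  for (const key of path.split(".")) {
    if (typeof current !== "object" || current === null || Array.isArray(current)) {
      throw new Error(`missing required YAML field: ${path}`);
    }
    current = (current as YamlObject)[key];
  }
  if (typeof current !== "string" || current.length === 0) {
    throw new Error(`YAML field ${path} must be a non-empty string`);
  }
  return current;
}

// Illustrative parsed document; the field names and values are assumptions.
const parsed: YamlObject = {
  target: { namespace: "example-ns", route: "example-route" },
};

const namespace = requireString(parsed, "target.namespace");
```

A missing field fails loudly with its path, so the fix lands in YAML rather than in a silent fallback inside the code.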
External formats such as JSON, TOML, env files, Kubernetes YAML, Caddyfile, systemd units or app-specific config files may still be generated or consumed at the edge when the external tool requires them. They are inputs or rendered artifacts, not UniDesk desired-state truth.
Service Deployment Declarations
UniDesk-managed service deployment declarations must not live in service repository JSON such as deploy.json. A service repository may keep application source, build inputs, migrations, API/spec documentation and app-native runtime config required by the process. Node/lane selection, runtime namespace, image artifact selection, GitOps branch/path, public exposure, external database wiring, Secret mapping, service account, probes, rollout and cleanup settings belong in the owning UniDesk YAML and are rendered by UniDesk CLI.
Generated GitOps YAML, image catalogs, env files, Kubernetes manifests or external tool config may be committed as rendered artifacts when the runtime requires them. They must carry enough provenance to point back to the owning YAML/source commit and must not become a second editable desired-state truth.
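One way to carry provenance is a generated header on each rendered artifact. The sketch below assumes a line-comment format such as "#"; the header field names, the path and the commit id are placeholders, not an existing UniDesk convention.

```typescript
// Hypothetical provenance record for a rendered artifact.
interface Provenance {
  sourceYaml: string;   // path of the owning YAML under config/
  sourceCommit: string; // commit the artifact was rendered from
  renderedBy: string;   // CLI command that produced the artifact
}

// Prefix each provenance line with the comment marker of the target format.
function withProvenance(body: string, p: Provenance, commentPrefix = "#"): string {
  const header = [
    `${commentPrefix} GENERATED FILE: do not edit; change the owning YAML instead.`,
    `${commentPrefix} source-yaml: ${p.sourceYaml}`,
    `${commentPrefix} source-commit: ${p.sourceCommit}`,
    `${commentPrefix} rendered-by: ${p.renderedBy}`,
  ];
  return header.join("\n") + "\n" + body;
}

const rendered = withProvenance("KEY=value\n", {
  sourceYaml: "config/example/service.yaml", // placeholder path
  sourceCommit: "0000000",                   // placeholder commit id
  renderedBy: "example render command",      // placeholder command name
});
```

Formats without comment syntax, such as JSON, would need a sidecar file or a dedicated metadata field instead.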
Architecture Layers
YAML-first ops uses five layers.
- Domain YAML
The owning config/**/*.yaml file declares the desired runtime state and all tunable values. A domain YAML may contain reusable blocks such as publicExposure, externalDatabase, runtimeSecrets, rollout, probes, staging, retention or controlPlane, but the exact block is owned by the domain until it is promoted into a shared helper.
- Domain Parser
Each domain has a parser that resolves a selected target and validates only shape, field type, required fields and renderability. It may validate generic syntax such as Kubernetes resource names, route token format, URL shape, image reference shape, relative source references and key names. It must not hard-code current policy values or silently fill business defaults that should live in YAML.
- Common Ops Library
Shared behavior belongs in reusable modules under scripts/src/, not in service-specific command files. The existing reusable seeds are the platform infra public-service helpers and the platform infra ops library. New common helpers should be extracted when the same operation appears in more than one domain, especially for:
- route execution and bounded capture
- YAML parsing primitives
- redacted output, fingerprints and compact evidence
- Secret source loading and source path redaction
- Kubernetes Secret apply from local source material
- rollout restart/status from YAML-declared workload refs
- public exposure rendering through FRP/Caddy
- manifest staging, dry-run and server-side apply wrappers
- probe execution and response summarization
- async job submission and short polling for long operations
- Thin Domain CLI
The domain CLI resolves the target from YAML, calls shared helpers and prints structured JSON. It should not contain large inline shell bodies, duplicated secret-sync scripts, hard-coded service names or app-specific operational workflows. A domain CLI may keep a stable command namespace for compatibility and discoverability, but the implementation should delegate to common helpers.
- Runtime Executor
Runtime mutation goes through UniDesk CLI and trans route execution. Direct kubectl, raw SSH, hand-written Caddy edits, direct GitHub API calls or ad hoc shell scripts may be diagnostic or emergency recovery tools only. Repeated operational writes must be promoted into a controlled CLI command that reads YAML and reports redacted structured output.
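The split between parser, shared helper and thin CLI can be sketched as below. All names are hypothetical, and the injected Executor stands in for route execution through trans, whose real interface is not described here. The kubectl argv is only an assumption about what the executor would run on the target.

```typescript
// Domain parser layer: validate shape and generic syntax only, no policy values.
const KINDS = ["deployment", "statefulset", "daemonset"] as const;
type Kind = (typeof KINDS)[number];
interface WorkloadRef { namespace: string; kind: Kind; name: string }

// RFC 1123 label (namespaces) and subdomain (workload names) syntax.
const DNS_LABEL = /^[a-z0-9]([-a-z0-9]*[a-z0-9])?$/;
const DNS_SUBDOMAIN = /^[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*$/;

function parseWorkloadRef(raw: unknown): WorkloadRef {
  if (typeof raw !== "object" || raw === null) throw new Error("workload ref must be a mapping");
  const { namespace, kind, name } = raw as Record<string, unknown>;
  if (typeof namespace !== "string" || namespace.length > 63 || !DNS_LABEL.test(namespace)) {
    throw new Error("namespace must be a valid DNS label");
  }
  const knownKind = KINDS.find((k) => k === kind);
  if (knownKind === undefined) throw new Error(`kind must be one of ${KINDS.join(", ")}`);
  if (typeof name !== "string" || name.length > 253 || !DNS_SUBDOMAIN.test(name)) {
    throw new Error("name must be a valid DNS subdomain");
  }
  return { namespace, kind: knownKind, name };
}

// Common ops layer: one reusable helper, driven only by the declared ref.
type Executor = (argv: string[]) => { exitCode: number };

function rolloutRestart(ref: WorkloadRef, run: Executor) {
  const workload = `${ref.kind}/${ref.name}`;
  const { exitCode } = run(["kubectl", "rollout", "restart", workload, "-n", ref.namespace]);
  return { ok: exitCode === 0, workload, namespace: ref.namespace };
}

// Thin domain CLI layer: resolve, delegate, print structured JSON.
function restartCommand(rawYamlBlock: unknown, run: Executor): string {
  return JSON.stringify(rolloutRestart(parseWorkloadRef(rawYamlBlock), run));
}

// Usage with a recording executor instead of a real route.
const recorded: string[][] = [];
const fakeExecutor: Executor = (argv) => {
  recorded.push(argv);
  return { exitCode: 0 };
};
const output = restartCommand(
  { namespace: "example-ns", kind: "deployment", name: "example-api" }, // illustrative values
  fakeExecutor,
);
```

The command function holds no service names or shell bodies; swapping the fake executor for a real route executor would not change the parser or the helper.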
Host Bootstrap And Zero-Dependency Proxy
Fresh VPS bootstrap has one allowed host-level exception before the node can become a clean k3s runtime: a zero-dependency host proxy client distributed and configured by YAML-first ops through trans. The client must not require Docker, k3s, containerd, Node, Python, Bun, Go, Rust, or package-manager network access to install. Its purpose is to provide outbound connectivity for those dependencies, including k3s installer downloads, container image pulls, apk, apt, npm, Go modules, Git and git-mirror sync/flush.
The proxy source, upstream server identity, local listen address, local port, systemd unit, environment injection targets, health probe and secret source references must be declared in YAML. The target node should receive only rendered artifacts and redacted summaries over trans; it must not fetch the proxy client from the public internet before the proxy exists. Values such as proxy server host, credentials, bind port and benchmark profile belong in YAML or Secret source files, not in prose or code defaults.
The default bootstrap order for a new node is:
- establish provider-gateway only long enough to make trans reachable;
- distribute the zero-dependency proxy client through trans;
- start and validate the host proxy client from YAML-rendered systemd or equivalent host control;
- configure host egress for k3s/containerd/package managers and Git through that proxy;
- install k3s and deploy workloads through YAML-first control paths;
- keep later workload proxy settings as YAML-declared consumers of the host route unless a domain explicitly owns a different proxy boundary.
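The ordering above matters because each step depends on the previous one. A minimal sketch of an ordered runner that stops at the first failure follows; the step names paraphrase the list and the step bodies are stubs, not real bootstrap logic.

```typescript
interface StepResult { step: string; ok: boolean; detail: string }
interface Step { name: string; run: () => { ok: boolean; detail: string } }

// Run steps strictly in order and stop at the first failure,
// because every later step depends on the earlier ones.
function runInOrder(steps: Step[]): StepResult[] {
  const results: StepResult[] = [];
  for (const step of steps) {
    const { ok, detail } = step.run();
    results.push({ step: step.name, ok, detail });
    if (!ok) break;
  }
  return results;
}

// Stub steps; real ones would call shared helpers driven by YAML.
const passing = (detail: string) => () => ({ ok: true, detail });
const bootstrap: Step[] = [
  { name: "provider-gateway-reachable", run: passing("trans reachable") },
  { name: "distribute-proxy-client", run: passing("artifact delivered over trans") },
  // Simulated failure: the proxy health probe does not pass.
  { name: "start-and-validate-proxy", run: () => ({ ok: false, detail: "health probe failed" }) },
  { name: "configure-host-egress", run: passing("egress configured") },
  { name: "install-k3s", run: passing("k3s installed") },
  { name: "declare-workload-proxy-consumers", run: passing("consumers declared") },
];

const results = runInOrder(bootstrap);
```

In this run the k3s install is never attempted, which is the intended behavior when the proxy that provides its egress path has not been validated.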
For fresh VPS bring-up, do not use provider-gateway WebSocket egress as the bulk path for large installers, images or dependency downloads. Provider-gateway and trans are the bootstrap control channel; the host proxy client is the data egress path once installed.
Host proxy use has a privacy boundary. A plain http_proxy=http://... proxy can observe destination hosts, request metadata and any proxy credential embedded in the URL or process environment; TLS protects encrypted request bodies from the proxy but does not hide the destination metadata from the proxy operator. Do not place private API keys in proxy URLs, logs or CLI output. NO_PROXY/no_proxy must preserve local, cluster, metadata and explicitly declared direct domains, including hyueapi.com and .hyueapi.com for Codex direct API access.
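Two small helpers illustrate this boundary: one redacts credentials from a proxy URL before it is logged, the other merges required direct domains into an existing NO_PROXY value without dropping entries. Both are sketches with hypothetical names. The proxy host is a placeholder, and the required list is inlined only for illustration, since it would normally come from YAML.

```typescript
// Redact credentials embedded in a proxy URL before it reaches logs or CLI output.
function redactProxyUrl(raw: string): string {
  const url = new URL(raw);
  if (url.username !== "" || url.password !== "") {
    url.username = "REDACTED";
    url.password = "";
  }
  return url.toString();
}

// Add required direct entries to NO_PROXY while preserving what is already there.
function mergeNoProxy(existing: string, required: string[]): string {
  const entries = existing
    .split(",")
    .map((entry) => entry.trim())
    .filter((entry) => entry.length > 0);
  for (const entry of required) {
    if (entries.indexOf(entry) === -1) entries.push(entry);
  }
  return entries.join(",");
}

// Placeholder proxy URL with embedded credentials.
const redacted = redactProxyUrl("http://user:s3cret@proxy.example.internal:3128");

// Required entries would be read from YAML in practice.
const noProxy = mergeNoProxy("localhost, 127.0.0.1", ["hyueapi.com", ".hyueapi.com", "localhost"]);
```

Redaction only protects output; the credential is still visible to any process that can read the environment, which is why private API keys should never be placed in proxy URLs at all.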
Cross-Repository Operational Truth
When UniDesk prepares or supervises runtime data for another repository, the owning YAML still belongs in UniDesk if the data is part of node/lane operations rather than that repository's application source. Examples include HWLAB test/admin account inventories, operator-only API key source references, runtime DB sync inputs, public exposure targets and cross-node workbench preparation. The service repository may contain application code and app-native migrations, but it must not become a second desired-state truth for UniDesk-operated accounts, credentials, nodes, lanes, namespaces or sourceRef bindings.
Cross-repository YAML must name the target repository/runtime explicitly and must route all writes through a UniDesk CLI entrypoint that prints only sourceRef, targetKey, presence, byte counts, fingerprints, object ids and mutation summaries. The CLI may read owner-only local source files or declared platform DB exports; it must not backfill those files from runtime Secrets, pod env, logs or database rows.
Common Block Rules
Reusable blocks must describe operations in data, not in service-specific code branches.
Target Blocks
A target block should declare the route, execution plane, namespace and any workload refs required by the operation. Code must not infer these from a node id, lane id or service id by concatenating strings unless that concatenation rule itself is explicitly declared and stable for the domain.
Secret Blocks
A runtime Secret block should declare source reference, source key, target Secret, target key, optional transform and rollout trigger. Secret values must stay in git-ignored owner-only source files or external Secret stores. CLI output may show sourceRef, target object names, key names, presence, byte counts, fingerprints, mutation and next commands; it must not print secret values, full tokens, decoded base64, passwords or complete connection strings.
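A redacted summary of this kind might look like the sketch below. It assumes a Node-compatible runtime for node:crypto and Buffer. The field names and the choice of a truncated SHA-256 digest as the fingerprint are assumptions; for low-entropy values a keyed hash may be preferable, since a plain digest can be brute-forced.

```typescript
import { createHash } from "node:crypto";

// Hypothetical summary shape: everything needed to confirm a sync, nothing reusable.
interface SecretSummary {
  sourceRef: string;
  targetSecret: string;
  targetKey: string;
  present: boolean;
  bytes: number;
  fingerprint: string | null;
}

function summarizeSecret(
  sourceRef: string,
  targetSecret: string,
  targetKey: string,
  value: string | undefined,
): SecretSummary {
  if (value === undefined || value.length === 0) {
    return { sourceRef, targetSecret, targetKey, present: false, bytes: 0, fingerprint: null };
  }
  // Truncated digest: enough to compare two syncs, not enough to reuse.
  const digest = createHash("sha256").update(value, "utf8").digest("hex");
  return {
    sourceRef,
    targetSecret,
    targetKey,
    present: true,
    bytes: Buffer.byteLength(value, "utf8"),
    fingerprint: `sha256:${digest.slice(0, 12)}`,
  };
}

// Placeholder names and a dummy value for illustration only.
const dummyValue = "not-a-real-token";
const summary = summarizeSecret("example/source.env#API_TOKEN", "example-secret", "API_TOKEN", dummyValue);
const printed = JSON.stringify(summary);
```

The printed JSON contains names, size and fingerprint but never the value itself, so two runs can be compared without exposing the secret.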
App-specific transforms are allowed only as isolated named transform functions. The transform name is data in YAML; the implementation belongs in a shared transform registry or a small domain adapter, not in a one-off reset command.
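A named transform registry can be as small as a map from name to function. The transform names below are hypothetical examples, not an existing registry.

```typescript
type Transform = (value: string) => string;

// Registry of isolated, named transforms. YAML carries only the name.
const transforms = new Map<string, Transform>([
  ["trim", (value) => value.trim()],
  ["strip-trailing-newline", (value) => value.replace(/\r?\n$/, "")],
]);

// Apply the transform named in YAML; an unknown name is an error, not a no-op.
function applyTransform(name: string | undefined, value: string): string {
  if (name === undefined) return value; // transform is optional in the Secret block
  const transform = transforms.get(name);
  if (transform === undefined) throw new Error(`unknown transform: ${name}`);
  return transform(value);
}

const trimmed = applyTransform("trim", "  abc \n");
const untouched = applyTransform(undefined, "abc\n");
```

Rejecting unknown names keeps a typo in YAML from silently syncing an untransformed value.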
Exposure Blocks
Public exposure must be declared as an edge topology, including DNS expectation, public base URL, bridge settings, edge host route and target service. The existing FRP/Caddy path is a reusable public-service primitive. New public exposure code should extend that primitive instead of adding per-service Caddy or FRP scripts.
When several YAML owners render into the same Caddyfile, each owner must write only its own managed site block and merge it with the existing file. A shared writer must preserve other # BEGIN unidesk managed <owner> blocks, remove only legacy unmanaged blocks for the domains owned by the current operation, validate the merged Caddyfile before install, and then reload Caddy. A domain CLI must not replace a shared Caddyfile with a file rendered from its own YAML alone.
Shared Caddyfile operations belong in a common helper module under scripts/src/. Service-specific CLIs should pass YAML-resolved domains, upstreams and marker names to that helper, then report the managed-block counts and validation result. Full-file Caddy installs are allowed only for a host or file that the command exclusively owns and whose exclusivity is documented in the owning reference.
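The merge step of such a shared writer can be sketched as a pure function. The BEGIN marker follows the form quoted above; the matching END marker is an assumption, as are the owner names and site blocks. Legacy-block removal, validation and reload are deliberately left out of this sketch.

```typescript
// Replace only the caller's managed block; keep every other line untouched.
function mergeManagedBlock(existing: string, owner: string, siteBlock: string): string {
  const begin = `# BEGIN unidesk managed ${owner}`;
  const end = `# END unidesk managed ${owner}`; // assumed closing marker
  const managed = [begin, siteBlock.trimEnd(), end];

  const lines = existing.length === 0 ? [] : existing.replace(/\n+$/, "").split("\n");
  const out: string[] = [];
  let inOwnBlock = false;
  let replaced = false;

  for (const line of lines) {
    const trimmed = line.trim();
    if (!inOwnBlock && trimmed === begin) {
      inOwnBlock = true;
      if (!replaced) {
        out.push(...managed); // new block takes the place of the old one
        replaced = true;
      }
      continue;
    }
    if (inOwnBlock) {
      if (trimmed === end) inOwnBlock = false;
      continue; // drop the old content of our own block only
    }
    out.push(line);
  }

  if (inOwnBlock) throw new Error(`unterminated managed block for ${owner}`);
  if (!replaced) {
    if (out.length > 0) out.push("");
    out.push(...managed); // first write for this owner: append
  }
  return out.join("\n") + "\n";
}

// Illustrative shared file with two owners.
const current = [
  "# BEGIN unidesk managed alpha",
  "alpha.example.test {",
  "  reverse_proxy 127.0.0.1:8001",
  "}",
  "# END unidesk managed alpha",
  "# BEGIN unidesk managed beta",
  "beta.example.test {",
  "  reverse_proxy 127.0.0.1:8002",
  "}",
  "# END unidesk managed beta",
  "",
].join("\n");

const merged = mergeManagedBlock(current, "beta", "beta.example.test {\n  reverse_proxy 127.0.0.1:9002\n}\n");
```

Because the function only rewrites lines between its own markers, a run for one owner cannot remove another owner's site block, which is the property the shared writer has to guarantee.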
Database Blocks
External database consumers must reference the YAML-owned platform database source and exported Secret shape. A consumer should not deploy a new database, copy connection strings by hand, or derive credentials from live runtime objects unless the owning database YAML declares that export.
Probe Blocks
Probes are validation data, not hidden policy. YAML should declare what endpoint or runtime object proves the operation for that service. CLI code may execute the probe, bound output and classify failure, but should not hard-code current URLs, credentials, namespaces or service paths.
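The division of labor between probe data and probe code might be sketched as follows. The probe fields are hypothetical; the point is that the URL and expectations arrive as data, while the code only bounds the capture and classifies the outcome.

```typescript
// Probe declaration as it might be resolved from YAML (field names are assumptions).
interface HttpProbe {
  url: string;
  expectStatus: number;
  bodyIncludes?: string;
  maxCaptureChars: number;
}

interface ProbeResponse { status: number; body: string }

type ProbeOutcome =
  | { ok: true; capture: string }
  | { ok: false; reason: "status" | "body"; capture: string };

// Classify a response against the declared expectations and bound the output.
function evaluateProbe(probe: HttpProbe, response: ProbeResponse): ProbeOutcome {
  const capture = response.body.slice(0, probe.maxCaptureChars);
  if (response.status !== probe.expectStatus) {
    return { ok: false, reason: "status", capture };
  }
  if (probe.bodyIncludes !== undefined && response.body.indexOf(probe.bodyIncludes) === -1) {
    return { ok: false, reason: "body", capture };
  }
  return { ok: true, capture };
}

// Illustrative declaration and a canned response; no network call is made here.
const probe: HttpProbe = {
  url: "https://service.example.test/healthz",
  expectStatus: 200,
  bodyIncludes: "ok",
  maxCaptureChars: 16,
};
const outcome = evaluateProbe(probe, { status: 200, body: "ok: all dependencies reachable" });
```

Keeping evaluation separate from the HTTP call lets the same classifier serve probes executed locally or through a route.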
Reuse, Composition And Variable Rendering
YAML-first does not mean one ever-growing file per node. Reusable configuration should be expressed as small owning YAML documents that can be inherited, composed or referenced by configRefs. A node-specific file should normally select a baseline, provide only the concrete overrides for that node, and reference shared blocks for proxy, public exposure, Secret distribution, source truth, runtime dependencies, probes and rollout policy.
Config composition must keep one durable source of truth for each fact. Rendered plans may merge defaults, baselines and overrides for execution, but the merged object must not be written back as a second editable truth. CLI plan/status --full should show the reference chain, effective owner, variable inputs, redacted source fingerprints and missing fields instead of dumping all expanded YAML by default.
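Composition with an effective-owner trail can be sketched as a merge that records which reference supplied each value. This is a simplified illustration over flat keys with hypothetical file references; a real merge would also need rules for nested blocks and lists.

```typescript
type Scalar = string | number | boolean;
interface Layer { ref: string; values: Record<string, Scalar> }

interface EffectiveConfig {
  values: Record<string, Scalar>;  // merged view, for execution only
  owners: Record<string, string>;  // which ref supplied each key
  chain: string[];                 // reference chain in application order
}

// Later layers override earlier ones; ownership follows the last writer.
function compose(layers: Layer[]): EffectiveConfig {
  const values: Record<string, Scalar> = {};
  const owners: Record<string, string> = {};
  for (const layer of layers) {
    for (const [key, value] of Object.entries(layer.values)) {
      values[key] = value;
      owners[key] = layer.ref;
    }
  }
  return { values, owners, chain: layers.map((layer) => layer.ref) };
}

// Illustrative baseline and node override; paths and keys are placeholders.
const effective = compose([
  { ref: "config/example/baseline.yaml", values: { proxyPort: 1080, probeTimeoutSeconds: 5 } },
  { ref: "config/example/node-override.yaml", values: { probeTimeoutSeconds: 10 } },
]);
```

The merged object is returned for a plan or status view only; nothing here writes it back to disk, so the baseline and the override remain the only editable sources.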
Variable rendering is allowed for node/lane/service-specific artifact generation. Node and lane identifiers must be data variables, not part of YAML filenames, parser function names, renderer names, TypeScript helper names or long-term schema names. For example, use a neutral template plus NODE/LANE variables instead of adding files or functions named after a concrete node. Concrete node names may appear only as YAML values, CLI parameters, rendered runtime object names where the external system needs them, or legacy adapters that are explicitly classified.
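A neutral template with NODE and LANE variables might be rendered as in this sketch. The ${NAME} placeholder syntax, the function name and the sample values are assumptions chosen for illustration.

```typescript
// Render a neutral template; every placeholder must be supplied as data.
function renderTemplate(template: string, vars: Record<string, string>): string {
  return template.replace(/\$\{([A-Z_][A-Z0-9_]*)\}/g, (_match: string, name: string) => {
    const value = vars[name];
    if (value === undefined) throw new Error(`undeclared template variable: ${name}`);
    return value;
  });
}

// The template itself names no concrete node or lane.
const template = "Description=host proxy client for ${NODE} (${LANE})";

// Concrete identifiers appear only as values, for example from YAML or CLI parameters.
const unitLine = renderTemplate(template, { NODE: "node-a", LANE: "dev" });
```

Failing on an undeclared variable avoids rendering an artifact with a literal placeholder left in it.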
When a configuration area starts to repeat whole blocks across nodes, treat that as a design smell. Promote the common block into a baseline or shared reference before adding more node copies, and keep node overrides short enough to review. If the owning YAML still grows without a clear owner boundary, split by responsibility: bootstrap host dependencies, proxy, k3s install, platform services, app lane, public exposure, Secret sourceRefs, probes and sentinel schedules should not be forced into one file merely because they target the same node.
Refactoring Rule
When adding YAML-first ops to an existing domain, follow this order:
- Inventory the existing YAML, CLI commands and helper modules.
- Choose the owning domain YAML; do not start with a new global registry.
- Add or refine a reusable block in that YAML with all concrete values declared there.
- Extend the domain parser with shape/type/renderability validation only.
- Extract common execution into shared helper modules before adding domain-specific code.
- Keep the domain CLI as a thin adapter over the common helper.
- Validate with the narrowest checks the change requires: a syntax check plus a command-shape or original-entry runtime check.
Large domain command files must be split by responsibility before receiving more operational logic. Typical split boundaries are target resolution, manifest rendering, Secret sync, public exposure, database bridge, rollout, probes, cleanup and status summarization.
Finite Governance Slices
YAML-first cleanup work must not become an open-ended sequence of rounds. Once an issue is used to close a broad normalization area, freeze a bounded phase list before implementation and keep all child issues inside that list. New findings after the freeze must be classified into the existing scope, kept as domain-specific differences, or parked out of scope; they must not create another phase merely because a search found more candidates.
A shared helper extraction stops when the repeated mechanism has a stable helper, the remaining differences are true domain behavior, or the remaining candidates are outside the frozen scope. Do not continue extracting only to make every domain file look identical. The final audit should list completed changes, kept domain differences, parked risks and validation evidence, then close the issue instead of opening a follow-up round by default.
Closed governance issues are historical records, not standing queues. A later YAML-first pass must start from a fresh code/config inventory and a new bounded issue instead of reopening or appending phases to an already closed issue. For broad cleanup, the phase list should stay small and fixed before implementation; do not exceed five phases unless the user explicitly requests a different bound.
Each frozen phase must deliver a concrete code, configuration or user-facing CLI behavior change. Validation commands, evidence gathering, issue comments, merge/preflight work, progress reporting and docs-only distillation are closeout activities; they may be recorded, but they are not phases. When all frozen phases are complete, close the parent issue and classify remaining discoveries as in-scope leftovers already handled, true domain differences, or out-of-scope parked risks.
Anti-Patterns
Avoid these patterns:
- creating a per-service reset script when a YAML-declared Secret sync plus rollout block is enough
- adding a second control plane for a service that already has an owning YAML and CLI namespace
- hard-coding node ids, service ids, namespaces, ports, URLs, Secret names or workload names in code
- deriving live state by string conventions when YAML can declare the object directly
- keeping repeated kubectl apply, Caddy edits, FRP edits or rollout restarts as runbook shell snippets
- replacing a shared Caddyfile from one YAML owner without preserving other managed blocks
- printing secret values, complete env files, full DATABASE_URL values or reusable API keys
- writing long-term docs that duplicate current YAML values as prose
- using contract tests or hidden guards to freeze policy values that should remain YAML-controlled
- preserving legacy command branches after the latest YAML-first path supersedes them
- extending a frozen cleanup issue by appending new rounds instead of classifying discoveries as in-scope, domain-specific or parked
- treating validation, evidence collection, issue lifecycle work, or docs-only closeout as implementation phases
Documentation Boundary
Long-term references should point to this architecture for common YAML-first ops rules, then document only domain-specific ownership and entrypoints. They should not repeat common Secret, exposure, target, redaction or no-hardcoding rules unless a domain adds a stricter constraint.
When a recurring operation becomes stable, update the owning reference document and the relevant skill with the domain entrypoint and decision boundary. Do not document one-off manual recovery as the standard path; manual repair remains recovery evidence until the YAML and CLI path exists.