pikasTech-unidesk/docs/reference/arch.md

- Requirements
  - Build a distributed work platform covering research, project development, and project management
  - Deploy the main entry point on a server with a public IP, providing a unified interface
  - Multiple computing resource machines join the platform to execute computing tasks
  - The platform must support task scheduling, state monitoring, versioned code distribution, and large file storage
  - Design goals are high availability, high concurrency, centralized state management, and stateless compute nodes
- Key Assumptions
  - The main server has a public IP and can be accessed from the internet
  - Computing resource machines have no public IP, possibly behind NAT or firewalls
  - Computing resource machines have stable outbound network connectivity (over an intranet or the internet)
  - Computing resource machines can run Docker and support WSL (some nodes are Windows workstations)
  - Users interact with the platform only through the main server entry point, never directly with compute nodes
  - The main server's availability is higher than that of computing resource machines; compute nodes may go offline frequently due to hardware, network, or human factors
  - Components that would be single points of failure are deployed on the main server first, so that its high-availability environment protects the critical path
- UniDesk Distributed Work Platform Architecture
  - Overview
    - The main server hosts all stateless business logic as the unified entry point
    - Computing resource nodes actively connect via lightweight Provider Gateway containers
    - All state is stored centrally in PostgreSQL, never scattered across nodes
    - Code and environments are distributed via GitHub versions; the large file storage solution is to be determined
    - The main server also connects itself to the platform as a compute node, using the exact same method as ordinary compute nodes
    - This design allows verification of the full distributed dispatching flow on a single main server
  - Main Server Components
    - UniDesk Stateless Services
      - All user-facing services run as Docker containers mounted onto the UniDesk core; the core can still run without them
      - Includes frontend gateway, task scheduler, project management, provider ingress, and other stateless modules
      - Instances can scale horizontally; failure recovery requires no state synchronization
      - Only the production frontend gateway, dev frontend proxy and provider ingress are unrestricted public entries; core REST APIs and PostgreSQL remain on the Docker internal network or explicitly restricted host mappings. The dev frontend proxy rule is owned by `docs/reference/dev-environment.md`.
    - Frontend Time Zone Policy
      - All UniDesk frontend timestamps, dates, clocks, update times, heartbeat times, Trace times, Gantt axis labels, export date stamps, and `datetime-local` values must render as Beijing time.
      - Beijing time means IANA timezone `Asia/Shanghai` / UTC+8, regardless of the browser timezone, host system timezone, container timezone, or server-side `project.timezone` value.
      - Frontend code must use the shared formatter and input conversion helpers in `src/components/frontend/src/time.ts`; raw ISO/UTC timestamps may appear only inside explicitly opened raw JSON views.
    - PostgreSQL Database
      - Deployed as a Docker container with a 10 GB named volume
      - Stores all task metadata, node heartbeats, resource labels, and business state
      - Backed up periodically via `pg_dump`, keeping the last 7 daily snapshots
      - The named volume ensures data survives container recreation or upgrades
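
The Frontend Time Zone Policy above can be illustrated with a minimal sketch. The real helpers live in `src/components/frontend/src/time.ts`; the Python below is not that code. It only shows the rule that every rendered value is converted to `Asia/Shanghai` and every `datetime-local` input is interpreted as Beijing time. The function names `format_beijing` and `datetime_local_to_utc` are hypothetical, and treating offset-less backend timestamps as UTC is an assumption.

```python
from datetime import datetime, timedelta, timezone

try:
    from zoneinfo import ZoneInfo
    BEIJING = ZoneInfo("Asia/Shanghai")
except Exception:
    # tz database unavailable: fall back to the fixed UTC+8 offset.
    BEIJING = timezone(timedelta(hours=8), "Asia/Shanghai")


def format_beijing(iso_timestamp: str) -> str:
    """Render an ISO 8601 timestamp as 'YYYY-MM-DD HH:MM:SS' in Beijing time."""
    # Normalize a trailing 'Z', which older datetime.fromisoformat versions reject.
    parsed = datetime.fromisoformat(iso_timestamp.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        # Assumption: timestamps without an offset are UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(BEIJING).strftime("%Y-%m-%d %H:%M:%S")


def datetime_local_to_utc(value: str) -> str:
    """Interpret a `datetime-local` value as Beijing time and return UTC ISO 8601."""
    naive = datetime.fromisoformat(value)
    return naive.replace(tzinfo=BEIJING).astimezone(timezone.utc).isoformat()
```

Neither function consults the browser, host, or container timezone, which is the point of the policy: the zone is a constant, not an input.
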
  - Code and Environment Distribution
    - Code repositories and execution environment definitions may reside in multiple GitHub repositories
    - When dispatching a task, five metadata items must be specified: `code_repo_url`, `code_commit_id`, `env_repo_url`, `env_commit_id`, and `dockerfile_path`
    - A single env repo can contain multiple Dockerfiles defining different execution environments, distinguished by `dockerfile_path`
    - Compute nodes maintain a local Git cache and fetch only the specified version, incrementally, on each dispatch
    - Docker layer caching accelerates environment builds: after the first build, later builds reuse unchanged layers and rebuild only from the first changed instruction onward
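
The five dispatch metadata items listed above could be carried as a small validated record. The following is a sketch only: the class name `TaskSource`, the JSON encoding, the requirement that commit IDs are full 40-character SHA-1 hex strings, and the relative-path check on `dockerfile_path` are all assumptions, not rules stated in this document.

```python
import json
import re
from dataclasses import asdict, dataclass

# Assumption: commits are pinned by full 40-character SHA-1 hex IDs.
_FULL_SHA = re.compile(r"[0-9a-f]{40}")


@dataclass(frozen=True)
class TaskSource:
    """The five metadata items that pin the code and environment of a task."""
    code_repo_url: str
    code_commit_id: str
    env_repo_url: str
    env_commit_id: str
    dockerfile_path: str

    def validate(self) -> None:
        for name, value in asdict(self).items():
            if not value:
                raise ValueError(f"missing required field: {name}")
        for name in ("code_commit_id", "env_commit_id"):
            if not _FULL_SHA.fullmatch(getattr(self, name)):
                raise ValueError(f"{name} must be a full commit SHA")
        # Assumption: the Dockerfile path is relative to the env repo root.
        if self.dockerfile_path.startswith("/") or ".." in self.dockerfile_path.split("/"):
            raise ValueError("dockerfile_path must stay inside the env repo")

    def to_message(self) -> str:
        """Serialize for delivery; validation runs first so bad input fails early."""
        self.validate()
        return json.dumps(asdict(self), sort_keys=True)
```

Pinning both repositories by commit rather than by branch name makes a task reproducible: re-running it later checks out the same code and builds the same environment.
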
  - Compute Node Connection Scheme
    - Provider Gateway Docker
      - Each computing resource machine runs a Provider Gateway container
      - Acts as the node-side gateway, bridging the main server and the local execution environment
      - The container houses the agent logic, implementing a WebSocket client and local scheduling
    - WebSocket Persistent Connection
      - Provider Gateway actively initiates a WebSocket connection to the main server
      - Commands, heartbeats, and task statuses are exchanged bidirectionally over this persistent connection
      - The main server never initiates connections to nodes, which suits nodes that have no public IP or sit behind NAT
    - Interaction with Local Execution Environment
      - The primary path for automated task dispatching and execution is via the local Docker socket
      - Access to the local environment via WSL SSH is reserved solely as an auxiliary path for emergency maintenance and troubleshooting, exposed only as bounded `host.ssh` probe/exec tasks
      - Automating task deployment or dispatching through the WSL SSH channel is forbidden
    - Native k3s Runtime
      - Any k3s server, control-plane node, agent, or worker running on a computing resource machine must be installed directly on that node's host OS or WSL distro, normally as the native `k3s` systemd service.
      - Running k3s itself inside Docker or any long-lived container is forbidden. Docker remains available for provider-gateway, target-side image builds, user workload containers, and temporary artifact extraction, but not as the k3s runtime boundary.
      - Kubernetes hostPath, local-path storage, node labels, kubelet, CNI, and `/workspace` semantics must resolve against the real computing node filesystem. For WSL nodes such as D601, a Code Queue task working in `/workspace` must see the WSL host `/home/ubuntu`, not a container-private `/home/ubuntu`.
      - If a legacy `rancher/k3s` container is present during migration, it is only an artifact source or rollback reference; it must not remain the active control plane and must be stopped before accepting the node as healthy.
      - A computing resource node that cannot run native k3s and reach the k3s control plane through a stable Kubernetes-supported network must not be listed as an expected k3s node. Current Code Queue HA is provided inside the D601 native k3s control plane through separate read, write, and scheduler Services; D518 remains a normal UniDesk provider/File Browser node until a native k3s-agent network path is designed and verified.
    - k3s Control Bridge Boundary
      - `k3sctl-adapter` is part of the UniDesk control plane, not a workload controlled by k3s. It must remain `deployment.mode=unidesk-direct` or an equivalent UniDesk-managed host service, and must not be converted to `k3sctl-managed`.
      - The adapter exists so UniDesk can inspect, deploy, and repair k3s-managed user services. Putting that bridge inside the k3s cluster would invert the dependency order: repairing or diagnosing k3s would first require the in-cluster adapter and service network to be healthy.
      - On native k3s nodes, the adapter must read the host kubeconfig and connect to the host-local Kubernetes API endpoint, normally `/etc/rancher/k3s/k3s.yaml` and `https://127.0.0.1:6443`. If it is packaged as Docker on a Docker Desktop/WSL node, it must create an explicit host-local bridge such as an SSH local tunnel from the adapter container to the WSL host k3s API; a future systemd service is also valid.
      - k3s-managed business services such as Code Queue and MDTODO still enter through the adapter's Kubernetes API service proxy. The adapter itself is deliberately outside that managed workload graph so CNI, Service, EndpointSlice, or kube-proxy failures do not remove UniDesk's control path.
    - Connection Management
      - On registration, a node presents an authentication token to prove its identity and declares its resources, such as GPU and CPU
      - The authentication token is pre-issued by the main server and configured at Provider Gateway startup
      - Heartbeats are sent every 15 seconds; if no heartbeat arrives for 90 seconds, the node is marked offline
      - On disconnect, the Provider Gateway reconnects automatically with exponential backoff, so that many nodes reconnecting at once do not create a thundering herd on the main server
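
The heartbeat timeout and reconnect backoff in Connection Management can be sketched as follows. The 15-second interval and 90-second offline threshold come from the text. The backoff base of 1 second, the 60-second cap, and the use of full jitter are assumptions chosen for illustration, and the function names are hypothetical.

```python
import random

HEARTBEAT_INTERVAL_S = 15.0  # from the text: one heartbeat every 15 seconds
OFFLINE_AFTER_S = 90.0       # from the text: offline after 90 seconds of silence


def is_offline(last_heartbeat_at: float, now: float) -> bool:
    """Server side: mark a node offline once 90 seconds pass without a heartbeat."""
    return now - last_heartbeat_at >= OFFLINE_AFTER_S


def reconnect_delay(attempt: int, base_s: float = 1.0, cap_s: float = 60.0,
                    rng=random.random) -> float:
    """Node side: full-jitter exponential backoff.

    The ceiling doubles with each failed attempt up to cap_s, and the actual
    delay is drawn uniformly below that ceiling so nodes do not retry in step.
    """
    exponent = min(max(attempt, 0), 32)  # bound the exponent to avoid overflow
    ceiling = min(cap_s, base_s * (2 ** exponent))
    return rng() * ceiling
```

With these numbers a node can miss five consecutive heartbeats before it is marked offline, which tolerates short network stalls. The jitter matters as much as the exponential growth: without it, all nodes dropped by the same server restart would retry at the same instants.
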
  - Data Flow and State Management
    - Task commands are delivered over WebSocket and never contain large file content
    - All state changes are reported to the main server in real time by Provider Gateway
    - The main server writes state updates to PostgreSQL, completing the unified closed loop
    - Pipeline workflow control follows the OA event-flow model: OA is the only control bus, factual node events remain policy-neutral, and runner/monitor/frontend/CLI actions are represented as OA events; detailed constraints live in `docs/reference/pipeline-oa-event-flow.md`
    - Queue-oriented user services must separate read, write, and scheduler responsibilities once they move behind Kubernetes Service routing. Read services may scale and roll independently from PostgreSQL-backed state, write services validate and persist commands, and scheduler services own only task claiming, active run control, and worker/runner lifecycle. A scheduler/runner bridge must not be hidden inside the read path, and a bridge service such as `k3sctl-adapter` must stay outside the k3s-managed workload graph.
  - Single Authoritative Path Discipline
    - Every durable capability must declare one authoritative data, control, event, or network path in reference docs, API contracts, health output, and E2E checks; frontend, CLI, worker, and service code must consume that path directly.
    - Hidden fallback is forbidden. A service must not silently answer from local JSON, transcripts, in-memory state, stale task caches, direct host URLs, old service IDs, alternate proxies, local direct internet, or legacy event streams when the documented path is unavailable.
    - Failure must be explicit and observable through HTTP status, response body, `/health`, diagnostics, frontend degraded state, or test output. If temporary compatibility is unavoidable, it must be documented as a bounded migration or degraded mode with a visible status marker, validation gate, and removal condition.
    - Migration bridges may normalize legacy facts into the authoritative bus or table, but must not become a second source of truth or keep read-time dual-path fallback after the authoritative path is ready.
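
The "hidden fallback is forbidden" rule can be illustrated with a small read handler. This is a sketch under stated assumptions: the names `read_task_state` and `AuthoritativePathUnavailable`, the response field names, and the choice of HTTP 503 are hypothetical and not taken from any UniDesk API contract.

```python
from typing import Callable, Dict, Tuple


class AuthoritativePathUnavailable(Exception):
    """Raised by the fetch function when the documented path cannot answer."""


def read_task_state(task_id: str, fetch: Callable[[str], Dict]) -> Tuple[int, Dict]:
    """Return (http_status, body) using only the authoritative fetch function."""
    try:
        state = fetch(task_id)
    except AuthoritativePathUnavailable as exc:
        # No silent answer from caches, local JSON, or legacy streams:
        # the failure is reported explicitly so callers can show a degraded state.
        return 503, {
            "task_id": task_id,
            "error": "authoritative_path_unavailable",
            "detail": str(exc),
        }
    return 200, {"task_id": task_id, "state": state}
```

The handler takes no cache argument at all, so there is no second source it could quietly answer from when the fetch fails.
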
  - Multi-Repo Deployment Sync
    - The main server repository, D601 deployment tree, provider-local worktrees, and other live copies are working or deployment instances; the Git remote is the long-term project source of truth.
    - Before any development, documentation, or deployment manifest change, an agent must inspect the current worktree with `git status` and pull the latest source from the only accepted integration branch with `git pull --ff-only origin master`.
    - If a pull, rebase, commit, or push is blocked by concurrent work, the conflict must be handled immediately in the current worktree by separating the current task's edits from unrelated parallel changes. Do not create a feature branch to postpone the conflict.
    - Any source, document, or persistent configuration change intended to survive the current task must be committed and pushed to the remote promptly after required self-tests or deployment validation, following `git-spec`.
    - All UniDesk agent changes must be developed on `master` and pushed to `origin master`. Agents must not create, switch to, or push feature/fix branches for UniDesk work.
    - Live deployment should run from a known commit or from a change set that is immediately committed and pushed; local-only hotfixes must not become the implicit dependency for later tasks.
    - Secrets, tokens, generated runtime state, and node-local env files stay outside Git, but their required contract, storage location, and recovery path must be documented so pushing source changes is not blocked by runtime-only data.
  - Critical Task Deployment Principles
    - Single-point components such as the database, core scheduler logic, and API gateway are deployed on the main server
    - The high-availability environment of the main server keeps the critical scheduling path available
    - Compute nodes are responsible only for task execution; a node going offline does not affect overall platform availability
  - Large File Storage Solution
    - The concrete implementation is to be determined; whatever is chosen must meet the following requirements
    - Support automated pull and upload by compute nodes without human intervention
    - Provide a programmable interface for the scheduler to generate temporary access credentials
    - Have sufficient bandwidth so that concurrent reads/writes never become the bottleneck for training tasks
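
Because the storage backend is still undecided, the credential requirement above can only be sketched as an interface. Everything in the block below is hypothetical: the `LargeFileStore` protocol, the `TemporaryCredential` fields, the `"pull"` and `"upload"` operation names, and the in-memory stand-in used to exercise the interface.

```python
import secrets
import time
from dataclasses import dataclass
from typing import Callable, Dict, Protocol


@dataclass(frozen=True)
class TemporaryCredential:
    object_key: str
    operation: str     # "pull" or "upload"
    token: str
    expires_at: float  # Unix timestamp


class LargeFileStore(Protocol):
    """What the scheduler would need from any candidate backend."""

    def issue_credential(self, object_key: str, operation: str,
                         ttl_s: float) -> TemporaryCredential: ...


class InMemoryStore:
    """Stand-in backend for tests only; it stores no file content."""

    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        self._clock = clock
        self._issued: Dict[str, TemporaryCredential] = {}

    def issue_credential(self, object_key: str, operation: str,
                         ttl_s: float) -> TemporaryCredential:
        if operation not in ("pull", "upload"):
            raise ValueError(f"unsupported operation: {operation}")
        if ttl_s <= 0:
            raise ValueError("ttl_s must be positive")
        cred = TemporaryCredential(object_key, operation,
                                   secrets.token_urlsafe(32),
                                   self._clock() + ttl_s)
        self._issued[cred.token] = cred
        return cred

    def is_valid(self, token: str, object_key: str, operation: str) -> bool:
        """A credential is valid only for its own key and operation, before expiry."""
        cred = self._issued.get(token)
        return (cred is not None and cred.object_key == object_key
                and cred.operation == operation
                and self._clock() < cred.expires_at)
```

Scoping each credential to one object, one operation, and a short lifetime lets compute nodes pull and upload without human intervention while holding no long-lived storage secret.
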
  - Deployment Notes
    - Use `docker-compose` on the main server to orchestrate all services uniformly
    - PostgreSQL uses a named volume to guarantee data persistence
    - The Provider Gateway image is built uniformly and distributed to all compute nodes in a versioned manner