docs: bootstrap agentrun references

2026-05-27 15:27:38 +08:00
commit 09c59a4dd6
5 changed files with 347 additions and 0 deletions
@@ -0,0 +1,9 @@
+.state/
+logs/
+node_modules/
+dist/
+build/
+coverage/
+*.log
+*.pid
+*.tmp
@@ -0,0 +1,39 @@
+# AgentRun Agent Index
+
+AgentRun is the shared agent execution infrastructure for UniDesk and HWLAB. This repository owns AgentRun architecture, API contracts, runner/backend behavior and runtime implementation. UniDesk only records how distributed development and deployment are coordinated from the UniDesk control center.
+
+## Critical Source Workspace Rule
+
+- P0: The canonical source workspace is `G14:/root/agentrun`, fixed on branch `master`, with `origin git@github.com:pikasTech/agentrun.git`.
+- P0: Before development, deployment, resumed work or context recovery, run `tran G14:/root/agentrun script -- 'pwd; git status --short --branch; git remote -v'` from UniDesk and confirm the path, branch, remote and clean state.
+- P0: The fixed workspace is only for precheck, fetch, worktree management and final sync. Normal work must use `/root/agentrun/.worktree/{pr_branch}` created from latest `origin/master`.
+- P0: Do not use UniDesk, HWLAB, D601 workspaces, temporary clones, pod-local copies or runner checkouts as AgentRun source truth.
+
+## Critical Runtime Target Rule
+
+- P0: Default deployment targets are G14 native k3s namespaces `agentrun-dev` and `agentrun-prod` (Kubernetes namespace names must be RFC 1123 labels, so underscores are not valid).
+- P0: Kubernetes operations must use UniDesk route syntax `G14:k3s`, for example `tran G14:k3s kubectl get pods -n agentrun-dev`.
+- P0: Do not create durable AgentRun exposure through ad hoc NodePort, host port, direct pod IP, one-off port-forward or provider-gateway business HTTP proxy. Public or cross-service access must be introduced through reviewed repository changes.
+
+## Critical MVP Rule
+
+- P0: AgentRun must be developed as a vertical MVP, not as a broad parallel service rewrite.
+- P0: First prove a minimal `runner + one backend`; then add `agentrun-mgr` durable facts and manual runner startup; only after that add the automatic scheduler.
+- P0: Existing UniDesk Code Queue and HWLAB Code Agent paths are not replaced by default. AgentRun starts as a new shared execution plane with canary integrations.
+
+## Critical RESTful MVP Rule
+
+- P0: The MVP uses short RESTful HTTP/JSON requests only. Do not add SSE, WebSocket, long-polling or long synchronous `turn` calls in the first phase.
+- P0: Long agent work must be represented by asynchronous resources: commands, runs, events, status and leases. Clients observe progress by paged polling.
+- P0: Runner inbound API should stay private and minimal. Clients should call `agentrun-mgr`; manually started runners must still claim and report through the manager.
+
+## Critical CLI Spec Rule
+
+- P0: AgentRun CLI and service work must follow the UniDesk `cli-spec` principles: JSON output by default, no empty-success output, no long blocking CLI calls, visible logs, explicit config validation and RESTful service APIs for stable boundaries.
+- P0: Once a CLI is added, keep the entry small and move implementation into `scripts/src/`; long operations must return quickly and expose status/log/event polling.
+
+## Reference Docs
+
+- `docs/reference/architecture.md`: AgentRun product boundary, service architecture, MVP phases, RESTful API model and data model.
+- `docs/reference/cli.md`: CLI and service API conventions derived from `cli-spec`.
+- `TEST.md`: manual validation scenarios for the documented CLI/service behavior as it is implemented.
@@ -0,0 +1,15 @@
+# AgentRun Manual Test Plan
+
+These tests are placeholders until the CLI and services exist. They define how future manual validation should be written.
+
+## T1 Minimal Run Lifecycle
+
+Read `AGENTS.md`, then use the AgentRun CLI to manually test creating a run, starting a runner for that run, polling events, and observing terminal status. Verify every CLI command returns JSON, includes ids and follow-up commands, and never waits for a full model turn in one request.
+
+## T2 Command And Event Polling
+
+Read `AGENTS.md`, then create a run command and poll command status plus run events. Verify command state is visible, event pagination uses `afterSeq`, and repeated polling does not duplicate events.
+
+## T3 Logs And Failure Visibility
+
+Read `AGENTS.md`, then start the local service or runner with an intentionally invalid backend profile. Verify the CLI returns a structured failure, the log path is printed, and the log file contains enough detail to diagnose the failure.
@@ -0,0 +1,223 @@
+# AgentRun Architecture Reference
+
+AgentRun is a shared agent execution plane for UniDesk and HWLAB. It is not a rename of UniDesk Code Queue and must not replace existing Code Queue behavior by default. Code Queue remains the current UniDesk task queue; AgentRun is a new infrastructure line focused on agent run lifecycle, runner isolation and pluggable execution backends.
+
+## Product Boundary
+
+AgentRun owns generic execution infrastructure:
+
+- creating and tracking runs;
+- accepting durable commands such as `turn`, `steer`, `interrupt` and `resume`;
+- assigning runs to short-lived runners;
+- normalizing backend events, stdout/stderr, assistant messages, tool calls and terminal status;
+- managing leases, heartbeats, run visibility and retries caused by infrastructure recovery;
+- registering backend capabilities and credential injection boundaries.
+
+UniDesk and HWLAB are tenants or clients. UniDesk owns platform entrypoints, provider inventory, CLI/frontend integration and existing Code Queue compatibility. HWLAB owns lab-specific task policy, device and hardware semantics, operation/audit/evidence models and HWLAB workspace rules. AgentRun must not decide whether a HWLAB live device mutation is authorized or whether a UniDesk production deployment may proceed; it executes runs whose tenant policy already grants that authority.
+
+Every run must carry explicit isolation fields:
+
+- `tenantId`, such as `unidesk` or `hwlab`;
+- `projectId`, such as `pikasTech/unidesk` or `pikasTech/HWLAB`;
+- `workspaceRef`, identifying the source/worktree/workspace target;
+- `providerId`, such as `G14` or `D601`;
+- `backendProfile`, such as `codex`, `opencode`, `claudecode`, `host-native` or `windows-native`;
+- `executionPolicy`, including sandbox, approval, timeout, network and secret scope;
+- `traceSink`, identifying where standardized events are mirrored.
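The isolation fields listed above could be checked at run creation along these lines. This is a minimal sketch, not the AgentRun schema: the top-level field names come from the list, while the keys inside `ExecutionPolicy` and the `validateRunSpec` helper are hypothetical names chosen for illustration.

```typescript
// Field names follow the isolation list above; the keys inside
// ExecutionPolicy are illustrative assumptions, not a defined schema.
interface ExecutionPolicy {
  sandbox: string;
  approval: string;
  timeoutSeconds: number;
  network: string;
  secretScope: string;
}

interface RunSpec {
  tenantId: string;
  projectId: string;
  workspaceRef: string;
  providerId: string;
  backendProfile: string;
  executionPolicy: ExecutionPolicy;
  traceSink: string;
}

const REQUIRED_STRING_FIELDS = [
  "tenantId",
  "projectId",
  "workspaceRef",
  "providerId",
  "backendProfile",
  "traceSink",
] as const;

// Returns one message per missing isolation field; an empty array means
// every field in the list is present.
function validateRunSpec(input: Partial<RunSpec>): string[] {
  const problems: string[] = [];
  for (const field of REQUIRED_STRING_FIELDS) {
    const value = input[field];
    if (typeof value !== "string" || value.trim() === "") {
      problems.push(`missing or empty ${field}`);
    }
  }
  if (input.executionPolicy === undefined) {
    problems.push("missing executionPolicy");
  }
  return problems;
}
```

Returning every problem at once, rather than failing on the first, keeps the rejection visible and debuggable in a single response.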
+
+## Service Shape
+
+AgentRun should be built as a small service family:
+
+```text
+agentrun-mgr
+  public RESTful API, durable facts, tenant/policy/idempotency checks
+
+agentrun-runner
+  short-lived per-run or per-attempt executor; claims a run, talks to one backend, writes events/status
+
+agentrun-backend-*
+  backend adapters for Codex, Claude Code, OpenCode, host-native or Windows-native execution
+
+agentrun-scheduler
+  later automatic dispatcher; scans pending runs, selects backend/profile/capacity, creates runner Jobs
+```
+
+The manager is the stable API and audit point. The runner is an executor and should not become a public API that business clients call directly. Operators may manually start a runner process or Kubernetes Job during MVP, but that runner still claims a run from `agentrun-mgr` and writes all facts back to the manager.
+
+Backend adapters hide concrete tool protocols. Codex stdio JSON-RPC, OpenCode JSON events, Claude Code, host-native processes and Windows-native execution may use different internal protocols, but public AgentRun APIs must stay stable and backend-neutral.
+
+## MVP Sequence
+
+AgentRun should be built as a vertical slice rather than a broad parallel build.
+
+### M0: Contract Skeleton
+
+Define only the minimal resource model and state machine:
+
+- `Run`
+- `Command`
+- `Event`
+- `Runner`
+- `Backend`
+
+Only `turn`, `interrupt`, `status` and paged `events` are required for the first slice. Do not start with `steer`, `resume`, judge/retry, UI, multi-backend routing or automatic scheduling.
+
+### M1: Minimal Runner Plus One Backend
+
+The first executable proof should run without a manager or scheduler. A runner reads a local run spec, calls one backend and emits standardized events.
+
+Acceptance:
+
+- one `turn` executes through the backend;
+- assistant/output/error events are normalized;
+- terminal status is written;
+- interrupt has at least a durable cancellation path, with real process interruption added when the backend supports it.
+
+The first backend should be the narrowest backend that proves the real agent primitive. If Codex friction is high, a controlled process backend can be used briefly for contract shape, but the MVP must move to one real agent backend before higher layers are considered validated.
+
+### M2: Manager Plus Runner Claim
+
+Add `agentrun-mgr` as the durable fact store and public API. A client creates a run; an operator or CLI manually starts a runner with the run id; the runner claims, polls commands, appends events, heartbeats and exits.
+
+Acceptance:
+
+- run creation and query are durable;
+- runner claim is idempotent and rejects double ownership;
+- events are append-only and paged by sequence;
+- command ack states are visible;
+- heartbeat expiration is observable.
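The claim and heartbeat criteria above can be illustrated with an in-memory lease table. This is a sketch under stated assumptions: `LeaseTable`, `ClaimResult` and the millisecond TTL are hypothetical, and a real manager would persist leases durably. An expired lease is only reported here, not taken over, since stale lease recovery is assigned to the later scheduler phase.

```typescript
interface Lease {
  runId: string;
  runnerId: string;
  expiresAtMs: number;
}

type ClaimResult =
  | { ok: true; lease: Lease }
  | { ok: false; reason: "already_claimed"; heldBy: string; expired: boolean };

class LeaseTable {
  private leases = new Map<string, Lease>();

  // Idempotent for the owning runner; any other runner is rejected.
  claim(runId: string, runnerId: string, nowMs: number, ttlMs: number): ClaimResult {
    const current = this.leases.get(runId);
    if (current === undefined) {
      const lease: Lease = { runId, runnerId, expiresAtMs: nowMs + ttlMs };
      this.leases.set(runId, lease);
      return { ok: true, lease };
    }
    if (current.runnerId === runnerId) {
      return { ok: true, lease: current };
    }
    // Double ownership is refused; expiry is surfaced so it stays observable.
    return {
      ok: false,
      reason: "already_claimed",
      heldBy: current.runnerId,
      expired: current.expiresAtMs <= nowMs,
    };
  }

  // Extends the lease only for the runner that owns it.
  heartbeat(runId: string, runnerId: string, nowMs: number, ttlMs: number): boolean {
    const current = this.leases.get(runId);
    if (current === undefined || current.runnerId !== runnerId) {
      return false;
    }
    current.expiresAtMs = nowMs + ttlMs;
    return true;
  }
}
```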
+
+### M3: Manual Dispatch CLI
+
+Add a CLI that starts a local runner process or Kubernetes Job for a selected run. This is manual dispatch, not manager-side synchronous orchestration. The manager still owns the facts; the runner still owns execution.
+
+Acceptance:
+
+- CLI returns quickly with JSON output;
+- job/process identity and log path are visible;
+- run status can be polled from the manager;
+- failed runner startup is reported as infrastructure failure, not as silent task success.
+
+### M4: Automatic Scheduler
+
+Only after M1-M3 are stable, add `agentrun-scheduler`. The scheduler scans pending runs, applies policy/capacity/backend selection, creates runner Jobs and performs stale lease recovery. It must not directly execute backends.
+
+Acceptance:
+
+- pending runs become running automatically;
+- scheduler restart does not interrupt already running runners;
+- stale leases are recovered with explicit audit events;
+- scheduler rollout does not imply active run failure.
+
+### M5: Tenant Canary Integrations
+
+UniDesk and HWLAB should integrate as canaries after core lifecycle proof:
+
+- UniDesk may add `agentrun` CLI/API routes while existing Code Queue remains intact.
+- HWLAB may route a narrow Code Agent canary through AgentRun.
+- Tenant policy, workspace, secret scope and trace sinks must be explicit in every run.
+
+## RESTful MVP Contract
+
+The MVP uses only short RESTful HTTP/JSON requests. Long-running agent work is represented by durable command resources, run status and paged event polling. Do not keep an HTTP request open for a full model turn.
+
+Public manager APIs:
+
+```http
+POST /api/v1/runs
+GET  /api/v1/runs/:runId
+GET  /api/v1/runs/:runId/events?afterSeq=0&limit=100
+POST /api/v1/runs/:runId/commands
+GET  /api/v1/runs/:runId/commands/:commandId
+GET  /api/v1/backends
+```
+
+Runner-to-manager APIs:
+
+```http
+POST  /api/v1/runners/register
+POST  /api/v1/runs/:runId/claim
+PATCH /api/v1/runs/:runId/lease
+GET   /api/v1/runs/:runId/commands?afterSeq=0&limit=20
+POST  /api/v1/runs/:runId/events
+PATCH /api/v1/runs/:runId/status
+POST  /api/v1/commands/:commandId/ack
+```
+
+Runner inbound APIs should stay local/private and minimal:
+
+```http
+GET /health
+GET /debug/status
+```
+
+Do not depend on clients calling transient runner Pod addresses. Such addresses do not stay valid across Jobs, namespaces, host-native backends and restarts.
+
+## Command State
+
+Commands are durable resources. `turn`, `steer`, `interrupt` and `resume` must not be implemented as synchronous process calls from a client to a runner.
+
+The initial command state machine is:
+
+```text
+accepted -> delivered -> confirmed
+accepted -> delivered -> failed
+accepted -> expired
+```
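The three paths above can be encoded as a transition table. This is a minimal sketch: the state names are taken from the diagram, while `NEXT_STATES`, `canTransition`, `transition` and `isTerminal` are hypothetical helper names.

```typescript
type CommandState = "accepted" | "delivered" | "confirmed" | "failed" | "expired";

// Each state maps to the states it may move to; terminal states map to none.
const NEXT_STATES: Record<CommandState, readonly CommandState[]> = {
  accepted: ["delivered", "expired"],
  delivered: ["confirmed", "failed"],
  confirmed: [],
  failed: [],
  expired: [],
};

function canTransition(from: CommandState, to: CommandState): boolean {
  return NEXT_STATES[from].indexOf(to) !== -1;
}

// Throws on an illegal move so that bad state writes fail visibly.
function transition(from: CommandState, to: CommandState): CommandState {
  if (!canTransition(from, to)) {
    throw new Error(`illegal command transition: ${from} -> ${to}`);
  }
  return to;
}

function isTerminal(state: CommandState): boolean {
  return NEXT_STATES[state].length === 0;
}
```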
+
+All command writes should support idempotency keys. Duplicate commands with the same idempotency key and payload hash should return the existing command. Duplicate keys with different payload hashes should fail visibly.
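The idempotency rule above might look like the following in-memory sketch. Everything here is illustrative: `CommandStore` and `CreateResult` are hypothetical names, scoping the key per run is an assumption, and the FNV-1a hash is only a dependency-free stand-in for a real digest such as SHA-256.

```typescript
interface StoredCommand {
  commandId: string;
  runId: string;
  type: string;
  idempotencyKey: string;
  payloadHash: string;
}

type CreateResult =
  | { outcome: "created"; command: StoredCommand }
  | { outcome: "replayed"; command: StoredCommand }
  | { outcome: "conflict"; existingCommandId: string };

// Serializes JSON-compatible values with sorted object keys so that
// logically equal payloads hash identically.
function stableStringify(value: unknown): string {
  if (value === null || typeof value !== "object") {
    return JSON.stringify(value);
  }
  if (Array.isArray(value)) {
    return "[" + value.map(stableStringify).join(",") + "]";
  }
  const record = value as Record<string, unknown>;
  const parts = Object.keys(record)
    .sort()
    .map((key) => JSON.stringify(key) + ":" + stableStringify(record[key]));
  return "{" + parts.join(",") + "}";
}

// FNV-1a (32-bit), a stand-in for a cryptographic digest.
function payloadHash(payload: unknown): string {
  const text = stableStringify(payload);
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16);
}

class CommandStore {
  private byKey = new Map<string, StoredCommand>();
  private nextId = 1;

  create(runId: string, type: string, idempotencyKey: string, payload: unknown): CreateResult {
    const scopedKey = runId + "/" + idempotencyKey; // assumption: keys are scoped per run
    const hash = payloadHash(payload);
    const existing = this.byKey.get(scopedKey);
    if (existing !== undefined) {
      // Same key and same payload replays; same key with a new payload conflicts.
      return existing.payloadHash === hash
        ? { outcome: "replayed", command: existing }
        : { outcome: "conflict", existingCommandId: existing.commandId };
    }
    const command: StoredCommand = {
      commandId: "cmd-" + this.nextId++,
      runId,
      type,
      idempotencyKey,
      payloadHash: hash,
    };
    this.byKey.set(scopedKey, command);
    return { outcome: "created", command };
  }
}
```

Hashing a canonical form rather than the raw request body means a client that reorders JSON keys on retry still gets the original command back instead of a conflict.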
+
+## Event Model
+
+Events are append-only and paged by sequence:
+
+- `seq` is monotonic per run.
+- `eventId` or `(runId, seq)` is the idempotency key for event writes, so a retried append does not create a duplicate event.
+- `GET /events?afterSeq=N&limit=M` is the first observation API.
+- Later SSE may stream the same events, but must not replace the REST polling contract.
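The append and paging rules above can be sketched for a single run as follows. The names `RunEventLog`, `EventPage` and `nextAfterSeq` are hypothetical, and starting `seq` at 1 is an assumption inferred from the `afterSeq=0` example in the manager API.

```typescript
interface RunEvent {
  seq: number;
  eventId: string;
  category: string;
  payload: unknown;
}

interface EventPage {
  events: RunEvent[];
  nextAfterSeq: number;
}

// Event log for one run: append-only, monotonic seq, idempotent by eventId.
class RunEventLog {
  private events: RunEvent[] = [];
  private byEventId = new Map<string, RunEvent>();

  // Appending the same eventId twice returns the stored event unchanged.
  append(eventId: string, category: string, payload: unknown): RunEvent {
    const existing = this.byEventId.get(eventId);
    if (existing !== undefined) {
      return existing;
    }
    const event: RunEvent = { seq: this.events.length + 1, eventId, category, payload };
    this.events.push(event);
    this.byEventId.set(eventId, event);
    return event;
  }

  // Returns at most `limit` events with seq greater than afterSeq.
  page(afterSeq: number, limit: number): EventPage {
    const events = this.events.filter((e) => e.seq > afterSeq).slice(0, limit);
    let nextAfterSeq = afterSeq;
    for (const e of events) {
      nextAfterSeq = e.seq;
    }
    return { events, nextAfterSeq };
  }
}
```

A client that feeds `nextAfterSeq` back as the next `afterSeq` never sees the same event twice, and an empty page leaves the cursor unchanged.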
+
+Minimum event categories:
+
+- `system`
+- `assistant_message`
+- `tool_call`
+- `command_output`
+- `diff`
+- `error`
+- `backend_status`
+- `terminal_status`
+
+## Data Model Direction
+
+The first implementation may use a compact schema, but it should not hide all facts in one JSON blob. The stable direction is:
+
+- `agentrun_runs`: run identity, tenant/project/workspace/backend policy, status and timestamps;
+- `agentrun_commands`: command type, idempotency key, payload hash, state and ack timestamps;
+- `agentrun_events`: append-only event records keyed by run and sequence;
+- `agentrun_runners`: registered runner identity, placement and heartbeat;
+- `agentrun_backends`: backend profile, capabilities, capacity and health;
+- `agentrun_leases`: current ownership and expiry.
+
+## Deployment Direction
+
+Default runtime targets are G14 native k3s namespaces `agentrun-dev` and `agentrun-prod` (Kubernetes namespace names must be RFC 1123 labels, so underscores are not valid). Control-plane services should be long-lived; runners should be short-lived Jobs or controlled host-native processes. Backend adapters may run as pods or host-native services, but they must register capabilities and health through AgentRun rather than being called through ad hoc addresses.
+
+Namespace isolation, RBAC, Secret scope, NetworkPolicy and ResourceQuota should be introduced before broad tenant use. A separate cluster is a later maturity option; the first implementation should prove the service inside G14 k3s namespaces unless a concrete isolation blocker requires a dedicated cluster.
+
+## Non-Goals For MVP
+
+Do not include these in the first MVP:
+
+- migrating UniDesk Code Queue;
+- replacing HWLAB Code Agent globally;
+- multi-backend routing;
+- UI beyond minimal diagnostics;
+- judge/retry automation;
+- automatic scaling;
+- cross-cluster placement;
+- SSE/WebSocket streaming;
+- complete permission system;
+- production rollout automation.
+
+The first goal is one stable vertical run lifecycle: create run, manually start runner, execute one backend turn, append events, observe final status and issue a visible interrupt/cancel command.
@@ -0,0 +1,61 @@
+# AgentRun CLI And Service API Reference
+
+AgentRun CLI and service APIs follow the UniDesk `cli-spec` principles. This document records the expected shape before the CLI exists so implementation does not drift toward long blocking commands or hidden state.
+
+## CLI Shape
+
+The future CLI entry should be small and route to implementation modules:
+
+```text
+scripts/agentrun-cli.ts
+scripts/src/*.ts
+```
+
+The CLI should default to JSON output. Empty stdout is a failure, not success. Every command must return enough structured data to continue debugging, including ids, status, log paths or follow-up commands.
+
+Long operations must be fire-and-forget or short asynchronous resource operations. CLI calls should return in under 60 seconds. A command that creates a run or starts a runner returns the created resource and polling commands; it does not wait for the model turn to complete.
+
+## Planned Commands
+
+Initial command families:
+
+```bash
+bun scripts/agentrun-cli.ts runs create --json-file <run.json>
+bun scripts/agentrun-cli.ts runs show <runId>
+bun scripts/agentrun-cli.ts runs events <runId> --after-seq <n> --limit <n>
+bun scripts/agentrun-cli.ts commands create <runId> --type turn --json-file <payload.json>
+bun scripts/agentrun-cli.ts commands show <commandId>
+bun scripts/agentrun-cli.ts runner start --run-id <runId> --backend <backendProfile>
+bun scripts/agentrun-cli.ts backends list
+bun scripts/agentrun-cli.ts server start|status|stop|logs
+```
+
+The exact command names may change when implemented, but the behavior must keep these rules:
+
+- `runs create` creates durable facts and returns immediately.
+- `runner start` starts a local process or Kubernetes Job and returns process/job identity, log path and poll commands.
+- `runs events` is paged and bounded by default.
+- `server logs` returns bounded logs and points to full log files.
+- `server status` must expose port, process id, health and log paths once a local service exists.
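The `runner start` rule above could produce a result shaped roughly like this. It is a sketch only: `RunnerStartResult`, `runnerStartResult` and `renderJson` are hypothetical names, and the JSON field names are assumptions. The follow-up commands reuse the planned command forms listed earlier, which may themselves change.

```typescript
interface RunnerStartResult {
  ok: boolean;
  action: "runner.start";
  runId: string;
  runnerKind: "process" | "job";
  runnerIdentity: string;
  logPath: string;
  followUp: string[];
}

// Builds the immediate result; nothing here waits for the model turn.
function runnerStartResult(
  runId: string,
  runnerKind: "process" | "job",
  runnerIdentity: string,
  logPath: string,
): RunnerStartResult {
  return {
    ok: true,
    action: "runner.start",
    runId,
    runnerKind,
    runnerIdentity,
    logPath,
    followUp: [
      `bun scripts/agentrun-cli.ts runs show ${runId}`,
      `bun scripts/agentrun-cli.ts runs events ${runId} --after-seq 0 --limit 100`,
    ],
  };
}

// Refuses to render a result that gives the caller nothing to poll next.
function renderJson(result: RunnerStartResult): string {
  if (result.followUp.length === 0) {
    throw new Error("CLI result must include follow-up commands");
  }
  return JSON.stringify(result, null, 2);
}
```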
+
+## Config And Logs
+
+AgentRun config must be explicit and validated. Do not silently fall back to deployment-critical defaults. When a local development server is introduced, it should use a fixed port and write real-time logs under:
+
+```text
+logs/{YYYYMMDD}/
+```
+
+Runtime state, pid files and transient job metadata belong under:
+
+```text
+.state/
+```
+
+Secrets must not be committed. Credential sources should be represented by references, not values, in CLI output and logs.
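The config and log rules above might be sketched as follows. The config keys `managerUrl`, `port` and `credentialRef` are hypothetical, as are the helpers `parseConfig` and `logDirFor`. Deriving the `{YYYYMMDD}` directory from the UTC date is also an assumption, since the time zone is not specified here.

```typescript
interface LocalServerConfig {
  managerUrl: string;
  port: number;
  credentialRef: string; // a reference to a credential source, never the secret itself
}

// Validates explicitly and reports every problem; no silent defaults.
function parseConfig(raw: Record<string, unknown>): LocalServerConfig {
  const problems: string[] = [];
  const managerUrl = raw["managerUrl"];
  const port = raw["port"];
  const credentialRef = raw["credentialRef"];
  if (typeof managerUrl !== "string" || managerUrl === "") {
    problems.push("managerUrl is required");
  }
  if (typeof port !== "number" || !Number.isInteger(port) || port < 1 || port > 65535) {
    problems.push("port must be an integer between 1 and 65535");
  }
  if (typeof credentialRef !== "string" || credentialRef === "") {
    problems.push("credentialRef is required");
  }
  if (problems.length > 0) {
    throw new Error("invalid config: " + problems.join("; "));
  }
  return {
    managerUrl: managerUrl as string,
    port: port as number,
    credentialRef: credentialRef as string,
  };
}

// Builds the dated log directory, e.g. logs/20260527/ (UTC date assumed).
function logDirFor(date: Date): string {
  const pad = (n: number): string => (n < 10 ? "0" + n : String(n));
  return "logs/" + date.getUTCFullYear() + pad(date.getUTCMonth() + 1) + pad(date.getUTCDate()) + "/";
}
```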
+
+## RESTful Service Rule
+
+Stable cross-service boundaries use HTTP JSON resource APIs. The MVP does not use SSE, WebSocket or long synchronous requests. Long-running work is represented by runs, commands, leases, status and paged events.
+
+The CLI should call the same REST API paths that production clients use. Debug commands may expose smaller slices of the flow, but they must share implementation with the real path rather than maintaining a parallel mock-only path.