Codex 7ef98fbe08 docs: clarify v02 final response closeout

2026-06-04 10:31:06 +00:00

G14 Provider Node

G14 is the current HWLAB DEV/PROD source and k3s/GitOps runtime truth, and it remains a UniDesk provider node for staging other infrastructure workloads. Its UniDesk provider id is G14; the local UniDesk worktree is /root/unidesk, and the native k3s kubeconfig is /etc/rancher/k3s/k3s.yaml.

G14's long-lived k3s control bridge is k3sctl-adapter-g14, a UniDesk direct service outside the k3s fault domain. It listens on the G14 host loopback port 127.0.0.1:4266 and is registered separately from the D601 k3sctl-adapter, so G14 infrastructure services can be built and tested without taking over user services that still run on D601.

For Code Queue and non-HWLAB CI/CD migration preparation, G14 uses native k3s labels unidesk.ai/node-id=G14 and unidesk.ai/provider-id=G14. The G14 Code Queue manifests src/components/microservices/k3sctl-adapter/k3s/code-queue.g14.k8s.yaml and src/components/microservices/k3sctl-adapter/k3s/code-queue.g14.k3s.json are candidate staging artifacts only until an explicit production cutover is approved. Non-HWLAB production Code Queue, CI/CD and user-service execution must remain on D601 while D601 is carrying those services.

HWLAB DEV/PROD Runtime

G14 hosts the current HWLAB DEV runtime in native k3s namespace hwlab-dev, and the same node/GitOps line is the HWLAB PROD target unless a newer HWLAB repo rule says otherwise. The canonical G14 HWLAB source workspace is /root/hwlab on branch G14, with origin set to git@github.com:pikasTech/HWLAB.git; before any G14 HWLAB work, this fixed workspace must be immediately fast-forwarded to the latest origin/G14. HWLAB source edits, CI/CD/GitOps script changes, render work, manual polling and runtime validation must be based on that updated workspace through UniDesk SSH passthrough. Do not use /root/HWLAB, /home/ubuntu/hwlab, /workspace/hwlab, D601 workspace or a master-server checkout as persistent HWLAB source truth. G14-local details are mirrored on the node in /root/docs/hwlab-g14-workspace.md.

The standard entry forms are:

trans G14:/root/hwlab script -- 'git fetch origin G14 && git pull --ff-only origin G14 && git status --short --branch && git remote -v'
trans G14 apply-patch < patch.diff
trans G14:k3s kubectl get pods -n hwlab-dev

G14:k3s is the only supported k3s route form. Do not use ssh G14 k3s ...; the first token must locate the distributed target, and the following tokens must be the operation.

If /root/hwlab has unrelated local changes when this sync starts, first determine whether they can be quickly merged with the latest origin/G14. Merge them immediately when they are mergeable; do not default to stash, discard or a behind worktree. Only when the changes cannot be automatically merged should they be isolated and the operation stopped for human decision. A behind fixed workspace is not a valid basis for precheck, new worktree creation, render, polling, deployment or runtime validation.

The G14 HWLAB runtime boundary is:

Current DEV public endpoints are http://74.48.78.17:17666/ and http://74.48.78.17:17667/health/live. The D601 ports 16666/16667 are legacy/migration evidence only.
Keep HWLAB Services as ClusterIP unless a repo-owned G14 GitOps rule explicitly exposes them. Public exposure should stay in the approved G14 edge/proxy path, not ad hoc NodePort or local port-forward.
Use a G14-local PostgreSQL instance such as hwlab-g14-postgres and a G14-local hwlab-cloud-api-dev-db Secret for cloud-api durable runtime tests. Do not copy D601 database credentials.
Use only G14-local Codex auth material and k8s Secrets authorized for HWLAB on G14; do not copy D601 or production auth material by hand.
Set HWLAB_CLOUD_API_PORT=6667 explicitly in the G14 cloud-api Deployment. Kubernetes otherwise injects a HWLAB_CLOUD_API_PORT=tcp://... Service environment variable that breaks the Node port parser.
HWLAB_PUBLIC_ENDPOINT and health/live evidence must describe the G14 endpoint, not the old D601 production endpoint.
Do not run HWLAB repository check, Playwright/browser smoke, image builds or other heavy validation on the master server. Run those through G14 /root/hwlab, G14 k3s/Tekton, or another explicitly approved external execution plane.
Manual device-agent experiments for real hardware must be standalone resources in hwlab-dev such as device-agent-71-freq and must not patch existing HWLAB Deployments, Services, ArgoCD Applications, FRP, CD desired-state or public frontend routing unless a separate HWLAB change authorizes it.
A D601 Windows hwlab-gateway may connect outbound to G14 DEV cloud-api as an external host bridge for Keil/serial/workspace access. That bridge does not make D601 the HWLAB runtime truth; it is only a hardware access provider behind the G14 device-agent/cloud-api path.
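
The HWLAB_CLOUD_API_PORT rule in the list above can be illustrated with a minimal sketch. Kubernetes injects Docker-link-style `<SERVICE>_PORT=tcp://<ip>:<port>` variables for Services, which is where the conflicting value comes from. The `parsePort` function below is hypothetical (the real cloud-api parser is not shown in this document), and the cluster IP is an invented example.

```typescript
// Hypothetical strict port parser; the real cloud-api parser is not shown in this document.
function parsePort(raw: string | undefined, fallback: number): number {
  if (raw === undefined || raw === "") return fallback;
  const port = Number(raw);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`invalid port value: ${raw}`);
  }
  return port;
}

// Explicit Deployment env value, as the rule above requires.
const explicitPort = parsePort("6667", 6667);

// Shape of the Service-link variable that Kubernetes injects; the IP is made up.
let injectedPortError = "";
try {
  parsePort("tcp://10.43.0.15:6667", 6667);
} catch (err) {
  injectedPortError = err instanceof Error ? err.message : String(err);
}
```

Setting the variable explicitly in the Deployment overrides the injected value, so the parser only ever sees a plain number.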

After the G14-local database is provisioned, run the HWLAB migration CLI only against the G14 DEV database with explicit non-production confirmations:

kubectl -n hwlab-dev exec deploy/hwlab-cloud-api -- \
  node /app/cmd/hwlab-cloud-api/migrate.mjs \
  --apply --confirm-dev --confirmed-non-production

Healthy G14 HWLAB runtime means the main Deployments and StatefulSets are Ready, cloud-api and edge-proxy return /health/live with status=ok, durable runtime checks pass, and the public G14 DEV endpoints report the expected revision. For a device-agent smoke, health also requires the standalone device-agent Service to answer in-cluster and the D601 Windows gateway session/resource/capability to be visible through G14 cloud-api.

HWLAB v0.2 Expansion Line

HWLAB v0.2 is an additive G14 expansion line. It must not rename, delete, repurpose, or mutate the existing G14 source branch, G14-gitops branch, hwlab-dev namespace, hwlab-prod namespace, DEV public ports 17666/17667, or PROD public ports 18666/18667. Existing DEV/PROD CI/CD remains the stability baseline while v0.2 is introduced beside it.

The fixed v0.2 source branch is v0.2, forked from the current G14 branch after the G14 long-term reference docs record this decision. The fixed G14 development workspace for that branch is:

trans G14:/root/hwlab-v02 script -- 'git status --short --branch && git remote -v'

/root/hwlab-v02 is the long-lived v0.2 development workspace, not a scratch clone or CI/CD source selector. It must track origin/v0.2 with origin git@github.com:pikasTech/HWLAB.git; local dirty state, stale HEAD, and untracked .worktree/ only affect human development. Existing G14 work continues to use /root/hwlab; do not reuse /root/hwlab or /root/hwlab/.worktree/* as the v0.2 fixed workspace.

v0.2 CI/CD source selection is isolated in the dedicated bare repo G14:/root/hwlab-v02-cicd.git. UniDesk control-plane commands must fetch origin/v0.2 into that repo and render from a commit-pinned detached worktree; they must not read the source commit from /root/hwlab-v02 checkout state.

The fixed v0.2 runtime namespace is hwlab-v02. The intended public FRP allocation is:

Cloud Web browser entry: http://74.48.78.17:19666/.
API/edge entry and live health: http://74.48.78.17:19667/health/live.
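
The additive boundary can be sketched as a small lane table with a collision check. The namespaces and ports come from this section; the `LaneEndpoints` shape, the lane keys and `findCollisions` are hypothetical names for illustration, and the web/API split of the PROD ports is assumed to mirror DEV.

```typescript
// Hypothetical lane table; namespaces and ports are the ones stated in this document.
interface LaneEndpoints {
  namespace: string;
  webPort: number;
  apiPort: number;
}

const laneTable: Record<string, LaneEndpoints> = {
  dev: { namespace: "hwlab-dev", webPort: 17666, apiPort: 17667 },
  prod: { namespace: "hwlab-prod", webPort: 18666, apiPort: 18667 }, // web/API split assumed
  v02: { namespace: "hwlab-v02", webPort: 19666, apiPort: 19667 },
};

// Returns one message per namespace or port that is claimed more than once.
function findCollisions(table: Record<string, LaneEndpoints>): string[] {
  const seen: Record<string, string> = {};
  const collisions: string[] = [];
  for (const lane of Object.keys(table)) {
    const entry = table[lane];
    if (entry === undefined) continue;
    const claims = [
      `namespace:${entry.namespace}`,
      `port:${entry.webPort}`,
      `port:${entry.apiPort}`,
    ];
    for (const claim of claims) {
      const owner = seen[claim];
      if (owner !== undefined) {
        collisions.push(`${claim} claimed by ${owner} and ${lane}`);
      } else {
        seen[claim] = lane;
      }
    }
  }
  return collisions;
}

const laneCollisions = findCollisions(laneTable);
```

An empty `laneCollisions` list is the expected state: v0.2 sits beside DEV and PROD without reusing any of their namespaces or public ports.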

Master-side FRP server maintenance for HWLAB public ports is documented in docs/reference/hwlab.md#hwlab-frp-维护; keep the detailed allowlist, restart boundary and verification sequence there instead of duplicating another runbook in this G14 node reference.

The v0.2 CI/CD integration must be additive: add a manual UniDesk trigger, dedicated CI/CD source repo, devops-infra git mirror/relay, GitOps desired-state lane, Argo CD Application, namespace resources, artifact catalog, and deploy.json environment only when they target v0.2/hwlab-v02 explicitly. Do not add a v0.2 branch poller or retarget the existing G14 poller, DEV/PROD Argo Applications, DEV/PROD runtime paths, or existing namespace resources to bootstrap v0.2.

The devops-infra git mirror/relay remains manual and CLI-controlled, not CronJob-driven. The standard v0.2 CI/CD trigger is bun scripts/cli.ts hwlab g14 control-plane trigger-current --lane v02 --confirm; this command must fetch /root/hwlab-v02-cicd.git, resolve the current origin/v0.2 source commit, check the mirror's localV02 ref before creating the PipelineRun, run one bounded manual git-mirror sync Job when the mirror is stale, and only continue after the mirror ref matches the current source commit. Use hwlab g14 git-mirror sync --confirm directly only for explicit mirror maintenance or diagnosis.
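
The mirror precheck described above can be sketched as a bounded, single-attempt sync. This is an illustration of the rule only, not the trigger-current implementation; `ensureMirrorMatches`, `readMirrorRef` and `runSyncJob` are hypothetical names, and the commit ids are placeholders.

```typescript
// Hypothetical sketch: run at most one sync, then refuse to continue if the mirror is still stale.
function ensureMirrorMatches(
  sourceCommit: string,
  readMirrorRef: () => string,
  runSyncJob: () => void,
): { synced: boolean } {
  if (readMirrorRef() === sourceCommit) {
    return { synced: false }; // mirror already current, no Job needed
  }
  runSyncJob(); // exactly one bounded sync attempt
  if (readMirrorRef() !== sourceCommit) {
    throw new Error("mirror still stale after one sync; do not create the PipelineRun");
  }
  return { synced: true };
}

// Demo with an in-memory stand-in for the mirror ref.
let fakeMirrorRef = "aaa111";
let syncJobRuns = 0;
const mirrorOutcome = ensureMirrorMatches(
  "bbb222",
  () => fakeMirrorRef,
  () => {
    syncJobRuns += 1;
    fakeMirrorRef = "bbb222";
  },
);
```

Throwing instead of retrying keeps the check bounded: a mirror that stays stale after one Job is a maintenance problem for hwlab g14 git-mirror sync, not something the trigger should loop on.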

After a v0.2 PipelineRun completes, treat runtime rollout and remote GitOps persistence as two separate checks. hwlab g14 control-plane status --lane v02 is the runtime check: it must show the expected source commit, PipelineRun completed, Argo Synced/Healthy, public 19666/19667 probes passing, and Cloud Web asset probes such as /app.js readable. hwlab g14 git-mirror status is the persistence check: cache.summary.pendingFlush must be false and cache.summary.githubInSync true before declaring GitOps fully flushed back to GitHub. If runtime is healthy but pendingFlush=true, run bun scripts/cli.ts hwlab g14 git-mirror flush --confirm and poll the returned job with bun scripts/cli.ts job status <jobId> --tail-bytes 12000; do not replace this with raw kubectl, native git push, or a long SSH wait.
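
A minimal sketch of how the two checks combine into a next step. The `pendingFlush` and `githubInSync` fields are the ones named above; the `MirrorCacheSummary` shape, the verdict strings and the handling of the "not pending but not in sync" case are assumptions for illustration.

```typescript
// Field names come from cache.summary in the text above; everything else is hypothetical.
interface MirrorCacheSummary {
  pendingFlush: boolean;
  githubInSync: boolean;
}

type PersistenceVerdict =
  | "runtime-not-healthy"
  | "run-flush"
  | "not-in-sync"
  | "fully-flushed";

function persistenceVerdict(
  runtimeHealthy: boolean,
  summary: MirrorCacheSummary,
): PersistenceVerdict {
  if (!runtimeHealthy) return "runtime-not-healthy"; // fix rollout first
  if (summary.pendingFlush) return "run-flush"; // hwlab g14 git-mirror flush --confirm
  if (!summary.githubInSync) return "not-in-sync"; // assumption: keep checking status
  return "fully-flushed";
}

const afterRollout = persistenceVerdict(true, { pendingFlush: true, githubInSync: false });
const afterFlush = persistenceVerdict(true, { pendingFlush: false, githubInSync: true });
```

Only the `fully-flushed` verdict supports a claim that GitOps state has been written back to GitHub; a healthy runtime alone does not.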

When closing an issue against a specific completed v0.2 PipelineRun, use targeted status instead of the latest-head status if origin/v0.2 has already advanced through a parallel task:

bun scripts/cli.ts hwlab g14 control-plane status --lane v02 --pipeline-run hwlab-v02-ci-poll-<short-sha>
bun scripts/cli.ts hwlab g14 control-plane status --lane v02 --source-commit <full-sha>

Targeted status must expose statusTarget.mode and targetValidation.

targetValidation.state=passed means the requested PipelineRun/source commit reached a succeeded PipelineRun, Argo Synced/Healthy, public web/API probes, flushed Git mirror, and matching runtime source commits for the services listed in that run's planArtifacts.rolloutServices; services listed in planArtifacts.reusedServices remain visible as runtime/provenance evidence but must not be forced to the target source commit.

targetValidation.state=superseded means the requested PipelineRun succeeded and was later replaced in runtime by a newer succeeded v0.2 PipelineRun; this is valid closure evidence for the requested run when the newer commit is on the same branch lineage.

In both states, commitAlignment.staleReasons may still mention later origin/v0.2 or CI/CD source head movement; that is parallel-head context, not a failure of the requested run. falseGreenGuard is a current-runtime guard and should report not-applicable/superseded for such historical targets instead of turning later runtime movement into a false failure. Default status without a target remains strict for the latest source head.
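
The closure rule for targeted status can be sketched as a small predicate. The state values `passed` and `superseded` come from the text above; `isClosureEvidence` and its parameters are hypothetical, and any other state is treated as "not evidence" by assumption.

```typescript
// Hypothetical predicate over targetValidation.state; not the CLI's own logic.
function isClosureEvidence(
  state: string,
  newerCommitOnSameLineage: boolean,
): boolean {
  if (state === "passed") return true;
  // Superseded runs count only when the replacing commit is on the same branch lineage.
  if (state === "superseded") return newerCommitOnSameLineage;
  return false; // assumption: unknown or failed states never close an issue
}

// commitAlignment.staleReasons is deliberately not an input: later head movement
// is parallel-head context, not a failure of the requested run.
const passedRun = isClosureEvidence("passed", false);
const supersededSameLineage = isClosureEvidence("superseded", true);
const supersededOtherLineage = isClosureEvidence("superseded", false);
```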

For HWLAB user-feedback, CLI, Cloud Web, AgentRun, device-pod, public API, or runtime workflow issues, source-level validation is not enough to close the issue. Unit tests, contract tests, git diff --check, targeted build checks, PR merge metadata, and source commit rollout evidence are supporting evidence only. The issue may be closed only after the affected user entry or original entry has been exercised against the target runtime. For CLI issues, that means running the relevant hwlab-cli or UniDesk-controlled CLI command from the G14 v0.2 workspace or approved execution plane against the intended lane/URL/namespace and proving the observed behavior, not just proving the helper code compiles. For Cloud Web or public API issues, use the public endpoint or a bounded API/asset smoke that reaches the deployed runtime. For AgentRun or device-pod issues, capture the trace/session/thread/run/job/device evidence that proves the specific continuation or hardware workflow reached the live backend.

For Cloud Web Workbench and Code Agent issues, the closeout validation must use the same dispatch entry as the browser flow, or a CLI command that calls that same Cloud Web/Cloud API dispatcher path. A hand-written dispatchHwlabAgentRun() canary, direct AgentRun manager command, or runner job created outside the Web dispatcher is only infrastructure evidence; it cannot prove that the browser path requested the correct toolCredentials, toolAliases, transient env, conversation/session/thread binding, or runtime lane. If no CLI can exercise the Web-equivalent path, improve the CLI first and keep the issue open until the Web-equivalent CLI or browser trace proves the deployed behavior.

For Cloud Web Workbench Code Agent response or trace-rendering bugs, the minimum Web-equivalent CLI proof is a fresh hwlab-cli client agent send --wait against the deployed public Web origin, followed by hwlab-cli client agent trace <traceId> --render web against the same origin. The submit proof must show the browser dispatcher family, normally POST /v1/agent/chat, result polling through /v1/agent/chat/result/<traceId>, continuation.webEquivalent=true, shortConnection=true, and explicit sessionId / conversationId / threadId binding when those values affect the bug. The result proof must show the final assistant text from assistantText or reply.content; placeholder status text, result summaries, terminal status messages, and AgentRun completion boilerplate are not acceptable substitutes for the assistant final response.

For persisted final-response display regressions, a fresh turn alone is not enough when the user report identifies an existing conversation, session, or trace. Re-read the original record on the deployed v0.2 runtime with locked lane env and the correct projectId; the default session list project may differ from the affected Workbench project. The minimum proof is client session list --project-id <projectId> --limit <N> --full, client session inspect <conversationId> --full, and client agent result <traceId> --full. Passing evidence must show that list and inspect surface the same latest agent traceId as lastTraceId, the latest agent text matches the terminal result reply.content or equivalent final assistant text, and known fallback text such as Code Agent 仍在处理，可以继续 steer 或等待 trace 完成。 (roughly "Code Agent is still processing; you can keep steering or wait for the trace to complete.") is absent from list, inspect, and result output. When the repair is lazy-on-read, run the read path again or capture the exposed repair source/updated marker so the evidence proves persisted conversation state was repaired, not merely synthesized for one response. client agent trace <traceId> --render web remains required for trace-rendering bugs; for persisted conversation-display bugs it is supporting evidence unless it returns rendered assistant rows from the same original trace.

The --render web proof must inspect the rendered body, not only the raw event count. Passing evidence should include body.render=web, the shared renderer identity when exposed, status=completed, rendered/returned row counts, noise/omitted counts when available, at least one rendered assistant row containing the final assistant text, and an explicit absence check for known non-user boilerplate such as AgentRun terminal status completed, AgentRun result is ready, and Code Agent 仍在处理 ("Code Agent is still processing"). If the trace API returns status=missing, sourceEventCount=0, or no rows for a historical issue trace, treat that trace as expired or unavailable; do not use it as closure evidence. Generate a fresh equivalent turn on the current v0.2 runtime and validate that trace instead.
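
A minimal sketch of the rendered-body check. The `render` and `status` values and the boilerplate strings come from the paragraph above; the `RenderedTrace` and `RenderedRow` shapes and the `role`/`text` field names are hypothetical, since the real trace payload is not shown here.

```typescript
// Hypothetical payload shape; only render=web, status=completed and the phrases are from the text.
interface RenderedRow {
  role: string;
  text: string;
}

interface RenderedTrace {
  render: string;
  status: string;
  rows: RenderedRow[];
}

const KNOWN_BOILERPLATE = [
  "AgentRun terminal status completed",
  "AgentRun result is ready",
  "Code Agent 仍在处理",
];

// Returns a list of problems; an empty list means the proof passes this sketch.
function renderProofProblems(trace: RenderedTrace, finalText: string): string[] {
  const problems: string[] = [];
  if (finalText.length === 0) problems.push("final assistant text to look for is empty");
  if (trace.render !== "web") problems.push("body.render is not web");
  if (trace.status !== "completed") problems.push("status is not completed");
  if (trace.rows.length === 0) problems.push("no rendered rows; trace is expired or unavailable");
  const hasFinal = trace.rows.some(
    (row) => row.role === "assistant" && finalText.length > 0 && row.text.includes(finalText),
  );
  if (!hasFinal) problems.push("no assistant row contains the final assistant text");
  for (const phrase of KNOWN_BOILERPLATE) {
    if (trace.rows.some((row) => row.text.includes(phrase))) {
      problems.push(`boilerplate present: ${phrase}`);
    }
  }
  return problems;
}

const goodTrace: RenderedTrace = {
  render: "web",
  status: "completed",
  rows: [
    { role: "user", text: "build the firmware" },
    { role: "assistant", text: "Build finished with 0 errors." },
  ],
};
const goodTraceProblems = renderProofProblems(goodTrace, "Build finished with 0 errors.");
```

Checking row text rather than event counts is the point: a trace can have many events and still render only placeholder status rows.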

CLI/Web-equivalent trace evidence does not replace browser UI evidence for visual, layout, copy-to-clipboard, collapsed-panel or removed-control bugs. Those require a bounded browser or DOM smoke against http://74.48.78.17:19666/ after rollout, with assertions on the deployed page text, DOM state, or control behavior that the user reported. A local bundle smoke can support regression coverage, but the closeout still needs the deployed public endpoint unless the browser entry is unavailable and the issue comment records the blocker. Missing Playwright browser binaries or declared test dependencies are not a valid skip; install the repository-declared runner/browser or use an approved system browser executable and record that choice in the validation evidence.

The closing comment for these issues must be semantic natural language before it lists evidence: state what the user-visible problem was, what changed, where it rolled out, and what original entry was rechecked. It must include the actual command or entry path, target lane or endpoint, relevant trace/session/thread/PipelineRun/run/device ids, and the pass/fail result. If the original entry cannot be verified because rollout has not happened, credentials are unavailable, the target runtime is down, or the required CLI capability is missing, keep the issue open and record the blocker. Do not close the issue on the strength of PR merge, targeted tests, or "will be verified after rollout" wording. If an issue was closed before this real CLI/user-entry validation, reopen it and add a correction comment before continuing.

For HWLAB v0.2 Code Agent context-loss or multi-turn continuity issues, the minimum closeout is a real hwlab-cli client agent two-turn E2E from G14:/root/hwlab-v02 or another approved G14 execution plane with locked runtime namespace/lane env. Submit the first turn, poll its result to completed, submit the second turn with the same explicit conversationId/sessionId/threadId, then capture trace/inspect evidence. Passing evidence must show the second turn used prior-turn context, and should include context attachment or run reuse labels such as conversation-context:attached, agentrun:run:reused, agentrun:runner-job:reused, plus the relevant run/command ids. Long verification evidence belongs in a separate gh issue comment create --body-file comment; lifecycle close comments stay short, as defined in docs/reference/cli.md.

/health/live revision is owned by hwlab-cloud-api; it can legitimately differ from the source commit for a Cloud Web-only change. Do not call that difference a failed Cloud Web rollout when webAssets.checks.htmlOk, webAssets.checks.appJsOk, CSS probes, Argo health, and hwlab-cloud-web Deployment readiness have passed. For Cloud Web behavior changes, the public JS asset probe or a bounded browser/DOM check is stronger evidence than cloud-api apiRevision.
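
The rule can be sketched as a predicate that never reads the API revision. `htmlOk` and `appJsOk` are the field names given above; `cssOk`, `argoHealthy`, `webDeploymentReady` and the `CloudWebEvidence` shape are hypothetical names for the other checks the paragraph lists, and the commit values are placeholders.

```typescript
// Hypothetical evidence shape; only htmlOk and appJsOk are named in the text.
interface CloudWebEvidence {
  checks: { htmlOk: boolean; appJsOk: boolean; cssOk: boolean };
  argoHealthy: boolean;
  webDeploymentReady: boolean;
  apiRevision: string; // owned by hwlab-cloud-api, intentionally ignored below
  sourceCommit: string;
}

function cloudWebRolloutOk(evidence: CloudWebEvidence): boolean {
  return (
    evidence.checks.htmlOk &&
    evidence.checks.appJsOk &&
    evidence.checks.cssOk &&
    evidence.argoHealthy &&
    evidence.webDeploymentReady
  );
}

// A Cloud Web-only change: the API revision lags the source commit, which is fine.
const webOnlyChange = cloudWebRolloutOk({
  checks: { htmlOk: true, appJsOk: true, cssOk: true },
  argoHealthy: true,
  webDeploymentReady: true,
  apiRevision: "1111111",
  sourceCommit: "2222222",
});
```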

Do not turn v0.2 expansion governance into a stack of broad compatibility gates. The stable control points are branch, dedicated CI/CD source repo, git mirror/relay refs, GitOps branch, namespace, runtime path, Argo Application, FRP ports and generated-output ownership. Legacy DEV/D601/main preflights that block the v0.2 lane should be removed from that lane, not patched with fallback or legacy modes. Naming, RBAC scope, cleanup policy, resource quota and rollback order are design decisions or runbook entries unless they protect a concrete high-value risk that cannot be enforced by the fixed boundaries above.

v0.2 Worktree + PR Workflow

v0.2 source-of-truth changes must enter through a task-scoped worktree on a feature branch and then merge through a PR, not by direct commits to v0.2. The generic P2/P3/P4 flow is owned by $dad-dev; this section only fixes the G14/v0.2-specific source route, branch and lane:

trans G14:/root/hwlab-v02 script -- 'git fetch origin v0.2 && git pull --ff-only origin v0.2 && git status --short --branch'
trans G14:/root/hwlab-v02 script -- 'git worktree add .worktree/<task> -b fix/issue<N>-<short-name> origin/v0.2'

The fixed repo at /root/hwlab-v02 is not a scratch area and must not carry parallel worktree state on the v0.2 branch itself. All worktree branches should follow the fix/issue<N>-<short-name> naming so PR titles and merge commits stay scannable. GitHub PR writes, merge, rollout trigger and final original-entry validation follow $dad-dev plus the UniDesk CLI control rules in AGENTS.md.

Recovery From a Direct Commit To v0.2

A direct commit on v0.2 (work that landed with git commit on the fixed repo or a checkout that bypassed the worktree) must be moved onto a feature branch and re-merged through a PR before the next trigger-current reads it. The recovery is bounded and audit-friendly, but it is also a git push --force-with-lease against the protected branch, so it is only acceptable when the direct commit is the only new content on v0.2 since the last merged PR:

Confirm no parallel worktree was in flight and the commit is the only delta. trans G14:/root/hwlab-v02 script -- 'git log origin/v0.2..HEAD' and git log HEAD..origin/v0.2 must show the direct commit as a single fast-forward candidate.

Capture the commit identity and patch for the recovery record:

trans G14:/root/hwlab-v02 script -- 'git show <direct-commit-sha> > /tmp/v0.2-recovery.patch'

Roll the fixed repo back to the previous merged PR head. Use git reset --hard <previous-pr-sha>; this preserves any autostash (e.g. from a parallel git checkout snapshot in another worktree) on the stash list and does not touch the other worktree's working tree.
In the pre-existing worktree (e.g. .worktree/<task> on fix/issue<N>-<short-name>) bring the branch up to the previous PR head with trans G14:/root/hwlab-v02/.worktree/<task> script -- 'git reset --hard <previous-pr-sha>', then git cherry-pick <direct-commit-sha> to replay the direct commit on the feature branch. If the worktree branch was already a clean clone of origin/v0.2 at the previous PR head, the reset is a no-op.

Push the feature branch and force-push v0.2 back to the rolled-back head with --force-with-lease (refuses to clobber a concurrent push):

trans G14:/root/hwlab-v02/.worktree/<task> script -- 'git push -u origin fix/issue<N>-<short-name>'
trans G14:/root/hwlab-v02 script -- 'git push --force-with-lease origin v0.2'

Open the PR through UniDesk CLI, squash-merge, then git pull --ff-only origin v0.2 to bring the fixed repo back in sync. The previous PR's merge commit will not be in the new PR's history; the new PR's diff equals the original direct commit's diff, so the PR trail still contains the exact same bytes.
bun scripts/cli.ts hwlab g14 control-plane status --lane v02 will read the new merge commit. The previously staged PipelineRun for the direct commit was created on the v0.2 head, and trigger-current will delete and recreate it for the post-merge head, so no manual PipelineRun cleanup is required.

The recovery is auditable: the cherry-picked commit carries the same change as the original git show patch into the PR diff, so the issue/PR trail still contains the exact same bytes that were first committed directly. This is a one-time corrective action; recurring direct commits on v0.2 are a workflow regression and must be called out in the relevant issue or PR.

v0.2 Cloud Web Runtime Layout Validation

Cloud Web layout, status-panel, collapsed-control, and modal issues on v0.2 need deployed browser evidence. Source checks and control-plane rollout are supporting evidence; they do not prove that the public 19666 page renders the fixed DOM.

Use these surfaces together:

trans G14:/root/hwlab-v02/.worktree/<task>/web/hwlab-cloud-web script -- 'bun run check' for static unit/contract/layout checks and dist freshness.
bun scripts/cli.ts hwlab g14 control-plane status --lane v02 for runtime, Argo, public endpoint, and GitOps alignment. If origin/v0.2 moved through a parallel PR, use --pipeline-run or --source-commit and treat same-branch supersession as context rather than failure.
Public API probes for both /health/live and /v1/live-builds. /health/live proves live service health/revision, but Cloud Web build time, image tag/digest, source metadata, and actual runtime commit/revision should be read from /v1/live-builds.
A bounded browser/DOM probe against http://74.48.78.17:19666/ that asserts the deployed page state relevant to the issue.

For Workbench status/build panels, the minimum DOM proof should check the topbar chip, absence of full status cards in the right sidebar, hidden collapsed lists actually absent from layout, bounded scroll ownership on the right content area, and a details dialog that contains environment image metadata, actual live commit/revision, and source/build-time fields when available.

/v1/live-builds.latest is global across services and can legitimately point at hwlab-cloud-api when API rolled after Web. Inspect the hwlab-cloud-web service row before deciding whether a Web build field is missing or stale.
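
A minimal sketch of reading the Web row instead of `latest`. The response shape here is an assumption: only `latest` and the existence of a per-service hwlab-cloud-web row are stated above, so `services`, `service`, `commit` and `builtAt` are hypothetical field names with placeholder values.

```typescript
// Hypothetical /v1/live-builds shape; field names other than latest are assumptions.
interface LiveBuildRow {
  service: string;
  commit: string;
  builtAt: string;
}

interface LiveBuilds {
  latest: LiveBuildRow;
  services: LiveBuildRow[];
}

// Pick the Cloud Web row explicitly; latest may belong to another service.
function cloudWebBuildRow(builds: LiveBuilds): LiveBuildRow | undefined {
  return builds.services.find((row) => row.service === "hwlab-cloud-web");
}

const sampleBuilds: LiveBuilds = {
  latest: { service: "hwlab-cloud-api", commit: "bbbbbbb", builtAt: "2026-06-04T10:00:00Z" },
  services: [
    { service: "hwlab-cloud-web", commit: "aaaaaaa", builtAt: "2026-06-04T09:00:00Z" },
    { service: "hwlab-cloud-api", commit: "bbbbbbb", builtAt: "2026-06-04T10:00:00Z" },
  ],
};
const webRow = cloudWebBuildRow(sampleBuilds);
```

In this sample `latest` points at hwlab-cloud-api because the API rolled after Web, yet the Web row is present and complete, so nothing is missing or stale.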

Generic layout smoke can be used only when it is bounded in the current transport. A Playwright smoke that runs through trans with no output for the SSH idle timeout, leaves preview/browser processes behind, or never writes an exit/report file is not closure evidence. Run it as an async remote job with explicit report and cleanup, or use a smaller issue-specific DOM probe that emits one JSON result and exits.

v0.2 Cloud Web Button/JS Sync Rule (HWLAB #748)

When a v0.2 Cloud Web fix removes a button from index.html or a field from the el literal in web/hwlab-cloud-web/app.ts, every el.<removed-field>.addEventListener(...) (or .requestSubmit() / .showModal() / etc.) binding must be removed from the matching init* function in the same commit. The static web:check does not catch this orphan listener class because the TypeScript build is Bun.build transpile-only (no tsc --noEmit), and the runtime crash only surfaces as Cannot read properties of undefined (reading 'addEventListener') on first init. The minimal closeout checks for the v0.2 lane are:

# 1. Web assets rebuild and the orphan is gone from the dist
trans G14:/root/hwlab-v02/.worktree/<task> script -- 'cd web/hwlab-cloud-web && bun run build'
trans G14:/root/hwlab-v02/.worktree/<task> script -- "grep -c '<removed-field>' web/hwlab-cloud-web/dist/app.js"   # must be 0
trans G14:/root/hwlab-v02/.worktree/<task> script -- "grep -c 'id=\"<removed-id>\"' web/hwlab-cloud-web/index.html" # must be 0

# 2. Live 19666/19667 confirms the deployed bundle is the new build
curl -fsS http://74.48.78.17:19666/ | grep -c '<removed-id>'                                          # must be 0
curl -fsS http://74.48.78.17:19666/app.js | grep -c '<removed-field>'                                 # must be 0
bun scripts/cli.ts hwlab g14 control-plane status --lane v02                                          # webAssets.checks.appJsOk = true, sourceCommit = merge commit

While the PR is open, the author can also run a one-liner to surface any orphan el.<field>.addEventListener whose field is not declared in the el literal of app.ts:

trans G14:/root/hwlab-v02/.worktree/<task> script -- 'awk "/^const el = /,/^};/" web/hwlab-cloud-web/app.ts | tr -d "," | awk "{print \$1}" | grep -E "^[a-zA-Z]" | sort -u > /tmp/el-fields.txt; grep -nEo "el\\.([A-Za-z_$][A-Za-z0-9_$]*)\\.addEventListener" web/hwlab-cloud-web/*.ts | while read m; do field=$(echo "$m" | sed -E "s/.*el\\.([A-Za-z_$][A-Za-z0-9_$]*)\\.addEventListener.*/\\1/"); if ! grep -q "^$field$" /tmp/el-fields.txt; then echo "ORPHAN: el.$field.addEventListener"; fi; done'

Document the explicit grep / curl evidence in the issue closeout comment. Tightening the el literal with proper TypeScript types is tracked separately and must not be done as part of a runtime fix PR.
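
The orphan-listener class from this rule can also be sketched in TypeScript. This is an illustration of the idea, not a replacement for the closeout commands above; `findOrphanListeners` is a hypothetical helper that takes the declared el field names and a source string, and the field names in the demo are invented.

```typescript
// Hypothetical helper: report el.<field>.addEventListener bindings whose field is not declared.
function findOrphanListeners(declaredFields: string[], source: string): string[] {
  const known = new Set<string>(declaredFields);
  const pattern = /\bel\.([A-Za-z_$][A-Za-z0-9_$]*)\.addEventListener\b/g;
  const orphans: string[] = [];
  let match: RegExpExecArray | null = pattern.exec(source);
  while (match !== null) {
    const field = match[1] ?? "";
    if (field !== "" && !known.has(field) && orphans.indexOf(field) === -1) {
      orphans.push(field);
    }
    match = pattern.exec(source);
  }
  return orphans;
}

// Invented example: removedButton was deleted from the el literal but its binding remains.
const sampleSource = [
  'el.saveButton.addEventListener("click", onSave);',
  'el.removedButton.addEventListener("click", onRemoved);',
  'model.removedButton.addEventListener("click", noop);',
].join("\n");
const orphanFields = findOrphanListeners(["saveButton"], sampleSource);
```

The word boundary before `el` keeps identifiers such as `model.` from matching, so only bindings on the `el` object itself are reported.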

Node-Local VPN Proxy

G14 has a node-local VPN/proxy stack for infrastructure bootstrap and recovery downloads:

Primary mixed HTTP/SOCKS proxy: 127.0.0.1:10808.
Backup Hysteria2 HTTP proxy: 127.0.0.1:11809.
Backup Hysteria2 SOCKS5 proxy: 127.0.0.1:11808.
Operator-only local details remain on G14 under /root/docs/vpn-proxy-ops.md; subscription URLs, node credentials and GUI database contents must not be copied into the UniDesk repository.

The G14 host persists this proxy configuration in these local files:

/etc/profile.d/unidesk-g14-proxy.sh exports HTTP_PROXY, HTTPS_PROXY, ALL_PROXY, lowercase aliases and NO_PROXY for new login shells. Set UNIDESK_G14_DISABLE_PROXY=1 before shell startup to opt out.
/root/.npmrc pins npm proxy, https-proxy, noproxy and retry settings for root-side bootstrap commands.
/root/.gitconfig pins root Git HTTP/HTTPS proxy settings.
/root/.docker/config.json pins Docker client proxy settings for commands and build contexts that honor Docker client proxy configuration.
/etc/systemd/system/docker.service.d/proxy.conf pins Docker daemon pull proxy settings. Updating this drop-in requires systemctl daemon-reload and a Docker restart before the active daemon sees the new NO_PROXY; do not restart Docker while G14 provider-gateway, k3s bootstrap or image builds are in flight unless that interruption is intentional.

The NO_PROXY list must include localhost, the main server, private LAN ranges, k3s pod/service CIDRs, Kubernetes service domains and the loopback registry so that k3s, 127.0.0.1:5000, Kubernetes API access and UniDesk control paths do not route through the VPN proxy.

The primary proxy can be used for G14 target-side image bootstrap when Docker Hub, npm, GitHub or Playwright downloads are unreliable through direct network or provider-gateway WS egress. For Docker build steps that use 127.0.0.1, build with host networking so the build container reaches the host proxy:

docker build --network host \
  --build-arg HTTP_PROXY=http://127.0.0.1:10808 \
  --build-arg HTTPS_PROXY=http://127.0.0.1:10808 \
  --build-arg ALL_PROXY=socks5h://127.0.0.1:10808 \
  --build-arg http_proxy=http://127.0.0.1:10808 \
  --build-arg https_proxy=http://127.0.0.1:10808 \
  --build-arg all_proxy=socks5h://127.0.0.1:10808 \
  ...

127.0.0.1:10808 is a G14 host loopback endpoint. Inside an ordinary k3s Pod, 127.0.0.1 is the Pod network namespace, not the node proxy. Do not set long-lived workload proxy env to http://127.0.0.1:10808 unless that workload is intentionally hostNetwork and the port conflict/DEV-PROD blast radius has been reviewed. Temporary hostNetwork debug Pods may use the node-local proxy only for bounded bootstrap proof or cache prewarm; they must not become GitOps desired state just to make external downloads work.

The backup proxy uses HTTP_PROXY=http://127.0.0.1:11809, HTTPS_PROXY=http://127.0.0.1:11809 and ALL_PROXY=socks5h://127.0.0.1:11808.

This proxy is not a replacement for UniDesk runtime egress. k3s workloads such as Code Queue must still use the cataloged g14-provider-egress-proxy Kubernetes Service and g14-tcp-egress-gateway for normal runtime access to PostgreSQL, OA Event Flow and external APIs. The node-local VPN proxy is allowed only for G14 host-side bootstrap, image build, cache prewarm or recovery steps, and those steps should record the proxy choice in issue or deployment evidence.

v0.2 device-pod cloud-api architecture

v0.2 device-pod integration is the cloud-api → executor → D601 Windows device-host-cli.mjs chain under internal/cloud/access-control.ts, cmd/hwlab-device-pod/main.ts and the host-side F:\Work\ConStart\tools\device-host-cli.mjs. PR #765 (selector cheat sheet + fail-fast) and PR #778 (output.text propagation + evidence selector + read-only sub-action --reason exemption) are the two anchor PRs; PR #779 tracks the still-open host-side ops work. Earlier work used raw MUTATING_INTENTS.has(intent) && !reason and a single-pass textOr(output.text, …) extractor; both are obsolete and must not be re-introduced.

Intent / sub-action / reason matrix

DEVICE_JOB_INTENTS (cloud-api) enumerates the full supported surface; MUTATING_INTENTS is the strict subset whose default sub-action is mutating. Only workspace.build and debug.download carry a structured sub-action (start / status / output / wait / cancel / evidence) and are listed in DEVICE_JOB_ACTIONABLE_INTENTS; for those two, _deviceJobRequiresReason(intent, args, reason) returns false when reason is provided OR when args.action is in DEVICE_JOB_READ_ONLY_SUB_ACTIONS. Any other mutating intent (workspace.apply-patch, workspace.put, debug.reset, io.uart.write, io.uart.jsonrpc, io.uart.read-after-launch-flash, etc.) still always requires a non-empty reason. Adding a new actionable mutating intent requires extending both MUTATING_INTENTS and DEVICE_JOB_ACTIONABLE_INTENTS together; adding a new read-only sub-action requires only the DEVICE_JOB_READ_ONLY_SUB_ACTIONS set.
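
The matrix can be sketched as follows. The set names and intent names are taken from the paragraph above, but this is not the code in internal/cloud/access-control.ts: the membership of DEVICE_JOB_READ_ONLY_SUB_ACTIONS is an assumption (the document does not list it), and `deviceJobRequiresReason` is an illustrative stand-in for `_deviceJobRequiresReason`.

```typescript
// Intent names are from the text; the mutating list is limited to those it mentions.
const MUTATING_INTENTS = new Set<string>([
  "workspace.build",
  "debug.download",
  "workspace.apply-patch",
  "workspace.put",
  "debug.reset",
  "io.uart.write",
  "io.uart.jsonrpc",
  "io.uart.read-after-launch-flash",
]);
const DEVICE_JOB_ACTIONABLE_INTENTS = new Set<string>(["workspace.build", "debug.download"]);
// Assumption: the document does not say which sub-actions are read-only.
const DEVICE_JOB_READ_ONLY_SUB_ACTIONS = new Set<string>(["status", "output", "wait", "evidence"]);

// Returns true when the request must be rejected for lacking a reason.
function deviceJobRequiresReason(
  intent: string,
  args: { action?: string },
  reason?: string,
): boolean {
  if (!MUTATING_INTENTS.has(intent)) return false;
  if (reason !== undefined && reason.trim() !== "") return false;
  const action = args.action;
  if (
    DEVICE_JOB_ACTIONABLE_INTENTS.has(intent) &&
    action !== undefined &&
    DEVICE_JOB_READ_ONLY_SUB_ACTIONS.has(action)
  ) {
    return false; // read-only sub-action exemption
  }
  return true;
}

const statusNeedsReason = deviceJobRequiresReason("workspace.build", { action: "status" });
const startNeedsReason = deviceJobRequiresReason("workspace.build", { action: "start" });
const resetNeedsReason = deviceJobRequiresReason("debug.reset", { action: "status" });
```

The last case shows why both sets matter: `debug.reset` is mutating but not actionable, so a read-only-looking `action` does not exempt it from the reason requirement.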

Evidence is exposed through the first-class intents workspace.evidence and debug.evidence, not as a workspace.build sub-action. Code Agent sees <pod>:workspace:/ build evidence [jobId] and <pod>:debug-probe download evidence [jobId]; cloud-api maps each to a new device-pod executor job, the executor maps it to deviceHostArgs = ["workspace", "evidence", kind, ...], and the host-side device-host-cli.mjs dispatches via if (command === "evidence") at the top level of main() (not nested under if (command === "build")). workspace.evidence uses kind=build and reads the keil-build job; debug.evidence uses kind=download and reads the keil-download job. The kind sub-arg must be build or download, and the optional jobId selects a specific past job.
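The intent-to-kind mapping can be sketched as below. This is an assumption-laden illustration: the helper name evidenceHostArgs and both lookup tables are hypothetical, and the sketch assumes the same ["workspace", "evidence", kind, ...] prefix is used for both intents, since that is the only form shown in this section.

```typescript
type EvidenceKind = "build" | "download";

// Hypothetical lookup tables built from the mapping described in the text.
const EVIDENCE_KIND_BY_INTENT: Record<string, EvidenceKind> = {
  "workspace.evidence": "build",
  "debug.evidence": "download",
};
const HOST_JOB_BY_KIND: Record<EvidenceKind, string> = {
  build: "keil-build",
  download: "keil-download",
};

// Builds the host argument vector for an evidence intent; jobId is optional.
function evidenceHostArgs(intent: string, jobId?: string): string[] {
  const kind = EVIDENCE_KIND_BY_INTENT[intent];
  if (kind === undefined) {
    throw new Error(`not an evidence intent: ${intent}`);
  }
  const args: string[] = ["workspace", "evidence", kind];
  if (jobId !== undefined && jobId !== "") args.push(jobId);
  return args;
}

// Names the host job type whose evidence a given kind selects.
function evidenceHostJob(kind: EvidenceKind): string {
  return HOST_JOB_BY_KIND[kind];
}
```

Rejecting unknown intents up front keeps a mistyped selector from reaching the host as a malformed evidence command.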

Output text propagation chain

body.output.text flows through three layers in order; each layer tries more fields and only falls back when earlier sources are empty:

host device-host-cli.mjs returns a JSON envelope that already contains stdout / stderr / summary / logTail / buildSummary for build/download ops; workspace.ls / workspace.cat / workspace.rg are inline and include a JSON body.
executor cmd/hwlab-device-pod/main.ts gatewayDispatchText(result, dispatch) walks result.stdout → result.stderr → dispatch.stdout → dispatch.stderr → result.evidence.{text,logTail,summary} → dispatch.message (only when dispatchStatus === "completed") → dispatch.summary → result.summary → dispatch.buildSummary → result.text → JSON.stringify(result). The executor stores this as job.output and exposes it via boundedOutput() which clips at DEVICE_JOB_OUTPUT_MAX_BYTES (12000) and drops executor / nested output when truncated.
cloud-api executorOutputPayload(body, httpStatus) wraps what the executor sent and exposes body.text / body.output / body.bytes / body.truncation to the /v1/device-pods/{id}/jobs/{jobId}/output endpoint. text is firstString(body?.text, output.text, nestedOutput.text, output.summary, nestedOutput.summary, evidence.text, evidence.logTail, evidence.summary); the matched key is recorded by caller convention. executor payload stays on the response so callers can read dispatch.exitCode / dispatch.message / dispatch.stdout even when text is empty.
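The fallback order in the executor and cloud-api layers can be sketched as follows. This is a simplified illustration, not the shipped code: the lookup order follows the lists above, but the object shapes are assumptions (dispatchStatus is assumed to live on dispatch, and nestedOutput and evidence are assumed to hang off body.output), and outputText is a hypothetical stand-in for the text selection inside executorOutputPayload.

```typescript
type Dict = Record<string, unknown>;

// Returns the first candidate that is a non-blank string, or "" if none is.
function firstString(...candidates: unknown[]): string {
  for (const c of candidates) {
    if (typeof c === "string" && c.trim() !== "") return c;
  }
  return "";
}

// Treats anything that is not a plain object as an empty object.
function asDict(v: unknown): Dict {
  return v !== null && typeof v === "object" ? (v as Dict) : {};
}

// Executor layer: walk the documented fallback chain, ending with the raw JSON.
function gatewayDispatchText(result: Dict, dispatch: Dict): string {
  const evidence = asDict(result.evidence);
  // Assumption: dispatchStatus is a field on the dispatch object.
  const completed = dispatch.dispatchStatus === "completed";
  const text = firstString(
    result.stdout, result.stderr, dispatch.stdout, dispatch.stderr,
    evidence.text, evidence.logTail, evidence.summary,
    completed ? dispatch.message : undefined,
    dispatch.summary, result.summary, dispatch.buildSummary, result.text,
  );
  return text !== "" ? text : JSON.stringify(result);
}

// Cloud-api layer: hypothetical stand-in for the text selection in executorOutputPayload.
function outputText(body: Dict): string {
  const output = asDict(body.output);
  const nestedOutput = asDict(output.output); // assumed location
  const evidence = asDict(output.evidence);   // assumed location
  return firstString(
    body.text, output.text, nestedOutput.text, output.summary,
    nestedOutput.summary, evidence.text, evidence.logTail, evidence.summary,
  );
}
```

Treating blank strings as missing is what lets an empty stdout fall through to logTail or a summary instead of producing an empty body.output.text.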

The evidence.* and *summary lookups exist so a dispatcher that already includes host logTail / buildSummary becomes visible without a separate bootsharp re-run on the Code Agent side. The summary lookups also keep error messages (dispatch.message) in the response even when dispatchStatus is not completed; this is the reason body.error.message always has something to show for failed host dispatches.

Cloud-api vs host-side boundary

/root/hwlab-v02/skills/device-pod-cli/assets/device-host-cli.mjs is the v0.2-shipped copy of the host-side CLI. The actual hardware host runs a separate F:\Work\ConStart\tools\device-host-cli.mjs that is not a deployment of the v0.2 repo; it is a D601 ops-side copy that must be synced manually when the v0.2 repo changes host-side behavior. The two-step contract is:

v0.2 cloud-api / executor changes take effect once the PipelineRun has Succeeded and the git mirror flush is complete; the runtime revision is commit.id from /health/live, and the source commit can be forced to match via the next trigger-current.
v0.2 host-side device-host-cli.mjs changes are NOT visible until someone replaces F:\Work\ConStart\tools\device-host-cli.mjs on the D601 Windows host; cloud-api body.text will faithfully surface the "unsupported command" JSON error from the stale host binary, which proves the cloud-api propagation chain works but the host side is stale.

A live workspace.evidence / debug.evidence / download evidence selector that returns the host logTail end-to-end therefore requires both (a) the v0.2 PR merged and rolled, and (b) the D601 host binary replaced; missing either half is a known gap tracked in #779.
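When triaging an evidence response, the two halves of the contract can be told apart from body.text alone. The sketch below is a hypothetical helper, not part of the v0.2 repo: it assumes the stale host error contains the literal phrase "unsupported command" quoted above, and the three labels are illustrative.

```typescript
// Illustrative labels for the state of an evidence response.
type EvidenceRollout = "ok" | "host-stale" | "no-output";

// Classifies body.text from an evidence job (hypothetical helper).
function classifyEvidenceText(text: string): EvidenceRollout {
  // Empty text points at the cloud-api / executor propagation chain.
  if (text.trim() === "") return "no-output";
  // Assumption: the stale host binary reports this phrase in its JSON error.
  if (text.toLowerCase().includes("unsupported command")) return "host-stale";
  return "ok";
}
```

A host-stale result means half (a) is in place and only the D601 host copy still needs replacing.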

v0.2 device-pod closeout checks

Device-pod fixes still follow $dad-dev and the ## v0.2 Worktree + PR Workflow route above. The device-pod-specific closeout is the three-layer runtime matrix below; keep these checks because they prove the cloud-api -> executor -> D601 host chain, while generic PR/CI/CD and worktree mechanics stay in $dad-dev.

trans G14:/root/hwlab-v02/.worktree/<task> script -- 'cd tools && bun test device-pod-cli.test.ts'
trans G14:/root/hwlab-v02/.worktree/<task> script -- 'cd cmd/hwlab-device-pod && bun test main.test.ts'
trans G14:/root/hwlab-v02/.worktree/<task> script -- 'cd internal/cloud && bun test access-control.test.ts'
trans G14:/root/hwlab-v02/.worktree/<task> script -- 'node --check skills/device-pod-cli/assets/device-host-cli.mjs'

Treat access-control.test.ts workbench failures as pre-existing on the v0.2 base unless the new test list explicitly covers them. After PR merge and trigger-current --lane v02 --confirm, the live http://74.48.78.17:19667/ CLI acceptance check must hit all three layers:

body.output.text non-empty for at least one happy-path intent (workspace.ls / workspace.cat are the cheapest ones to verify propagation without needing a real D601 build).
workspace.evidence kind=build / kind=download accepted by cloud-api, dispatched to executor, executor blocker === null and job.reason === "".
<mutating intent> action=status accepted without --reason while the same intent with action=start is still rejected with device_job_reason_required.
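The three checks above can be recorded and evaluated with a small helper. This is a sketch under stated assumptions: the CloseoutObservations shape and its field names are hypothetical and do not mirror the cloud-api response schema; only blocker, reason and device_job_reason_required come from this section.

```typescript
// Hypothetical observation record filled in by hand or by a script during acceptance.
interface CloseoutObservations {
  happyPathText: string; // body.output.text from workspace.ls or workspace.cat
  evidenceJobs: { kind: "build" | "download"; blocker: string | null; reason: string }[];
  statusWithoutReasonAccepted: boolean;
  startWithoutReasonErrorCode: string | null; // null means the request was accepted
}

// Returns one message per failed check; an empty list means all three layers passed.
function closeoutFailures(o: CloseoutObservations): string[] {
  const failures: string[] = [];
  if (o.happyPathText.trim() === "") {
    failures.push("layer 1: body.output.text is empty");
  }
  if (o.evidenceJobs.length === 0) {
    failures.push("layer 2: no evidence job observed");
  }
  for (const job of o.evidenceJobs) {
    if (job.blocker !== null || job.reason !== "") {
      failures.push(`layer 2: evidence kind=${job.kind} blocked`);
    }
  }
  if (!o.statusWithoutReasonAccepted) {
    failures.push("layer 3: action=status was rejected without --reason");
  }
  if (o.startWithoutReasonErrorCode !== "device_job_reason_required") {
    failures.push("layer 3: action=start was not rejected with device_job_reason_required");
  }
  return failures;
}
```

Collecting every failure rather than stopping at the first makes it easier to paste a complete picture into issue or deployment evidence.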

There is no separate device-pod doc; this section is the single authoritative reference for the architecture, and the AGENTS.md index points to it.
