docs: add PK01 operations reference

Codex
2026-06-12 03:04:57 +00:00
parent 95eec5b773
commit 309e943bd5
3 changed files with 138 additions and 0 deletions
+1
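@@ -276,6 +276,7 @@ UniDesk is a distributed work platform with the main server as its unified entry point; this document
- `docs/reference/code-queue-supervision.md`: command and supervision policy for AgentRun Queue and the legacy Code Queue, covering concurrency windows, polling cadence, terminal-state reads, splitting of blocked work, PR handoff, and acceptance close-out rules.
- `docs/reference/hwlab.md`: the fixed HWLAB command-side workspace, the G14 primary runtime surface, the D601 legacy/hardware bridging boundary, the minimal device-agent/gateway bridging model, and the controlled release boundary.
- `docs/reference/g14.md`: the G14 provider node, the k3s control bridge, the legacy DEV/PROD retirement boundary, the current HWLAB runtime lane, the device-agent manual experiment boundary, Code Queue/CI candidate targets, and the node-local VPN proxy bootstrap boundary.
- `docs/reference/pk01.md`: the PK01 Tencent Cloud provider-gateway, pikanode/MET Docker workloads, SSH passthrough, disk GC, and the long-term pikanode temp retention boundary.
- `docs/reference/platform-infra.md`: the G14 `platform-infra` namespace, YAML-first shared service configuration, the Sub2API/Codex pool, FRP exposure, and the development boundary for on-demand availability probes; for day-to-day Sub2API operations, see `$unidesk-sub2api` (`.agents/skills/unidesk-sub2api/SKILL.md`).
- `docs/reference/master-server-ops.md`: the main server's local Codex profile wrapper, ACX/GOCX/Moon Bridge routing boundaries, default models, real-invocation acceptance, and MiniMax session recovery rules.
- `docs/reference/g14-observability-infra.md`: Prometheus Operator on G14 native k3s, the `devops-infra` monitoring infrastructure, cross-namespace scrape declarations, and security boundaries.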
+6
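@@ -70,6 +70,12 @@ UniDesk's disk governance entry point is `bun scripts/cli.ts gc ...`. This entry point is used for
Restricted core dump cleanup matches only regular files at `/root/unidesk/core.<pid>`. Before execution, re-validate the path allowlist and confirm that the file is untracked by Git, is not a symlink, and has no active references according to `fuser`. Estimated savings must be computed from the blocks actually allocated, and `apparentSizeBytes` may be disclosed separately; the apparent size of a sparse core dump must not be treated as reclaimable disk space.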
## Remote PK01 Policy
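PK01 is a Tencent Cloud Docker provider, not a G14 k3s/registry node; its long-term operating boundary is described in `docs/reference/pk01.md`. `gc remote PK01 ...` can be used for generic low-risk candidates (allowlisted `/tmp`, Docker json-file logs, BuildKit cache, apt cache, restricted core dumps, and the journald plan), but pikanode's main growth source is managed by the PK01 node-local retention mechanism, not by G14 registry/PVC retention.
PK01 pikanode temp retention may only clean direct child directories of `/home/ubuntu/pikanode/html/temp` that are older than the retention window, and it must protect `html/download/`, `html/upload/`, `files/`, certificates, Git state, direct log files, and recent temp workspaces. This policy is installed as a PK01 node-local systemd timer plus logrotate configuration; for manual troubleshooting, check `systemctl status unidesk-pk01-pikanode-temp-gc.timer` and `/var/log/unidesk-pk01/pikanode-temp-gc.log` first. If the PK01 high-water mark still cannot be brought down by temp retention and generic low-risk GC, stop and move to a decision on pikanode download artifact retention, Docker image retention, or capacity expansion; do not delete `download/`, `files/`, or Docker overlay data as if they were ordinary temporary directories.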
## HWLAB Registry Retention
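G14 HWLAB registry cleanup must explicitly pass `--include-hwlab-registry`; the default `gc remote G14 plan` does not touch the registry. The policy must be conservative: it must not keep only latest, and it must not assume that space has been released after deleting only tag links.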
+131
@@ -0,0 +1,131 @@
# PK01 Provider Operations Reference
PK01 is a Tencent Cloud compute provider attached to UniDesk through `provider-gateway` with Provider ID `PK01`. This reference is the long-term operating boundary for PK01 host access, provider-gateway bootstrap state, pikanode retention, and disk GC. General provider-gateway rules remain authoritative in `docs/reference/provider-gateway.md`; general GC safety rules remain authoritative in `docs/reference/gc.md`.
## Operating Entry Points
Use UniDesk SSH passthrough for PK01 host operations:
```bash
trans PK01 argv hostname
trans PK01 script <<'SCRIPT'
df -h /
docker ps --format 'table {{.Names}}\t{{.Image}}\t{{.Status}}'
SCRIPT
```
Before closing an operation, verify both the provider channel and host workload state:
```bash
bun scripts/cli.ts debug health
trans PK01 argv bash -lc 'docker inspect --format "name={{.Name}} restart={{.HostConfig.RestartPolicy.Name}} pid={{.HostConfig.PidMode}} state={{.State.Status}} image={{.Config.Image}}" unidesk-provider-gateway-pk01'
trans PK01 argv bash -lc 'docker ps --format "table {{.Names}}\t{{.Image}}\t{{.Status}}"'
```
PK01 has no k3s control plane, so `trans PK01:k3s ...` is not a valid operating path. If a future PK01 k3s lane is introduced, it must get a separate runtime-lane reference and must not reuse the current pikanode host-data policy as a Kubernetes retention policy.
## Provider Gateway Bootstrap State
PK01 currently uses a direct Docker provider-gateway deployment rather than a full UniDesk source checkout. The node-local runtime bundle is:
| Item | Path / value | Boundary |
|---|---|---|
| Provider ID | `PK01` | Must stay unique in the UniDesk node registry. |
| Container | `unidesk-provider-gateway-pk01` | Must be `restart=always`, `pid=host`, and `running`. |
| Runtime bundle | `/home/ubuntu/unidesk-provider-pk01` | Minimal workspace mounted read-only into the gateway container. |
| Env file | `/home/ubuntu/.unidesk/state/provider-pk01/provider.env` | Contains provider token and must not be printed, copied into docs, or committed. |
| Host SSH key | `/home/ubuntu/.unidesk/host-ssh-pk01/id_ed25519` | Mounted read-only at `/run/host-ssh`; public key is authorized for `ubuntu`. |
| Logs | `/home/ubuntu/.unidesk/logs/provider-pk01` | Node-local runtime logs, not a Git source of truth. |
| Egress proxy | `127.0.0.1:18789` | Loopback only; never expose as a public endpoint. |
Long-term provider-gateway upgrades should converge to the standard `provider.upgrade mode=schedule` flow described in `docs/reference/provider-gateway.md`. If PK01 is still on the direct Docker bootstrap path, do not rebuild the gateway synchronously through the gateway's own `trans PK01` session. Use a detached node-local job or first move PK01 to the standard attach/upgrade bundle.
The minimal PK01 provider-gateway health contract is:
- `debug health` shows `providerId=PK01` as online.
- labels include `providerGatewayVersion`, `providerGatewayRuntimeGuardOk=true`, `providerGatewaySshDataTransport=tcp-pool`, and a nonzero ready SSH data pool.
- `trans PK01 argv hostname` reaches the Tencent Cloud host and returns the host name.
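The container part of this contract can be checked mechanically. The following is a minimal sketch, assuming the single-line `name=... restart=... pid=... state=... image=...` output of the `docker inspect --format` command shown under Operating Entry Points; the function name `check_gateway_state` is hypothetical and not part of the UniDesk CLI.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not a UniDesk command): validates one line of
# `docker inspect --format "name=... restart=... pid=... state=... image=..."`
# output against the container boundary restart=always, pid=host, running.
check_gateway_state() {
  local line="$1" field
  for field in "restart=always" "pid=host" "state=running"; do
    # Pad with spaces so that only whole space-separated fields match.
    case " $line " in
      *" $field "*) ;;
      *) echo "FAIL: expected $field" >&2; return 1 ;;
    esac
  done
  echo "OK"
}
```

Passing the captured `docker inspect` line as the first argument prints `OK` when all three fields match and returns a nonzero status otherwise. It does not cover the `debug health` labels, which still need to be read from the health output.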
## Host Workloads
PK01 currently hosts the following Docker workloads:
| Container | Role | Protection boundary |
|---|---|---|
| `pikanode` | Public PikaPython/PikaNode service rooted at `/home/ubuntu/pikanode` | Do not delete source, `files/`, `html/download/`, `html/upload/`, certificates, or Git state without a service-owner retention decision. |
| `met_server` | Existing MET service | Treat as protected runtime unless a separate owner-approved retention plan exists. |
| `unidesk-provider-gateway-pk01` | UniDesk maintenance bridge | Must remain running; do not stop it as part of generic disk GC. |
`pikanode` mounts `/home/ubuntu/pikanode` read-write into the container. Static/generated download artifacts under `html/download/` and repository data under `files/` may be user-visible or needed by the service. They are not generic GC candidates.
## Disk GC Policy
PK01 follows the same safe-stop principle as G14: first produce a bounded attribution, then clean only classified candidates, and stop when remaining pressure is in protected runtime data.
Default sequence for a high-water incident:
1. Run the generic remote GC plan and, if useful, a confirmed run:
```bash
bun scripts/cli.ts gc remote PK01 plan --target-use-percent 60 --limit 100 --full
bun scripts/cli.ts gc remote PK01 run --confirm --target-use-percent 60 --limit 100 --full
```
2. Inspect PK01-specific host data with short passthrough commands; avoid a full-root `du` in one `trans` call because `trans` has a 60-second hard timeout.
3. For pikanode growth, clean only `html/temp` direct child directories that are older than the configured node-local retention window. Preserve direct files such as `stdout.log`, `update.log`, `accesstoken.json`, `pullrequest.json`, and any recent temp workspaces.
4. Re-check `df -h /`, provider health, Docker container state, and a pikanode local HTTPS probe.
5. If the target still cannot be reached without touching `html/download/`, `files/`, Docker images, or other protected runtime data, stop and make a retention/capacity decision instead of widening deletion scope.
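The stop decision in steps 4 and 5 comes down to comparing the root filesystem's use percentage with the target. A minimal sketch, assuming POSIX-format `df -P /` output and the `--target-use-percent 60` value from step 1; both function names are hypothetical helpers, not UniDesk commands.

```bash
#!/usr/bin/env bash
# Hypothetical helper: reads `df -P <mount>` output on stdin and prints the
# use percentage of the first data row as a bare integer (the "%" is stripped).
use_percent_from_df() {
  awk 'NR == 2 { gsub(/%/, "", $5); print $5 }'
}

# Hypothetical helper: returns 0 while usage is still above the target,
# i.e. the incident is not closed yet.
above_target() {
  local use="$1" target="$2"
  [ "$use" -gt "$target" ]
}
```

If `above_target` still succeeds after temp retention and generic GC have run, the sequence above says to stop and make a retention or capacity decision rather than widen the deletion scope.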
PK01 pikanode temp directories are safe to remove only under this narrow definition:
- path is a direct child directory of `/home/ubuntu/pikanode/html/temp`;
- path is not a symlink;
- parent is exactly `/home/ubuntu/pikanode/html/temp`;
- mtime is older than the configured retention window;
- deletion uses `rm -rf --one-file-system` and never follows paths outside that root.
Never use `rm -rf /home/ubuntu/pikanode/html/temp/*` as an unbounded shell expansion. It risks deleting current generation workspaces and direct state/log files.
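The selection rules above map closely onto GNU `find` options. The following is a dry-run sketch only: it lists candidates and deletes nothing. The function name and the `days` parameter are hypothetical, and the contents of the installed `/usr/local/sbin/unidesk-pk01-pikanode-temp-gc` script are not reproduced here.

```bash
#!/usr/bin/env bash
# Hypothetical dry-run helper: prints direct child directories of ROOT whose
# mtime is older than DAYS. It never deletes anything.
list_expired_temp_dirs() {
  local root="$1" days="$2"
  # -xdev              stay on the filesystem that holds ROOT
  # -mindepth/-maxdepth 1   direct children only, never ROOT itself
  # -type d            real directories only; find does not follow symlinks
  #                    by default, so symlinks and plain files are skipped
  # -mtime +N          age in whole days is greater than N
  find "$root" -xdev -mindepth 1 -maxdepth 1 -type d -mtime "+$days" -print | sort
}
```

Because only directories are selected, direct files such as `stdout.log` and `update.log` never appear in the output, and recent workspaces are excluded by the `-mtime` test. Reviewing this listing before any deletion keeps the operation within the narrow definition.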
## Long-Term Retention Mechanisms
PK01 has node-local retention controls installed so that pikanode temp output and logs do not grow without bound:
| Mechanism | Node-local path | Purpose |
|---|---|---|
| pikanode temp timer | `/etc/systemd/system/unidesk-pk01-pikanode-temp-gc.timer` | Runs pikanode temp retention on a daily timer. |
| pikanode temp service | `/etc/systemd/system/unidesk-pk01-pikanode-temp-gc.service` | Executes `/usr/local/sbin/unidesk-pk01-pikanode-temp-gc` as a one-shot cleanup. |
| pikanode temp script | `/usr/local/sbin/unidesk-pk01-pikanode-temp-gc` | Deletes only old direct temp directories under the protected root. |
| retention log | `/var/log/unidesk-pk01/pikanode-temp-gc.log` | Bounded operational evidence for the timer. |
| pikanode logrotate | `/etc/logrotate.d/unidesk-pk01-pikanode` | Rotates pikanode temp/runtime logs and the retention log. |
| journald cap | `/etc/systemd/journald.conf.d/99-unidesk-pk01.conf` | Caps systemd journal growth on PK01. |
Operational checks:
```bash
trans PK01 argv bash -lc 'systemctl status unidesk-pk01-pikanode-temp-gc.timer --no-pager'
trans PK01 argv bash -lc 'sudo systemctl start unidesk-pk01-pikanode-temp-gc.service && tail -n 40 /var/log/unidesk-pk01/pikanode-temp-gc.log'
trans PK01 argv bash -lc 'sudo logrotate -d /etc/logrotate.d/unidesk-pk01-pikanode'
```
The timer and logrotate configuration are node-local operational state. If a future UniDesk CLI subcommand manages PK01 retention centrally, it must first render a dry-run plan, show the same protected paths, and then install/update these node-local files through a confirmed operation.
## Space Attribution Baseline
PK01 space attribution should use short, bounded commands. Recommended probes:
```bash
trans PK01 argv bash -lc 'df -h / && df -i /'
trans PK01 argv bash -lc 'sudo timeout 20 du -xhd1 /var /home/ubuntu/pikanode /home/ubuntu/.vscode-server /var/lib/docker /var/log 2>/dev/null | sort -h | tail -80'
trans PK01 argv bash -lc 'docker system df -v | sed -n "1,220p"'
trans PK01 argv bash -lc 'sudo find /home/ubuntu/pikanode/html/temp -xdev -mindepth 1 -maxdepth 1 -printf "%TY-%Tm-%Td %TH:%TM %p\n" | sort | tail -40'
```
Interpretation guide:
| Path | Meaning | Default action |
|---|---|---|
| `/home/ubuntu/pikanode/html/temp` | Generated pikanode build workspaces | Managed by PK01 temp retention. |
| `/home/ubuntu/pikanode/html/download` | Generated ZIP downloads | Protected unless a separate download retention policy is approved. |
| `/home/ubuntu/pikanode/files` | pikanode repository/service data | Protected. |
| `/home/ubuntu/.vscode-server` | VS Code remote server, extensions, and cache | Do not delete installed servers/extensions by default; cached VSIX cleanup needs an explicit policy. |
| `/var/lib/docker` | Docker overlay/image/container state for PK01 workloads | Do not prune generically; inspect running containers first. |
| `/var/log/journal` | systemd journal | Managed by journald cap; use sudo when vacuuming manually. |
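The table can also be expressed as a lookup that a wrapper script consults before acting on a path. This is an illustrative sketch: the function name and the action labels are hypothetical, and the match is purely lexical, so it assumes an already normalized absolute path with no `..` components or symlinks.

```bash
#!/usr/bin/env bash
# Hypothetical lookup mirroring the interpretation guide. Lexical match only:
# callers must pass a normalized absolute path.
pk01_default_action() {
  case "$1" in
    /home/ubuntu/pikanode/html/temp|/home/ubuntu/pikanode/html/temp/*)
      echo "temp-retention" ;;   # managed by PK01 temp retention
    /home/ubuntu/pikanode/html/download|/home/ubuntu/pikanode/html/download/*)
      echo "protected" ;;        # needs a separate download retention policy
    /home/ubuntu/pikanode/files|/home/ubuntu/pikanode/files/*)
      echo "protected" ;;
    /home/ubuntu/.vscode-server|/home/ubuntu/.vscode-server/*)
      echo "explicit-policy" ;;  # no deletion by default
    /var/lib/docker|/var/lib/docker/*)
      echo "inspect-first" ;;    # no generic prune
    /var/log/journal|/var/log/journal/*)
      echo "journald-cap" ;;
    *)
      echo "unclassified" ;;     # not covered by the table; do not clean
  esac
}
```

Treating every unlisted path as `unclassified` keeps the default conservative, in line with the rule of cleaning only classified candidates.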