feat: add real dependency proxy benchmark

Codex
2026-06-26 16:34:23 +00:00
parent 9bce21b9ef
commit 678dc427d6
4 changed files with 562 additions and 30 deletions
@@ -0,0 +1,61 @@
# PJ2026-01060310 Real K3s Dependency Proxy Benchmark
## Scope
This SPEC covers pikasTech/unidesk#1048. It supersedes synthetic Cloudflare download evidence as the basis for proxy acceleration decisions and adds a real k3s dependency benchmark profile named `real-deps-500m`.
The benchmark must prove that the target k3s cluster can use the platform-infra egress proxy to acquire real dependencies. It has four required stages:
- Kubernetes image pull: kubelet/containerd must directly pull remote `alpine`, `node`, and `golang` images with `imagePullPolicy: Always`.
- Pod `apk add`: the Alpine stage must fetch packages from upstream apk repositories through proxy environment variables.
- Pod `npm install`: the Node stage must install packages from `https://registry.npmjs.org/` through the proxy.
- Pod `go mod download`: the Go stage must download modules through the proxy with `GOPROXY=https://proxy.golang.org,direct`.

If the Kubernetes image pull stage fails, the result must not be reported as an application dependency failure. It is an image-pull proxy failure in the k3s/containerd path and must be fixed there.
## Architecture
`platform-infra egress-proxy k3s-build-benchmark` remains the single coordinator. It reads targets from `config/platform-infra/sub2api.yaml`, reads profiles from `config/platform-infra/egress-proxy-benchmarks.yaml`, renders one Job per target, and uses `trans <target.route> sh -- ...` as the bounded control path.
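
A minimal TypeScript sketch of the coordinator's planning step, assuming hypothetical `Target` and `JobPlan` shapes (the real `sub2api.yaml` schema may differ) and a hypothetical `k3s-bench-<profile>-<target>-<runId>` naming scheme. It produces one plan per target, lowercases names because Kubernetes object names must be lowercase while node IDs such as `D601` are not, and leaves the command after `sh --` to the caller because this SPEC elides it.

```typescript
// Hypothetical shapes; the real config/platform-infra/*.yaml schemas may differ.
interface Target {
  name: string; // node ID, e.g. "D601"
  route: string; // value passed as `trans <target.route>`
}

interface JobPlan {
  target: string;
  jobName: string;
  controlArgv: string[];
}

// Lowercase RFC 1123 label; capped at 63 chars so the name also fits
// into the job-name label that the Job controller puts on its Pods.
const DNS_LABEL = /^[a-z0-9]([-a-z0-9]*[a-z0-9])?$/;

function jobNameFor(profile: string, target: string, runId: string): string {
  const name = `k3s-bench-${profile}-${target}-${runId}`.toLowerCase();
  if (name.length > 63 || !DNS_LABEL.test(name)) {
    throw new Error(`invalid job name: ${name}`);
  }
  return name;
}

// One Job per target. The remote command after `sh --` is caller-supplied
// because the SPEC does not spell it out.
function planJobs(
  targets: Target[],
  profile: string,
  runId: string,
  remoteArgs: string[],
): JobPlan[] {
  const seen = new Set<string>();
  return targets.map((t) => {
    if (seen.has(t.name)) throw new Error(`duplicate target: ${t.name}`);
    seen.add(t.name);
    return {
      target: t.name,
      jobName: jobNameFor(profile, t.name, runId),
      controlArgv: ["trans", t.route, "sh", "--", ...remoteArgs],
    };
  });
}
```

Sharing one `runId` across a single `--confirm` invocation keeps the names of that batch related, while distinct targets still yield distinct Job names.
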
The `real-deps-500m` profile renders a multi-stage Kubernetes Job:
- `initContainer/apk-add`: image `docker.io/library/alpine:3.20`.
- `initContainer/npm-install`: image `docker.io/library/node:22-bookworm`.
- `initContainer/go-download`: image `docker.io/library/golang:1.24-bookworm`.
- `container/summary`: starts only after all three init containers have succeeded and emits a bounded result marker.

All three init containers receive the YAML-declared `sub2api-egress-proxy` service URL through `HTTP_PROXY`, `HTTPS_PROXY`, `ALL_PROXY`, and their lowercase variants (`http_proxy`, `https_proxy`, `all_proxy`). Image pulls happen before any container process starts, so these variables cannot affect them. Image pull proxy evidence must therefore come from the k3s/containerd path and proxyserver-side traffic sampling, not from the in-container environment.
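
A sketch of how a renderer could assemble that Job, assuming the stage commands are supplied by the caller (this SPEC does not list the apk packages, npm packages, or Go modules) and that the summary container reuses the Alpine image. Only the images, `imagePullPolicy: Always`, the proxy variables, and `GOPROXY` come from this SPEC; `renderJob`, `proxyEnv`, and the `backoffLimit: 0` choice are hypothetical.

```typescript
type Stage = "apk-add" | "npm-install" | "go-download";

interface EnvVar {
  name: string;
  value: string;
}

// Subset of the Kubernetes Container fields used here.
interface Container {
  name: string;
  image: string;
  imagePullPolicy: "Always";
  command: string[];
  env?: EnvVar[];
}

// Init containers run sequentially in this order.
const STAGES: Stage[] = ["apk-add", "npm-install", "go-download"];

const STAGE_IMAGES: Record<Stage, string> = {
  "apk-add": "docker.io/library/alpine:3.20",
  "npm-install": "docker.io/library/node:22-bookworm",
  "go-download": "docker.io/library/golang:1.24-bookworm",
};

// Upper- and lowercase proxy variables, all pointing at the same service URL.
function proxyEnv(proxyUrl: string): EnvVar[] {
  return ["HTTP_PROXY", "HTTPS_PROXY", "ALL_PROXY"].flatMap((key) => [
    { name: key, value: proxyUrl },
    { name: key.toLowerCase(), value: proxyUrl },
  ]);
}

function renderJob(
  jobName: string,
  proxyUrl: string,
  commands: Record<Stage | "summary", string[]>, // hypothetical: caller-supplied
) {
  const initContainers = STAGES.map((stage): Container => ({
    name: stage,
    image: STAGE_IMAGES[stage],
    imagePullPolicy: "Always",
    command: commands[stage],
    env:
      stage === "go-download"
        ? [...proxyEnv(proxyUrl), { name: "GOPROXY", value: "https://proxy.golang.org,direct" }]
        : proxyEnv(proxyUrl),
  }));
  const summary: Container = {
    name: "summary",
    image: STAGE_IMAGES["apk-add"], // assumption: the SPEC does not name this image
    imagePullPolicy: "Always",
    command: commands.summary,
  };
  return {
    apiVersion: "batch/v1",
    kind: "Job",
    metadata: { name: jobName },
    spec: {
      backoffLimit: 0, // assumption: surface the first failure instead of retrying
      template: {
        spec: {
          restartPolicy: "Never", // a failed init container then fails the whole Pod
          initContainers,
          containers: [summary],
        },
      },
    },
  };
}
```

The returned object can be serialized with `JSON.stringify`, since `kubectl apply -f` accepts JSON manifests as well as YAML.
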
## Observability
The source of truth for traffic is `platform-infra egress-proxy traffic --target <node> --sample-seconds N`. Benchmark status may include this traffic sample. The final evidence table must include the proxyserver window bytes, rate, and cumulative bytes, plus the top client and top destination.

For image pull traffic, the observed proxy client may be the node's k3s/containerd path rather than the benchmark Pod IP. For the `apk`, `npm`, and `go` stages, the observed proxy client should correspond to the benchmark Pod's network path. Issue evidence must preserve this distinction.
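
The window columns can be derived from per-flow samples roughly as follows. This sketch assumes a hypothetical `FlowSample` record (client IP, destination, bytes) plus a cumulative counter read at the same time; the actual output format of `platform-infra egress-proxy traffic` is not specified here. `classifyClient` encodes the node-versus-Pod distinction, assuming the caller knows the node IPs and the benchmark Pod IP.

```typescript
// Hypothetical per-flow record for one sampling window.
interface FlowSample {
  client: string; // bare client IP (assumption)
  destination: string;
  bytes: number;
}

interface TrafficSummary {
  windowBytes: number;
  rateMiBps: number;
  cumulativeBytes: number;
  topClient: string;
  topDestination: string;
}

// Key with the most bytes in the window; "-" when there were no flows.
function topBy(samples: FlowSample[], key: "client" | "destination"): string {
  const totals = new Map<string, number>();
  for (const s of samples) totals.set(s[key], (totals.get(s[key]) ?? 0) + s.bytes);
  let best = "-";
  let bestBytes = -1;
  for (const [k, b] of totals) {
    if (b > bestBytes) {
      best = k;
      bestBytes = b;
    }
  }
  return best;
}

function summarize(samples: FlowSample[], sampleSeconds: number, cumulativeBytes: number): TrafficSummary {
  if (sampleSeconds <= 0) throw new Error("sampleSeconds must be positive");
  const windowBytes = samples.reduce((n, s) => n + s.bytes, 0);
  return {
    windowBytes,
    rateMiBps: windowBytes / 2 ** 20 / sampleSeconds,
    cumulativeBytes,
    topClient: topBy(samples, "client"),
    topDestination: topBy(samples, "destination"),
  };
}

// Image pulls are expected from node addresses; apk/npm/go traffic from the Pod IP.
type ClientKind = "node" | "pod" | "other";

function classifyClient(client: string, nodeIps: Set<string>, podIp: string): ClientKind {
  if (client === podIp) return "pod";
  if (nodeIps.has(client)) return "node";
  return "other";
}
```
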
Status output must classify failures into at least:
- `image-pull`: kubelet/containerd cannot pull remote images.
- `apk-download`: Pod started but apk fetch/install failed.
- `npm-download`: Pod started but npm install failed.
- `go-download`: Pod started but Go module download failed.
- `none`: all stages succeeded.
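
One way to derive the failure family from the Pod status, sketched under the assumption that the init containers are named after the stages (`apk-add`, `npm-install`, `go-download`) and the main container is `summary`. The status fields (`initContainerStatuses`, `state.waiting.reason`, `state.terminated.exitCode`) and the `ErrImagePull`/`ImagePullBackOff` reasons are standard Kubernetes Pod status; `classify` itself and its `null` "not decided yet" result are hypothetical.

```typescript
// Subset of the Kubernetes PodStatus / ContainerStatus fields read here.
interface ContainerState {
  waiting?: { reason?: string };
  terminated?: { exitCode: number };
}
interface ContainerStatus {
  name: string;
  state?: ContainerState;
}
interface PodStatus {
  initContainerStatuses?: ContainerStatus[];
  containerStatuses?: ContainerStatus[];
}

type FailureFamily = "image-pull" | "apk-download" | "npm-download" | "go-download" | "none";

const IMAGE_PULL_REASONS = new Set(["ErrImagePull", "ImagePullBackOff"]);

// Init container name -> failure family when it exits non-zero.
const STAGE_FAMILY: Record<string, FailureFamily> = {
  "apk-add": "apk-download",
  "npm-install": "npm-download",
  "go-download": "go-download",
};

// Returns null while the Pod is still progressing and no failure is visible yet.
function classify(status: PodStatus): FailureFamily | null {
  const inits = status.initContainerStatuses ?? [];
  const mains = status.containerStatuses ?? [];
  // A pull failure on any container wins: that container never started.
  if ([...inits, ...mains].some((c) => IMAGE_PULL_REASONS.has(c.state?.waiting?.reason ?? ""))) {
    return "image-pull";
  }
  for (const c of inits) {
    const code = c.state?.terminated?.exitCode;
    if (code !== undefined && code !== 0) return STAGE_FAMILY[c.name] ?? null;
  }
  // `summary` only runs after every init container succeeded.
  const summary = mains.find((c) => c.name === "summary");
  return summary?.state?.terminated?.exitCode === 0 ? "none" : null;
}
```
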
## Boundaries
This benchmark must not:
- use Cloudflare speed-test downloads as acceptance evidence;
- install Node or Go as a substitute for Kubernetes pulling the `node` and `golang` images;
- rewrite apk/npm/go sources to regional mirrors;
- use HWLAB source repositories, Tekton, Argo, git-mirror, or previous build caches;
- hide image pull failures behind local image overrides.

In the `real-deps-500m` profile, `payloadMiB: 500` is the minimum proxyserver-observed traffic required for acceptance. The Pod result marker may report apk/npm/go workspace sizes, but those sizes cannot replace proxyserver traffic evidence because image pull bytes never reach the Pod filesystem.
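
A sketch of the threshold check, assuming the run's traffic is measured as the difference between two readings of the proxyserver's cumulative byte counter taken before and after the run (any unrelated traffic through the same proxy in that interval would also be counted). With MiB meaning 2^20 bytes, 500 MiB is 500 × 1,048,576 = 524,288,000 bytes. Both helper names are hypothetical.

```typescript
const MIB = 1024 * 1024;

// Bytes attributed to one run, from two readings of the cumulative counter.
function observedRunBytes(cumulativeBefore: number, cumulativeAfter: number): number {
  if (cumulativeAfter < cumulativeBefore) {
    // A decreasing counter may mean the proxyserver restarted mid-run.
    throw new Error("cumulative counter decreased; re-run the measurement");
  }
  return cumulativeAfter - cumulativeBefore;
}

// Workspace sizes from the Pod marker are intentionally not an input here.
function meetsPayload(observedBytes: number, payloadMiB: number): boolean {
  return observedBytes >= payloadMiB * MIB;
}
```
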
## Acceptance
- `bun scripts/cli.ts platform-infra egress-proxy k3s-build-benchmark --targets D601,D518 --profile real-deps-500m --dry-run` prints both node plans and the remote image set.
- `--confirm` creates one uniquely named Job per node and returns immediately without waiting for the Jobs to finish.
- `status --traffic-sample-seconds 15` reports Job state, the failure family (`image-pull`, `apk-download`, `npm-download`, or `go-download`) when applicable, and the proxyserver traffic columns.
- D601 and D518 both have final rows with target, profile, job, state, duration, apk MiB, npm MiB, go MiB, proxy traffic window/rate/cumulative, top client, top destination, and failure family.
- Acceptance requires at least 500 MiB of proxyserver-observed traffic per successful node run. If a node cannot reach that threshold because image pull fails, the issue remains open until the k3s/containerd image pull proxy path is fixed or the blocker is explicitly documented.
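
The final rows and the per-node acceptance rule can be sketched as follows. The row fields mirror the columns listed above, with units chosen as an assumption; `proxyRunBytes` (the proxyserver-observed bytes attributed to the run) and `nodeVerdict` are hypothetical. The sketch reads the last criterion as: a node is accepted only with failure family `none` and at least `payloadMiB` of proxy traffic, and an `image-pull` failure stays open unless a blocker has been documented.

```typescript
type FailureFamily = "image-pull" | "apk-download" | "npm-download" | "go-download" | "none";

// One final evidence row per node; units in the field names are assumptions.
interface EvidenceRow {
  target: string;
  profile: string;
  job: string;
  state: string;
  durationSeconds: number;
  apkMiB: number;
  npmMiB: number;
  goMiB: number;
  proxyWindowBytes: number;
  proxyRateMiBps: number;
  proxyCumulativeBytes: number;
  proxyRunBytes: number; // hypothetical: proxy-observed bytes attributed to this run
  topClient: string;
  topDestination: string;
  failureFamily: FailureFamily;
}

type Verdict = "accepted" | "blocker-documented" | "open";

function nodeVerdict(
  row: EvidenceRow | undefined,
  payloadMiB: number,
  imagePullBlockerDocumented: boolean,
): Verdict {
  if (!row) return "open"; // no final row yet
  if (row.failureFamily === "none" && row.proxyRunBytes >= payloadMiB * 1024 * 1024) {
    return "accepted";
  }
  if (row.failureFamily === "image-pull" && imagePullBlockerDocumented) {
    return "blocker-documented";
  }
  return "open";
}
```
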