

Auth Broker P0 Contract

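This document finalizes the P0 plan for injecting GitHub issue/PR permissions. The P0 goal is neither to build a full IAM system nor to distribute secrets to runners; it is to first validate, on D601 dev, a single-binary Rust broker proxy so that a Code Queue runner without GH_TOKEN / GITHUB_TOKEN can still access GitHub issues/PRs and run a PR preflight dry-run through a controlled proxy.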

Existing Paths

The current blocker is that several paths treat GitHub capability as equivalent to runner environment variables:

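  • scripts/src/gh.ts: the UniDesk gh CLI is already a repo-native GitHub REST wrapper. It reads GH_TOKEN first, then GITHUB_TOKEN, then falls back to the system gh auth token. It never prints token values, and it already reports failures in structured form, such as missing-token, permission-denied, scope-insufficient, github-transient, network-proxy-failed and unsupported-command. Of these, github-transient is a retryable GitHub DNS/API connectivity failure and is not the same as a missing token or insufficient permission.
  • scripts/code-queue-pr-preflight-example.ts: the local example checks for an env token first, then runs gh auth status, gh pr create --dry-run and gh pr comment --dry-run. Without an env token the workflow fails immediately.
  • src/components/microservices/code-queue/src/runtime-preflight.ts: the D601 scheduler preflight checks whether GH_TOKEN / GITHUB_TOKEN exists and uses curl/gh/git to probe the GitHub API, issues, PRs, SSH and an optional push dry-run.
  • scripts/src/code-queue.ts: codex pr-preflight condenses the runtime preflight into a runner contract; the scheduler env token and the active runner/dev container token are independent scopes. When the scheduler env token or auth-broker is missing, it outputs failureKind=auth-missing, degradedReason=auth-broker-needed and runnerDisposition=infra-blocked, and uses authScopeSummary/scopeBoundary/recommendedActions to make clear that this does not mean the current active runner cannot create PRs; inside an active task, the repo-native bun scripts/cli.ts gh auth status and a PR create dry-run are still needed for verification.
  • src/components/microservices/code-queue/src/index.ts: CODE_QUEUE_REMOTE_CODEX_ENV_KEYS includes GH_TOKEN, GITHUB_TOKEN, GH_HOST, GITHUB_API_URL and GH_REPO by default, so that the scheduler env is passed on to the provider dev container. This is a credential distribution path and is not suitable as the sole P0 solution.
  • src/components/microservices/code-queue/docker-compose.d601.yml: D601 Code Queue reads its runtime environment from .state/code-queue-d601.env and configures the provider-gateway egress proxy. This file must not carry the real secret values involved in this task.
  • src/components/microservices/code-queue/Dockerfile: the runner image already includes git, gh, curl, jq, rg, cargo, rustc and rustfmt, which is enough to run a lightweight broker client, preflight and contract tests; this P0 does not need to install new system packages ad hoc for the investigation.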

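The artifact registry / deploy path has a different problem: the D601 registry is a host-loopback artifact cache, and standard CD first verifies the commit-pinned manifest, then runs pull/import/retag/rollout. The P0 auth broker does not proxy Docker registry credentials, does not replace provider-gateway Host SSH, does not run deploy apply, and does not address production registry publishing authorization.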

P0 Scope

P0 only addresses the problem that GitHub REST permissions should not appear in the ordinary runner env:

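  • Add a single-binary Rust service skeleton at src/components/microservices/auth-broker, with the working name auth-broker.
  • Add a CLI adapter contract with the entry point bun scripts/cli.ts auth-broker contract|health --dry-run|credential-request --dry-run|pr-preflight --dry-run.
  • Validate on D601 dev only at first; the entry point must be a k3s ClusterIP, a backend-core/microservice private proxy or the D601 loopback, with no public port exposed.
  • The broker holds the server-side GitHub credential reference and calls GitHub REST; the runner does not receive, read or print GH_TOKEN / GITHUB_TOKEN.
  • The API accepts only structured operations, not shell, argv, arbitrary URLs or raw gh api.
  • P0 does not implement real secret storage, rotation, user identity binding, fine-grained authorization, production rollout or registry/deploy credential proxying.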

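P0 lets Code Queue work proceed in parallel, but the implementation must be split into non-conflicting lanes: Rust API skeleton, CLI client adapter, runtime-preflight contract, D601 dev manifest/dry-run, and documentation and contract tests. Real credential configuration, D601 dev Secret creation, production enablement and the live write allowlist require manual confirmation.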

API

The first skeleton lives at:

  • src/components/microservices/auth-broker/Cargo.toml
  • src/components/microservices/auth-broker/src/main.rs
  • src/components/microservices/auth-broker/Dockerfile
  • config.json microservice id auth-broker
  • deploy.json prod/dev desired-state entries for auth-broker
  • docker-compose.yml service auth-broker behind Compose profile auth-broker
  • scripts/src/auth-broker.ts

The skeleton intentionally does not read GH_TOKEN or GITHUB_TOKEN. It uses only redacted readiness configuration such as AUTH_BROKER_GITHUB_CONFIGURED, AUTH_BROKER_GITHUB_CREDENTIAL_REF, AUTH_BROKER_ALLOWED_REPOS and optional AUTH_BROKER_AUDIT_LOG. Real secret mounting is outside this contract.

P1 Source Registration

P1 keeps Auth Broker in source/contract/dry-run only:

  • config.json registers stable microservice id auth-broker on main-server, private backend http://auth-broker:4291, health path /health, and allowed proxy prefixes /health plus /v1/github/.
  • docker-compose.yml defines service auth-broker with profiles: ["auth-broker"], restart: "no", no public ports, and redacted env names only. Default server start does not select this profile, so this source registration must not change current production runtime.
  • deploy.json includes prod and dev desired-state entries so deploy plan --env prod|dev --service auth-broker has a stable identity. Live apply is supervisor-gated until credential mounting and private exposure are separately reviewed.
  • bun scripts/cli.ts auth-broker contract|health --dry-run|credential-request --dry-run|pr-preflight --dry-run reports serviceRegistration.config, serviceRegistration.compose, serviceRegistration.deploy, and serviceRegistration.runtimeCredentialRef using presence/ref fields only.
  • Runtime credential readiness is expressed by UNIDESK_AUTH_BROKER_GITHUB_CONFIGURED / AUTH_BROKER_GITHUB_CONFIGURED and UNIDESK_AUTH_BROKER_GITHUB_CREDENTIAL_REF / AUTH_BROKER_GITHUB_CREDENTIAL_REF presence. The CLI prints only the source key and a sanitized github:<ref> style preview, never a token or raw credential value.

P1 still does not start Auth Broker, mount real secrets, deploy to prod/dev, restart backend-core/provider-gateway/Code Queue, or proxy registry/deploy credentials.

GET /health

Returns only service status and redacted capabilities, never secret values.

{
  "ok": true,
  "service": "auth-broker",
  "phase": "p0",
  "github": {
    "configured": true,
    "credentialRef": "github:unidesk-dev",
    "valuesPrinted": false
  },
  "capabilities": ["github.issue.read", "github.pr.read", "github.pr.preflight.dry-run"]
}

POST /v1/github/gh

General-purpose GitHub issue/PR proxy. The request must be structured JSON:

{
  "requestId": "uuid",
  "caller": {
    "plane": "code-queue",
    "taskId": "codex_...",
    "queueId": "default"
  },
  "repo": "pikasTech/unidesk",
  "operation": "github.issue.read",
  "dryRun": false,
  "params": {
    "number": 59,
    "jsonFields": ["body", "title", "state"]
  }
}

P0 required operation allowlist:

| Operation | Upstream | Remote write | P0 status |
| --- | --- | --- | --- |
| github.auth.status | GitHub REST rate limit + repo probe | no | enabled |
| github.issue.list | GET /repos/{owner}/{repo}/issues | no | enabled |
| github.issue.read | GET /repos/{owner}/{repo}/issues/{number} | no | enabled |
| github.pr.list | GET /repos/{owner}/{repo}/pulls | no | enabled |
| github.pr.read | GET /repos/{owner}/{repo}/pulls/{number} | no | enabled |
| github.pr.create | planned request only when dryRun=true | no | enabled as dry-run |
| github.pr.comment.create | planned request only when dryRun=true | no | enabled as dry-run |

P0 explicitly blocks gh pr merge, issue/PR delete, arbitrary gh api, arbitrary HTTP proxying, raw git push, Docker registry login and deploy commands. Live GitHub writes can be added later only after a user-confirmed allowlist and identity/audit review; without that confirmation P0 mutation operations must return dry-run-required.
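As a rough illustration of the allowlist rules above, the following TypeScript sketch shows one way a request could be classified. The operation strings, the default repo and the failure kinds come from this document; the type names, the `decide` function and the order of the checks are hypothetical and are not taken from the broker source.

```typescript
// Hypothetical sketch of the P0 allowlist decision; not the broker's actual code.
type AllowlistFailure =
  | "repo-not-allowed"
  | "operation-not-allowed"
  | "dry-run-required";

interface BrokerRequest {
  repo: string;
  operation: string;
  dryRun: boolean;
}

type Decision =
  | { ok: true; writesRemote: false }
  | { ok: false; failureKind: AllowlistFailure };

// Read-only operations from the P0 table.
const READ_OPERATIONS = new Set<string>([
  "github.auth.status",
  "github.issue.list",
  "github.issue.read",
  "github.pr.list",
  "github.pr.read",
]);

// Mutations that P0 only plans; they are never sent to GitHub.
const DRY_RUN_ONLY_OPERATIONS = new Set<string>([
  "github.pr.create",
  "github.pr.comment.create",
]);

// Default repo allowlist named in this document.
const ALLOWED_REPOS = new Set<string>(["pikasTech/unidesk"]);

function decide(request: BrokerRequest): Decision {
  // Assumption: the repo check runs before the operation check.
  if (!ALLOWED_REPOS.has(request.repo)) {
    return { ok: false, failureKind: "repo-not-allowed" };
  }
  if (READ_OPERATIONS.has(request.operation)) {
    return { ok: true, writesRemote: false };
  }
  if (DRY_RUN_ONLY_OPERATIONS.has(request.operation)) {
    return request.dryRun
      ? { ok: true, writesRemote: false }
      : { ok: false, failureKind: "dry-run-required" };
  }
  // Anything else (merge, delete, raw gh api, ...) is outside the finite list.
  return { ok: false, failureKind: "operation-not-allowed" };
}
```

Keeping the two operation sets separate means a live write can only be enabled by deliberately moving an operation string, which matches the requirement for a user-confirmed allowlist.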

POST /v1/github/pr-preflight

Runner-facing PR preflight proxy. It must preserve the existing codex pr-preflight semantics while replacing env-token coverage with broker coverage:

{
  "requestId": "uuid",
  "repo": "pikasTech/unidesk",
  "base": "master",
  "head": "feature/example",
  "issueNumber": 59,
  "includePrCreateDryRun": true,
  "includePushDryRun": false
}

Successful response shape:

{
  "ok": true,
  "runnerDisposition": "ready",
  "failureKind": null,
  "degradedReason": null,
  "tokenCoverage": {
    "ok": true,
    "source": "auth-broker",
    "scope": "broker-held-github-credential",
    "runnerEnvTokenRequired": false,
    "valuesPrinted": false
  },
  "prCapabilityContract": {
    "targetBranch": "master",
    "systemGhBinaryRequiredForWrites": false,
    "preflightCreatesPr": false,
    "preflightMergesPr": false,
    "brokerProxy": {
      "ok": true,
      "operations": ["github.auth.status", "github.pr.create"],
      "writesRemote": false
    }
  }
}

Broker PR preflight can prove GitHub REST auth, repo visibility, issue/PR read, and PR create body/parameter shape. It cannot prove runner-local git push capability unless the runner still performs git push --dry-run with its own git credentials. Therefore includePushDryRun=true remains a runner-local check and may still fail as git-remote-gap.
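The split between broker coverage and the runner-local push check can be sketched as follows. This is a minimal illustration, assuming the response fields shown in the JSON above; `PushDryRun`, `PreflightSummary` and `summarizePreflight` are hypothetical names introduced only for this example.

```typescript
// Hypothetical sketch: combine a broker preflight response with a
// runner-local push dry-run result. Only the response fields come from
// this document; the helper types are assumptions.
interface PreflightResponse {
  ok: boolean;
  runnerDisposition: string;
  failureKind: string | null;
  tokenCoverage: {
    source: string;
    runnerEnvTokenRequired: boolean;
    valuesPrinted: boolean;
  };
  prCapabilityContract: {
    brokerProxy: { ok: boolean; writesRemote: boolean };
  };
}

// Result of the runner's own `git push --dry-run`, if it was attempted.
interface PushDryRun {
  attempted: boolean;
  ok: boolean;
}

interface PreflightSummary {
  brokerReady: boolean;
  pushReady: boolean | null; // null means the push check was not run
  failureKind: string | null;
}

function summarizePreflight(
  response: PreflightResponse,
  push: PushDryRun,
): PreflightSummary {
  const coverage = response.tokenCoverage;
  const proxy = response.prCapabilityContract.brokerProxy;
  const brokerReady =
    response.ok &&
    response.runnerDisposition === "ready" &&
    coverage.source === "auth-broker" &&
    !coverage.runnerEnvTokenRequired &&
    !coverage.valuesPrinted &&
    proxy.ok &&
    !proxy.writesRemote;

  const pushReady = push.attempted ? push.ok : null;

  let failureKind: string | null = null;
  if (!brokerReady) {
    failureKind = response.failureKind;
  } else if (pushReady === false) {
    // The broker cannot vouch for git credentials held by the runner.
    failureKind = "git-remote-gap";
  }
  return { brokerReady, pushReady, failureKind };
}
```

Reporting `pushReady` separately keeps a failed push dry-run from being misread as a broker or token problem.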

Permission Boundary

P0 permission is intentionally coarse because identity verification is not in scope:

  • Allowed repo list defaults to pikasTech/unidesk; unknown repos return repo-not-allowed.
  • Allowed operations are finite strings. The broker never executes caller-provided shell, argv or URLs.
  • Request body size is bounded. Markdown bodies are accepted only for planned dry-run output in P0.
  • Response redaction is mandatory. No response, log or audit field may contain token values, Authorization headers, cookie values or upstream credential material.
  • Broker-held credential is identified only by credentialRef, credentialKind and boolean readiness.
  • Network exposure is dev-only and private. Public frontend access must go through existing authenticated UniDesk proxy if exposed at all.
  • P0 does not grant production deploy, registry push/pull, provider token, database, k3s or host SSH permissions.
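The mandatory redaction rule could be enforced with a recursive scrub applied to every response and log object before it leaves the broker. The sketch below is illustrative only: the `redact` helper, the key list and the placeholder string are assumptions, and the token pattern covers only the commonly documented GitHub token prefixes.

```typescript
// Hypothetical redaction helper; key names and the placeholder are assumptions.
const REDACTED = "[REDACTED]";

// Field names whose values are always dropped, regardless of content.
const SENSITIVE_KEYS =
  /^(authorization|cookie|set-cookie|token|gh_token|github_token)$/i;

// Common GitHub token shapes (ghp_, gho_, ghu_, ghs_, ghr_, github_pat_).
const TOKEN_PATTERN =
  /\b(?:gh[pousr]_[A-Za-z0-9]{20,}|github_pat_[A-Za-z0-9_]{20,})\b/g;

function redact(value: unknown): unknown {
  if (typeof value === "string") {
    // Scrub token-looking substrings embedded in free text.
    return value.replace(TOKEN_PATTERN, REDACTED);
  }
  if (Array.isArray(value)) {
    return value.map((item) => redact(item));
  }
  if (value !== null && typeof value === "object") {
    const out: Record<string, unknown> = {};
    for (const [key, inner] of Object.entries(value as Record<string, unknown>)) {
      out[key] = SENSITIVE_KEYS.test(key) ? REDACTED : redact(inner);
    }
    return out;
  }
  // Numbers, booleans, null and undefined pass through unchanged.
  return value;
}
```

Pattern-based scrubbing is a backstop rather than a guarantee; the stronger control described above is that credential material is only ever referenced by credentialRef and never placed in a response object in the first place.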

Audit Fields

Every broker request must emit one JSONL audit record with these fields:

| Field | Meaning |
| --- | --- |
| requestId | Caller-provided or broker-generated id |
| observedAt | ISO timestamp |
| caller.plane | code-queue, ci, commander, manual-cli or unknown |
| caller.taskId / caller.queueId | Code Queue correlation, nullable |
| operation | One allowlisted operation string |
| repo | Owner/repo after allowlist validation |
| resource | Issue/PR number or branch names, no body text |
| dryRun | Whether upstream mutation is impossible |
| credentialRef | Stable redacted credential reference |
| credentialValuePrinted | Always false |
| upstream.method / upstream.path | GitHub REST method/path without query secrets |
| status | HTTP status returned to caller |
| ok | Boolean success |
| failureKind / degradedReason | Structured failure semantics |
| runnerDisposition | ready, infra-blocked or business-failed |
| retryable | Boolean retry hint |
| durationMs | End-to-end broker latency |
| redaction.valuesPrinted | Always false |

Audit records may include body length and SHA-256 for planned PR/issue bodies, but must not store full Markdown bodies by default.
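A JSONL audit line that records a body's length and SHA-256 instead of its text might be built as in the sketch below. It covers only a subset of the audit fields, and `bodyLength`, `bodySha256`, `AuditInput` and `buildAuditLine` are hypothetical names; the document does not specify how the length and hash fields are called.

```typescript
import { createHash } from "node:crypto";

// Hypothetical input shape; only a subset of the audit fields is shown.
interface AuditInput {
  requestId: string;
  operation: string;
  repo: string;
  dryRun: boolean;
  credentialRef: string;
  plannedBody?: string; // Markdown body of a planned PR/comment, if any
}

// Builds one JSONL line. The body itself is never written, only its
// UTF-8 byte length and SHA-256 digest (field names are assumptions).
function buildAuditLine(input: AuditInput, observedAt: Date = new Date()): string {
  const bytes =
    input.plannedBody === undefined
      ? null
      : new TextEncoder().encode(input.plannedBody);

  const record = {
    requestId: input.requestId,
    observedAt: observedAt.toISOString(),
    operation: input.operation,
    repo: input.repo,
    dryRun: input.dryRun,
    credentialRef: input.credentialRef,
    credentialValuePrinted: false,
    bodyLength: bytes === null ? null : bytes.length,
    bodySha256:
      bytes === null ? null : createHash("sha256").update(bytes).digest("hex"),
    redaction: { valuesPrinted: false },
  };
  return JSON.stringify(record);
}
```

The digest lets an operator confirm that a later live request carries the same body that was planned, without the audit log ever holding the Markdown text.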

Failure Semantics

P0 must use stable failure kinds so Code Queue can decide whether to retry, split blocker work or ask for manual action.

| Failure kind | HTTP | Runner disposition | Retryable | Meaning |
| --- | --- | --- | --- | --- |
| auth-not-configured | 503 | infra-blocked | false | Broker has no configured GitHub credential reference |
| broker-unavailable | 503 | infra-blocked | true | Broker service/proxy is unreachable |
| unauthorized-caller | 403 | infra-blocked | false | Caller is outside the private dev/proxy boundary |
| repo-not-allowed | 403 | business-failed | false | Repo is not in the broker allowlist |
| operation-not-allowed | 403 | business-failed | false | Operation is not in the finite allowlist |
| dry-run-required | 409 | business-failed | false | Mutation was requested but P0 only allows dry-run |
| validation-failed | 400 | business-failed | false | Missing or invalid structured parameters |
| github-egress-failed | 502 | infra-blocked | true | GitHub or proxy network path failed |
| github-rate-limited | 429 | infra-blocked | true | GitHub returned rate limiting |
| github-permission-denied | 403 | infra-blocked | false | Credential lacks repo or action access |
| scope-insufficient | 403 | infra-blocked | false | Accepted scopes do not satisfy the operation |
| repo-not-found | 404 | business-failed | false | Allowed repo/resource does not exist or is hidden |
| upstream-invalid-response | 502 | infra-blocked | true | GitHub response could not be parsed safely |

All failures must include message, requestId, failureKind, degradedReason, runnerDisposition, retryable, and a next array with bounded diagnostic commands or manual actions. None may include secret values.
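One way to keep these semantics stable is to encode the failure table as data and derive every failure response from it. The sketch below transcribes the table; the `FAILURES` map, the `failureResponse` helper and the cap of five `next` entries are hypothetical, since the document only says the array must be bounded.

```typescript
// Hypothetical encoding of the failure table; values are transcribed from it.
type FailureKind =
  | "auth-not-configured" | "broker-unavailable" | "unauthorized-caller"
  | "repo-not-allowed" | "operation-not-allowed" | "dry-run-required"
  | "validation-failed" | "github-egress-failed" | "github-rate-limited"
  | "github-permission-denied" | "scope-insufficient" | "repo-not-found"
  | "upstream-invalid-response";

type Disposition = "infra-blocked" | "business-failed";

interface FailureSpec {
  status: number;
  runnerDisposition: Disposition;
  retryable: boolean;
}

const FAILURES: Record<FailureKind, FailureSpec> = {
  "auth-not-configured": { status: 503, runnerDisposition: "infra-blocked", retryable: false },
  "broker-unavailable": { status: 503, runnerDisposition: "infra-blocked", retryable: true },
  "unauthorized-caller": { status: 403, runnerDisposition: "infra-blocked", retryable: false },
  "repo-not-allowed": { status: 403, runnerDisposition: "business-failed", retryable: false },
  "operation-not-allowed": { status: 403, runnerDisposition: "business-failed", retryable: false },
  "dry-run-required": { status: 409, runnerDisposition: "business-failed", retryable: false },
  "validation-failed": { status: 400, runnerDisposition: "business-failed", retryable: false },
  "github-egress-failed": { status: 502, runnerDisposition: "infra-blocked", retryable: true },
  "github-rate-limited": { status: 429, runnerDisposition: "infra-blocked", retryable: true },
  "github-permission-denied": { status: 403, runnerDisposition: "infra-blocked", retryable: false },
  "scope-insufficient": { status: 403, runnerDisposition: "infra-blocked", retryable: false },
  "repo-not-found": { status: 404, runnerDisposition: "business-failed", retryable: false },
  "upstream-invalid-response": { status: 502, runnerDisposition: "infra-blocked", retryable: true },
};

// Assumption: "bounded" is modeled as at most five next-step hints.
const MAX_NEXT = 5;

function failureResponse(
  kind: FailureKind,
  requestId: string,
  message: string,
  degradedReason: string,
  next: string[],
) {
  const spec = FAILURES[kind];
  return {
    ok: false as const,
    status: spec.status,
    requestId,
    message,
    failureKind: kind,
    degradedReason,
    runnerDisposition: spec.runnerDisposition,
    retryable: spec.retryable,
    next: next.slice(0, MAX_NEXT),
  };
}
```

Because `FAILURES` is typed as a `Record` over the full union, adding a new failure kind without a status, disposition and retry hint fails to compile.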

CLI Adapter

The local runner adapter is a dry-run contract surface only:

bun scripts/cli.ts auth-broker contract
bun scripts/cli.ts auth-broker health --dry-run
bun scripts/cli.ts auth-broker credential-request --operation github.pr.create --repo pikasTech/unidesk --dry-run
bun scripts/cli.ts auth-broker pr-preflight --repo pikasTech/unidesk --base master --head <head-branch> --issue 59 --dry-run

If no UNIDESK_AUTH_BROKER_URL / AUTH_BROKER_URL is configured, the adapter returns a structured failure instead of falling through to live GitHub or a shell fallback. GH_TOKEN / GITHUB_TOKEN presence is reported only as migration diagnostics and does not make the Auth Broker adapter ready:

{
  "ok": false,
  "failureKind": "auth-missing",
  "degradedReason": "broker-needed",
  "runnerDisposition": "infra-blocked",
  "brokerNeeded": true,
  "tokenCoverage": {
    "ok": false,
    "presenceOnly": true,
    "valuesRead": false,
    "valuesPrinted": false
  }
}

When a broker endpoint is configured, the same command returns the P0 ready shape with tokenCoverage.source=auth-broker, runnerEnvTokenRequired=false, valuesPrinted=false, preflightCreatesPr=false, preflightMergesPr=false and brokerProxy.writesRemote=false. The adapter sanitizes endpoint URLs before printing and never reads token values.
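The adapter behavior described above can be sketched as a pure function over an environment map. The environment variable names and the response fields come from this document; `adapterStatus`, `sanitizeEndpoint`, the `envTokenPresent` diagnostic field and the precedence of UNIDESK_AUTH_BROKER_URL over AUTH_BROKER_URL are assumptions made for illustration.

```typescript
// Hypothetical adapter sketch; it only checks whether token keys exist
// and never reads or returns their values.
type Env = Record<string, string | undefined>;

type AdapterStatus =
  | {
      ok: false;
      failureKind: "auth-missing";
      degradedReason: "broker-needed";
      runnerDisposition: "infra-blocked";
      brokerNeeded: true;
      envTokenPresent: boolean; // migration diagnostic only (assumed field)
      tokenCoverage: { ok: false; presenceOnly: true; valuesRead: false; valuesPrinted: false };
    }
  | {
      ok: true;
      endpoint: string;
      tokenCoverage: { source: "auth-broker"; runnerEnvTokenRequired: false; valuesPrinted: false };
    };

// Strip userinfo, query and fragment before an endpoint is printed.
// Throws on a malformed URL; error handling is left out of this sketch.
function sanitizeEndpoint(raw: string): string {
  const url = new URL(raw);
  url.username = "";
  url.password = "";
  url.search = "";
  url.hash = "";
  return url.toString();
}

function adapterStatus(env: Env): AdapterStatus {
  // Assumption: the UNIDESK_-prefixed name takes precedence.
  const raw = env.UNIDESK_AUTH_BROKER_URL ?? env.AUTH_BROKER_URL;
  if (raw === undefined || raw === "") {
    return {
      ok: false,
      failureKind: "auth-missing",
      degradedReason: "broker-needed",
      runnerDisposition: "infra-blocked",
      brokerNeeded: true,
      // Key presence only; the value is never touched.
      envTokenPresent: "GH_TOKEN" in env || "GITHUB_TOKEN" in env,
      tokenCoverage: { ok: false, presenceOnly: true, valuesRead: false, valuesPrinted: false },
    };
  }
  return {
    ok: true,
    endpoint: sanitizeEndpoint(raw),
    tokenCoverage: { source: "auth-broker", runnerEnvTokenRequired: false, valuesPrinted: false },
  };
}
```

Note that an env token never flips the result to ready: with no broker URL the function still returns the structured auth-missing failure, which is the no-fallback behavior the contract requires.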

D601 Dev Acceptance

The minimum D601 dev verification is:

  1. Code Queue scheduler has no GH_TOKEN / GITHUB_TOKEN requirement for PR preflight success when broker is configured.
  2. codex pr-preflight --remote reports tokenCoverage.source=auth-broker, runnerEnvTokenRequired=false and valuesPrinted=false.
  3. Broker-backed issue/PR read returns structured GitHub data for pikasTech/unidesk through the stable proxy.
  4. Broker-backed PR create dry-run returns a planned operation with writesRemote=false.
  5. includePushDryRun=true is clearly reported as runner-local and can fail independently as git-remote-gap.
  6. Logs and audit records contain request ids, operation names and failure semantics, but no token values.
  7. No deploy, restart, production mutation, registry credential proxy or long-lived extra control plane is introduced by the verification itself.

Manual Confirmation Points

These items are outside P0 automation and require user or operator confirmation:

  • Where the broker-held GitHub credential reference is mounted in D601 dev.
  • Whether P0 permits any live GitHub write. Default is read-only plus dry-run.
  • Whether Code Queue CLI should default to broker mode when env token is absent or require an explicit --auth-broker flag first.
  • Production exposure, production credentials, rotation, identity binding and per-user authorization.
  • Any deploy, restart, registry credential, provider token, database or k3s permission integration.