docs: strengthen distributed agile field verification rules

2026-05-29 08:35:35 +00:00
parent 95918b05e6
commit 6a56234b87
2 changed files with 12 additions and 0 deletions
@@ -48,6 +48,16 @@ If a manual repair is needed to unblock the platform, the durable fix must be co

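 "Distributed agile" (分布式敏捷) is UniDesk's fixed process name for distributed agile field repair. When a later issue, PR, command record or user feedback mentions "分布式敏捷", it refers by default to the following flow: first probe quickly and experiment with patches on the real distributed runtime surface, producing reproducible evidence and a retrospective issue; then converge the effective fix into durable delivery through Git/PR/CI/CD; finally re-test from the original user entry point. The flow allows fast learning in the field, but it does not allow runtime-surface changes to become hidden deployment truth.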

+Before classifying a failure as an external blocker, the operator must complete the field anti-misclassification check. This is P0 for model providers, API providers, hardware links, cross-platform bridges, CLI/tran paths and frequently used tooling:
+
+1. Confirm the exact runtime configuration used by the failing path: committed source ref, deployed image or script revision, redacted Secret names and key presence, env/proxy/NO_PROXY shape, endpoint identity and command args. Do not infer these values from memory or from a different workspace.
+2. Reproduce the symptom from the actual target provider, pod, host bridge or service port through UniDesk passthrough or the service entry that failed. A commander-machine-only check is supporting evidence, not classification evidence.
+3. Compare with the mature local implementation when one exists. For Codex/model-provider work, inspect the current UniDesk/HWLAB stdio, forwarder, proxy, env-stripping and config-loading paths before concluding the provider itself is broken.
+4. Run narrow one-variable experiments in the live target environment. Typical variables are explicit versus config-derived model, endpoint, proxy or NO_PROXY, env inheritance, secret mount shape, CLI version, protocol start parameters and request payload. Record the success case and the failure case with trace ids, run ids, job names, rollout objects or bounded logs.
+5. Only call the condition an external blocker after the current runtime config has been verified, the minimal real-path probe still fails, a mature reference path or equivalent cross-check also fails, and the evidence rules out local adapter/config mistakes.
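
As an illustration only, the gate in step 5 can be read as a conjunction of four conditions. The sketch below is not part of UniDesk; `FieldEvidence`, `classify_failure` and the two label strings are hypothetical names chosen for this example.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class FieldEvidence:
    """Hypothetical record of the four conditions named in step 5."""

    runtime_config_verified: bool   # step 1: current runtime config confirmed
    real_path_probe_fails: bool     # step 2: minimal real-path probe still fails
    reference_path_fails: bool      # step 3: mature reference or cross-check also fails
    local_mistakes_ruled_out: bool  # step 4: evidence excludes adapter/config mistakes


def classify_failure(evidence: FieldEvidence) -> tuple[str, list[str]]:
    """Return a label plus the names of any conditions that are not yet met."""
    unmet = [f.name for f in fields(evidence) if not getattr(evidence, f.name)]
    # Only a fully satisfied checklist may be called an external blocker.
    label = "external-blocker" if not unmet else "field-repair"
    return label, unmet


if __name__ == "__main__":
    # A commander-machine-only check leaves the real-path probe unproven.
    partial = FieldEvidence(True, False, True, True)
    print(classify_failure(partial))
```

Returning the unmet conditions alongside the label keeps the result actionable: the operator sees which check is still outstanding rather than a bare verdict.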
+
+If user feedback or fresh evidence contradicts an initial blocker claim, the operator must stop repeating the blocker narrative and switch to field repair mode immediately. The expected sequence is passthrough probing, single-variable live experiments, a bounded hotfix experiment when needed, a source PR, CI/CD rollout and re-test from the original entry point. The hotfix proves direction or restores a live path; it does not complete the task.
+
 The standard flow is:

 1. Probe the real runtime surface first. Use structured UniDesk passthrough, service health endpoints, trace/result polling, bounded logs, object metadata and user-entry requests to reproduce the symptom on the actual target environment. Prefer short single-step commands that return promptly and can be repeated.