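diff --git a/docs/reference/cli.md b/docs/reference/cli.md
index 099b9923..f6d92b3c 100644
--- a/docs/reference/cli.md
+++ b/docs/reference/cli.md
@@ -22,7 +22,7 @@ CI/CD, GitOps, rollout, artifact publishing, runtime lane rolling after PR merge
 
 The G14/D601 v03 bootstrap admin password is part of the HWLAB runtime Secret lifecycle and must converge on the `bootstrapAdmin` declaration in `config/hwlab-node-lanes.yaml` and the controlled `hwlab nodes secret status|ensure --node --lane v03 --name hwlab-v03-bootstrap-admin` CLI. Plaintext may exist only in the Git-ignored, owner-only `.state/secrets/...` sourceRef file; the CLI converts the plaintext locally into an HWLAB-compatible password hash, syncs only `password-hash` to the runtime plane, and discloses in its output only the sourceRef, sourceKey, target Secret/key, presence, byte count, fingerprint, mutation, and follow-up commands. `secret ensure --force` is only for controlled recovery scenarios that explicitly require re-seeding the bootstrap admin hash from the YAML sourceRef and restarting Cloud API; the default ensure does not force a re-seed. Do not let manually generated hashes, hand-written k8s Secrets, or raw `kubectl rollout` become long-term entry points.
 
-`hwlab nodes web-probe run --node --lane [--url ]` is the controlled command entry point for the HWLAB Cloud Web DOM probe. It resolves the target workspace, public URL, and bootstrap admin sourceRef from `config/hwlab-node-lanes.yaml`, reads the owner-only plaintext on the UniDesk command side, and injects it into the target workspace's `scripts/web-live-dom-probe.mjs` only through one-shot stdin/env; stdout discloses only the sourceRef, sourceKey, presence, fingerprint, injection method, DOM summary, and artifact hash, and never prints the password. When the sourceRef or source file is missing, it should return a structured `web_login_secret_missing`; it must not fall back to a historical default password or require copying the secret to the D601/G14 target host. When you need custom Playwright route/intercept, in-flight DOM reads, or dedicated screenshots, use `hwlab nodes web-probe script --node --lane <<'JS' ... JS`; the CLI handles the same sourceRef credential resolution, establishes `hwlab_session` via `/auth/login`, injects the authenticated `browser/context/page/baseUrl`, and summarizes artifact path/hash; custom scripts must not read or print Web login credentials themselves. The managed login in `web-probe script` first does short retries against the same-origin `/auth/login`; if it still has no `hwlab_session`, it automatically falls back to the current Cloud Web login form and submits the same credentials the way a browser would. `probe.auth` outputs only method, status, attempts, fallback, and the redacted errorSummary, and never prints the password, cookie, or a copyable session value.
+`hwlab nodes web-probe run --node --lane [--url ]` is the controlled command entry point for the HWLAB Cloud Web DOM probe. It resolves the target workspace, public URL, and bootstrap admin sourceRef from `config/hwlab-node-lanes.yaml`, reads the owner-only plaintext on the UniDesk command side, and injects it into the target workspace's `scripts/web-live-dom-probe.mjs` only through one-shot stdin/env; stdout discloses only the sourceRef, sourceKey, presence, fingerprint, injection method, DOM summary, and artifact hash, and never prints the password. When the sourceRef or source file is missing, it should return a structured `web_login_secret_missing`; it must not fall back to a historical default password or require copying the secret to the D601/G14 target host. When you need custom Playwright route/intercept, in-flight DOM reads, or dedicated screenshots, use `hwlab nodes web-probe script --node --lane <<'JS' ... JS`; the CLI handles the same sourceRef credential resolution, establishes `hwlab_session` via `/auth/login`, injects the authenticated `browser/context/page/baseUrl`, and summarizes artifact path/hash; custom scripts must not read or print Web login credentials themselves. The managed login in `web-probe script` first does short retries against the same-origin `/auth/login`; if it still has no `hwlab_session`, it automatically falls back to the current Cloud Web login form and submits the same credentials the way a browser would. `probe.auth` outputs only method, origin, loginPath, status, attempts, retryCount, fallbackUsed, fallback, retryable, transientObserved, fingerprint, commanderAction, and the redacted errorSummary, and never prints the password, cookie, or a copyable session value.
 
 The default `goto('/workbench')` in `web-probe script` is a stable navigation boundary: it first reuses the current page, retries on a fresh page a bounded number of times after a failure, and waits for the base workbench DOM (by default `#workspace` and `#command-input`) to become visible; for explicit control, use the injected `gotoStable(target, { selectors, activeSelector, attempts, readinessTimeoutMs })`, `waitForReady({ selectors })`, `gotoRaw()`, and `getPage()`. A stabilization failure must be disclosed in `probe.readiness` with low noise: the attempt, phase, selector, whether a `/v1` API request was observed, the API failure summary, and the failure screenshot artifact. The classification values are fixed to `browser-load-jitter`, `selector-timeout`, `api-not-sent`, and `api-response-failed`, so that "page not ready / request never sent" and "backend response failed" are not lumped into the same selector timeout. The runner does not navigate the same page ahead of the user script, so the script can still install `page.route` or a context route first; if a retry switches to a fresh page, subsequent script code should obtain the current page from the return value of `gotoStable()` or from `getPage()`.
 
diff --git a/docs/reference/hwlab.md b/docs/reference/hwlab.md
index 9bb1633a..249d5316 100644
--- a/docs/reference/hwlab.md
+++ b/docs/reference/hwlab.md
@@ -92,7 +92,7 @@ Session switching, session rail, or Workbench recovery-path issues must also verify
 
 The Workbench session URL is part of the recovery path. Every visible or currently selected session should map to `/workbench/sessions/`, with `/workspace/sessions/` kept only as a compatibility alias; after entering from `/workbench`, creating a session, or clicking the session rail, the address bar should reflect the currently active `conversationId`, and copied session info should also include a directly openable session URL. When a deep link is opened directly, the Cloud Web static layer must return the SPA `index.html` rather than a 404; the login redirect must preserve the original path; after login or hydrate, the backend `selectedConversationId` should converge to the id in the URL through `/v1/agent/conversations/` and the workspace select path. When closing a session URL or recovery issue, the browser evidence must include at least the final URL, the workspace API's `selectedConversationId`, the active tab determination, and one cookie-less direct GET or equivalent probe confirming that the deep link returns `200 text/html`.
 
-The Chinese-language error on the Cloud Web login page can present an API upstream 502, a mid-rollout state, or a genuine 401 all as a login failure. On a login failure, first probe `/health/live`, the `/auth/login` status, and the API/Web/edge-proxy rollout of the selected namespace against the target public origin; only classify it as a credential or user-state problem when the API is ready and `/auth/login` explicitly returns 401. Once the rollout transient has recovered, simply rerun the same short-lived Playwright acceptance check; do not record a transient `upstream_unavailable` as a long-term functional defect.
+The Chinese-language error on the Cloud Web login page can present an API upstream 502, a mid-rollout state, or a genuine 401 all as a login failure. On a login failure, first check `probe.auth.retryCount`, `transientObserved`, `retryable`, `fallbackUsed`, `fingerprint`, and `commanderAction` from `web-probe script`, then probe `/health/live`, the `/auth/login` status, and the API/Web/edge-proxy rollout of the selected namespace against the target public origin; only classify it as a credential or user-state problem when the API is ready and `/auth/login` explicitly returns 401. Once the rollout transient has recovered, simply rerun the same short-lived Playwright acceptance check; do not record a transient `upstream_unavailable` as a long-term functional defect.
 
 ## HWLAB FRP Maintenance
 
diff --git a/scripts/src/hwlab-node.ts b/scripts/src/hwlab-node.ts
index d6d63358..75aee89c 100644
--- a/scripts/src/hwlab-node.ts
+++ b/scripts/src/hwlab-node.ts
@@ -3810,7 +3810,8 @@ async function authenticate(browserContext) {
 async function authenticateWithApiRetries(browserContext) {
   const loginUrl = new URL("/auth/login", baseUrl).toString();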
   const attempts = [];
-  for (let attempt = 1; attempt <= 3; attempt += 1) {
+  const maxAttempts = 3;
+  for (let attempt = 1; attempt <= maxAttempts; attempt += 1) {
     try {
       const response = await browserContext.request.post(loginUrl, {
         data: { username, password },
@@ -3819,10 +3820,12 @@ async function authenticateWithApiRetries(browserContext) {
       });
       const summary = await responseSummary(response);
       const cookieState = await readAuthCookieState(browserContext);
+      const retryable = isRetryableAuthStatus(response.status());
       const item = {
         attempt,
         method: "api",
         ...summary,
+        retryable,
         cookiePresent: cookieState.cookiePresent,
         cookieNames: cookieState.cookieNames,
       };
@@ -3837,21 +3840,24 @@ async function authenticateWithApiRetries(browserContext) {
           cookiePresent: true,
           cookieNames: cookieState.cookieNames,
           attempts,
+          retryCount: attempt - 1,
+          fallbackUsed: false,
           valuesRedacted: true,
         };
       }
-      if (response.status() < 500 && response.status() !== 429) break;
+      if (!retryable) break;
     } catch (error) {
       attempts.push({
         attempt,
         method: "api",
         status: 0,
         statusText: "request-error",
+        retryable: true,
         error: error instanceof Error ? error.message : String(error),
         cookiePresent: false,
       });
     }
-    if (attempt < 3) await sleep(300 * attempt);
+    if (attempt < maxAttempts) await sleep(300 * attempt);
   }
   const cookieState = await readAuthCookieState(browserContext);
   const last = attempts[attempts.length - 1] ?? null;
@@ -3864,6 +3870,9 @@ async function authenticateWithApiRetries(browserContext) {
     cookiePresent: cookieState.cookiePresent,
     cookieNames: cookieState.cookieNames,
     attempts,
+    retryCount: retryCountFromAttempts(attempts),
+    fallbackUsed: false,
+    retryable: authAttemptsRetryable(attempts),
     errorSummary: last,
     valuesRedacted: true,
   };
@@ -3896,6 +3905,9 @@ async function authenticateWithFormFallback(browserContext, apiAuth) {
     cookiePresent: cookieState.cookiePresent,
     cookieNames: cookieState.cookieNames,
     attempts: apiAuth.attempts,
+    retryCount: apiAuth.retryCount ?? retryCountFromAttempts(apiAuth.attempts),
+    fallbackUsed: true,
+    retryable: apiAuth.retryable === true,
     apiErrorSummary: apiAuth.errorSummary,
     fallback,
     valuesRedacted: true,
@@ -3938,6 +3950,9 @@ async function authenticateWithFormFallback(browserContext, apiAuth) {
     cookiePresent: cookieState.cookiePresent,
     cookieNames: cookieState.cookieNames,
     attempts: apiAuth.attempts,
+    retryCount: apiAuth.retryCount ?? retryCountFromAttempts(apiAuth.attempts),
+    fallbackUsed: true,
+    retryable: apiAuth.retryable === true,
     apiErrorSummary: apiAuth.errorSummary,
     fallback,
     valuesRedacted: true,
@@ -3955,6 +3970,9 @@ async function authenticateWithFormFallback(browserContext, apiAuth) {
     cookiePresent: cookieState.cookiePresent,
     cookieNames: cookieState.cookieNames,
     attempts: apiAuth.attempts,
+    retryCount: apiAuth.retryCount ?? retryCountFromAttempts(apiAuth.attempts),
+    fallbackUsed: true,
+    retryable: apiAuth.retryable === true,
     apiErrorSummary: apiAuth.errorSummary,
     fallback,
     valuesRedacted: true,
@@ -4451,6 +4469,19 @@ function boundedInteger(raw, fallback, min, max) {
   return Math.max(min, Math.min(max, value));
 }
 
+function isRetryableAuthStatus(status) {
+  return status === 0 || status === 429 || status >= 500;
+}
+
+function retryCountFromAttempts(attempts) {
+  return Math.max(0, Array.isArray(attempts) ? attempts.length - 1 : 0);
+}
+
+function authAttemptsRetryable(attempts) {
+  if (!Array.isArray(attempts) || attempts.length === 0) return false;
+  return attempts.some((attempt) => attempt && typeof attempt === "object" && attempt.retryable === true);
+}
+
 async function responseSummary(response) {
   const status = response.status();
   let bodyPreview = null;
@@ -4498,22 +4529,52 @@ function sleep(ms) {
 }
 
 function publicAuth(value) {
+  const retryCount = Number.isInteger(value.retryCount) ? value.retryCount : retryCountFromAttempts(value.attempts);
+  const transientObserved = authAttemptsRetryable(value.attempts);
+  const ok = value.ok === true;
+  const retryable = ok ? false : value.retryable === true || transientObserved;
   return {
-    ok: value.ok,
+    ok,
     method: value.method ?? null,
+    origin: new URL(baseUrl).origin,
     loginPath: value.loginUrl,
     status: value.status,
     statusText: value.statusText,
     cookiePresent: value.cookiePresent,
     cookieNames: value.cookieNames,
     attempts: value.attempts ?? null,
+    retryCount,
+    fallbackUsed: value.fallbackUsed === true || value.method === "form-fallback",
     fallback: value.fallback ?? null,
     errorSummary: cloneSummary(value.errorSummary ?? value.apiErrorSummary ?? null),
+    degradedReason: ok ? null : "auth-login-failed",
+    retryable,
+    transientObserved,
+    commanderAction: ok
+      ? null
+      : retryable
+        ? "retry same web-probe command after short backoff; inspect target /auth/login and Cloud Web/API rollout if repeated"
+        : "inspect bootstrap admin credential source and target user state",
+    fingerprint: authSummaryFingerprint(value),
     username,
     valuesRedacted: true,
   };
 }
 
+function authSummaryFingerprint(value) {
+  const payload = JSON.stringify({
+    origin: new URL(baseUrl).origin,
+    loginPath: value.loginUrl ?? null,
+    method: value.method ?? null,
+    status: value.status ?? null,
+    statusText: value.statusText ?? null,
+    cookiePresent: value.cookiePresent === true,
+    retryCount: Number.isInteger(value.retryCount) ? value.retryCount : retryCountFromAttempts(value.attempts),
+    fallbackUsed: value.fallbackUsed === true || value.method === "form-fallback",
+  });
+  return "sha256:" + createHash("sha256").update(payload).digest("hex").slice(0, 16);
+}
+
 function cloneSummary(value) {
   if (!value || typeof value !== "object" || Array.isArray(value)) return value ?? null;
   return { ...value };
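
Review note: the retry classification this patch introduces can be exercised in isolation. The sketch below restates the three helpers from the `-4451,6 +4469,19` hunk with types added, and wraps them in a hypothetical `classifyAuth` function (not part of the patch) that approximates how `publicAuth` derives `retryable`, `transientObserved`, and `commanderAction`. The `AuthAttempt` and `AuthVerdict` types, the `record` helper, and the shortened action labels are assumptions made for illustration; the real `publicAuth` also honors an explicit `value.retryable` flag, which is omitted here.

```typescript
// Assumed shape of one login attempt, inferred from the fields the patch pushes.
interface AuthAttempt {
  attempt: number;
  status: number;
  retryable: boolean;
}

// Assumed subset of the publicAuth output that matters for triage.
interface AuthVerdict {
  retryCount: number;
  transientObserved: boolean;
  retryable: boolean;
  commanderAction: string | null;
}

// Same rule as the patch: request errors (status 0), 429, and 5xx are transient.
function isRetryableAuthStatus(status: number): boolean {
  return status === 0 || status === 429 || status >= 500;
}

// Same rule as the patch: every attempt after the first counts as a retry.
function retryCountFromAttempts(attempts: AuthAttempt[]): number {
  return Math.max(0, attempts.length - 1);
}

// Same rule as the patch: true if any attempt saw a transient status.
function authAttemptsRetryable(attempts: AuthAttempt[]): boolean {
  return attempts.some((item) => item.retryable === true);
}

// Hypothetical helper: build an attempt record the way the retry loop does.
function record(attempt: number, status: number): AuthAttempt {
  return { attempt, status, retryable: isRetryableAuthStatus(status) };
}

// Hypothetical wrapper mirroring the decision order in publicAuth:
// a successful login is never retryable, even if earlier attempts were transient.
function classifyAuth(ok: boolean, attempts: AuthAttempt[]): AuthVerdict {
  const transientObserved = authAttemptsRetryable(attempts);
  const retryable = ok ? false : transientObserved;
  return {
    retryCount: retryCountFromAttempts(attempts),
    transientObserved,
    retryable,
    commanderAction: ok ? null : retryable ? "retry-after-backoff" : "inspect-credentials",
  };
}

// Three 5xx responses in a row: transient, so rerun the same probe.
const flaky = classifyAuth(false, [record(1, 502), record(2, 503), record(3, 502)]);
// A single 401: not transient, so look at the credential source and user state.
const rejected = classifyAuth(false, [record(1, 401)]);
// A 502 followed by success: the jitter is recorded but no action is needed.
const recovered = classifyAuth(true, [record(1, 502), record(2, 200)]);
```

Keeping `transientObserved` separate from `retryable` is what lets a successful login still report that it passed through a rollout blip, which matches the triage order the `hwlab.md` change describes.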