From c8143143a07e59af0d393d26322b151d7227aa34 Mon Sep 17 00:00:00 2001
From: Codex
Date: Thu, 2 Jul 2026 06:34:25 +0000
Subject: [PATCH] docs: record web sentinel request-rate evidence

---
 .agents/skills/unidesk-monitor/SKILL.md | 1 +
 docs/reference/observability.md         | 2 ++
 2 files changed, 3 insertions(+)

diff --git a/.agents/skills/unidesk-monitor/SKILL.md b/.agents/skills/unidesk-monitor/SKILL.md
index 05516af7..085943b6 100644
--- a/.agents/skills/unidesk-monitor/SKILL.md
+++ b/.agents/skills/unidesk-monitor/SKILL.md
@@ -65,6 +65,7 @@ bun scripts/cli.ts web-probe observe analyze
 11. If a run appears to have only WBC-003, compare public `/api/report?view=findings&run=` with CLI `web-probe sentinel report --run --view findings --raw`. `artifactSummary.reason=analysis-report-json-missing-or-invalid` means the service index cannot read that old artifact, not that analyzer findings are absent; reindex/backfill the existing run instead of starting a new observe run.
 12. Any new analyzer finding id emitted by quick verify must be registered in the selected check catalog before rollout. A missing catalog entry can make `/api/health` return 503 and leave the new runner pod unhealthy even when the image is otherwise correct.
 13. If a dashboard screenshot artifact is small or visually shows `ERR_NETWORK_CHANGED`/browser error chrome while CLI status is otherwise pass, discard it as evidence and rerun after checking the public URL/API status. Treat this as a web-probe evidence-quality issue if repeated; do not close visibility issues from such a screenshot alone.
+14. Request-rate curve acceptance uses `/api/runs/{id}.requestRate` plus dashboard screenshot/DOM evidence that the request chart renders above the memory chart with an aligned time axis. Until `dashboard verify` exposes request-rate-specific fields, do not treat the legacy `API_PAGES` / `API_SAMPLES` columns as request-rate curve counts; see `docs/reference/observability.md`.
 
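 ## Architecture Preference
 
diff --git a/docs/reference/observability.md b/docs/reference/observability.md
index ba2a38ea..e41dcd44 100644
--- a/docs/reference/observability.md
+++ b/docs/reference/observability.md
@@ -35,6 +35,8 @@ The primary source of truth for Web sentinel dashboard/API display issues is the sentinel runner's `
 
 Web sentinel findings visibility must be checked against both the runner API and the existing observe artifact. If a run's public `/api/report?view=findings&run=` shows only WBC-003, but `web-probe sentinel report --run --view findings --raw` can read red/amber analyzer findings from `analysis/report.json`, the root cause is that the index or artifact visibility is masking them, not that the business flow produced no warnings/errors. In that case, backfill or rebuild the report index for that existing run and keep its original report views; do not start a new sentinel run to explain the old record.
 
+The sources of truth for accepting the Web sentinel request-rate curves are the runner's `/api/runs/{id}.requestRate`, the request-rate summary in the existing observe artifact, and the remote-browser evidence from `web-probe sentinel dashboard screenshot`. Thresholds, sampling interval, bucket size, and red/amber lines are read only from YAML/source-of-truth; long-lived docs record only the field families and the verification entry points. During acceptance, check `bucketSeconds`, the total-request curve, the page curve, the API-path curve, the peak per-minute count, the data source, and whether the chart/DOM renders above the memory curve and shares its time axis. If quick-verify's business flow fails but the same run's `requestRate` API and screenshot already contain curve data, record request-rate capability acceptance separately from the business blocker; conversely, when `requestRate.source=unavailable` or the curves are empty, keep investigating the analyzer compact output, artifact summary, index backfill, and report fallback. Unless `dashboard verify` explicitly outputs request-rate-specific fields, do not treat the legacy `API_PAGES` / `API_SAMPLES` columns as the request-rate curve acceptance result.
+
 ## Workbench Request Storm And Freeze
 
 Root-cause investigation of Workbench request storms and browser unresponsiveness must use OTel, web-probe artifacts, and frontend runtime diagnostics together; it cannot rely only on whether the provider succeeded or whether a single REST route returned 200. For the same user action, the minimum evidence should include the `traceId/sessionId/turnId` or a redacted scoped key, request family counts, SSE transport state, recovery action, refresh queue/single-flight state, a browser memory/freeze sample, the observer/run/report SHA, and whether the user's page is still operable. If any of these observation surfaces is missing, first add the observation or record an instrumentation gap, and only then state a root-cause conclusion.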
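The TypeScript sketch below shows how the request-rate field checks described for observability.md might be scripted as a pre-check before the screenshot/DOM review. It is a hypothetical helper, not part of `web-probe`. Only `bucketSeconds`, `source`, and the `unavailable` value come from the patch. The `total`/`pages`/`apiPaths` field names and the point shape are assumptions, and so is deriving the peak per-minute count by normalizing bucket counts. Thresholds are intentionally not hard-coded, because they belong in YAML. Chart placement above the memory curve still has to be confirmed from the dashboard screenshot.

```typescript
// Hypothetical shapes for the `requestRate` field of `/api/runs/{id}`.
// Only `bucketSeconds` and `source` are documented; everything else is assumed.
interface RequestRatePoint {
  t: string; // bucket start label (assumed)
  count: number; // requests observed in this bucket
}

interface RequestRate {
  source: string; // "unavailable" means no curve data was produced
  bucketSeconds: number;
  total: RequestRatePoint[]; // assumed: total-request curve
  pages: Record<string, RequestRatePoint[]>; // assumed: one curve per page
  apiPaths: Record<string, RequestRatePoint[]>; // assumed: one curve per API path
}

interface AcceptanceResult {
  ok: boolean;
  problems: string[];
  peakPerMinute: number;
}

// Assumed derivation: scale each bucket count to requests/minute, then take the max.
function peakPerMinute(points: RequestRatePoint[], bucketSeconds: number): number {
  let peak = 0;
  for (const p of points) {
    peak = Math.max(peak, (p.count * 60) / bucketSeconds);
  }
  return peak;
}

// Field-level acceptance only: thresholds come from YAML and chart placement
// comes from the dashboard screenshot, so neither is checked here.
function checkRequestRate(rr: RequestRate | undefined): AcceptanceResult {
  if (rr === undefined) {
    return { ok: false, problems: ["requestRate missing from run"], peakPerMinute: 0 };
  }
  const problems: string[] = [];
  if (rr.source === "unavailable") {
    problems.push(
      "requestRate.source=unavailable: check analyzer compact output, artifact summary, index backfill, report fallback",
    );
  }
  const validBucket = Number.isFinite(rr.bucketSeconds) && rr.bucketSeconds > 0;
  if (!validBucket) problems.push("bucketSeconds must be a positive number");
  if (rr.total.length === 0) problems.push("total-request curve is empty");
  if (Object.keys(rr.pages).length === 0) problems.push("no page curves");
  if (Object.keys(rr.apiPaths).length === 0) problems.push("no API-path curves");
  const peak = validBucket ? peakPerMinute(rr.total, rr.bucketSeconds) : 0;
  return { ok: problems.length === 0, problems, peakPerMinute: peak };
}

// Usage with an illustrative payload (not real run data).
const sample: RequestRate = {
  source: "artifact",
  bucketSeconds: 30,
  total: [
    { t: "06:00:00", count: 10 },
    { t: "06:00:30", count: 40 },
  ],
  pages: { "/workbench": [{ t: "06:00:00", count: 4 }] },
  apiPaths: { "/api/items": [{ t: "06:00:00", count: 6 }] },
};
console.log(checkRequestRate(sample));
```

The check never reads business-flow status. A run whose quick-verify flow failed can therefore still pass it, which keeps request-rate capability acceptance separate from the business blocker.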