docs: capture otel sentinel visibility guidance

Codex
2026-06-30 06:52:18 +00:00
parent 020d52aeb5
commit 23a7d7331c
2 changed files with 4 additions and 0 deletions
+2
@@ -54,6 +54,8 @@ bun scripts/cli.ts web-probe observe analyze <observerId>
4. Separate single-sentinel and multi-sentinel: root registry shows all sentinels; each runner owns independent Pod/PVC/Service/report. A single monitor-web aggregation layer is a separate responsibility.
5. Separate timing alerts and blockers: YAML-configured elapsed/timeout warnings are non-blocking unless the turn fails to complete, breaks Code Agent multi-round continuity, loses samples, or makes auth/submit/report unavailable.
6. Separate check type counts and sample counts: `findingCount`/`findingTypeCount` are type counts, while `severityCounts` and each finding's `count` are sample counts.
7. Trace-frame reports should prefer latest terminal/completed samples. If a report shows an early running/non-terminal sample, check whether the frame reports a later terminal sample and rerun with that `--sample-seq` before concluding the business turn is still running.
8. Browser memory/responsiveness/CDP red findings may include `rootCauseSignals` such as session list reads, trace event reads, web-performance beacon failures, EventSource failures and requestfailed/http TopN. Use those fields as first-line root-cause evidence for refresh storms before manually grepping JSONL artifacts.
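The type-count versus sample-count distinction in item 6 can be sketched as follows. This is a minimal illustration, not the analyzer's actual code: the `Finding` shape, the `type` and `severity` field names, the `sampleCount` total, and the example check names are assumptions; only `findingTypeCount`, `severityCounts`, and `count` come from the text above.

```typescript
// Hypothetical finding shape; only `count` mirrors a field named in the guidance.
interface Finding {
  type: string;     // check type (assumed field name)
  severity: string; // severity bucket (assumed field name)
  count: number;    // number of samples that hit this finding
}

interface CountSummary {
  findingTypeCount: number;               // distinct check types
  sampleCount: number;                    // total samples across all findings (assumed name)
  severityCounts: Record<string, number>; // samples per severity, not types per severity
}

function summarizeFindings(findings: Finding[]): CountSummary {
  const types = new Set<string>();
  const severityCounts: Record<string, number> = {};
  let sampleCount = 0;
  for (const f of findings) {
    types.add(f.type);
    sampleCount += f.count;
    severityCounts[f.severity] = (severityCounts[f.severity] ?? 0) + f.count;
  }
  return { findingTypeCount: types.size, sampleCount, severityCounts };
}

// Illustrative data only: three check types, twenty samples.
const summary = summarizeFindings([
  { type: "browser-memory", severity: "red", count: 12 },
  { type: "cdp-responsiveness", severity: "red", count: 3 },
  { type: "turn-elapsed", severity: "yellow", count: 5 },
]);
console.log(summary);
```

Reading the two numbers side by side avoids reporting "3 findings" when 20 samples were affected, or the reverse.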
## Architecture Preference
+2
@@ -22,6 +22,8 @@ bun scripts/cli.ts platform-infra observability search --service <service> --lim
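- Fix visibility problems first: when status, elapsed time, failure reason, trace, command results, or key evidence is not visible, add the missing CLI/log/status output before anything else.
- OTel queries default to a low-noise summary; request full span/context explicitly with `--full`/`--raw`.
- Do not mistake a trace gap for business success; when spans are missing or the window is incomplete, state the observation boundary first.
- The `observabilityGap` / `Service trace coverage` output of `diagnose-code-agent` is evidence of cross-service trace completeness: if the business trace contains only `hwlab-cloud-api` and lacks `agentrun-manager` or `agentrun-runner`, record it as an observability gap and keep drilling down by runId/commandId/sessionId. Do not treat missing spans as a business conclusion that the service did not participate.
- `platform-infra observability status` should stay a short table by default; use `--full` or `--raw` explicitly only when the full Kubernetes/Tempo payload is needed.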
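The trace-coverage rule above can be sketched as a small check. This is an illustration under stated assumptions, not the `diagnose-code-agent` implementation: the `CoverageResult` shape and the `checkTraceCoverage` helper are hypothetical, and the expected-service list is taken only from the service names mentioned in the guidance.

```typescript
// Hypothetical result shape; `observabilityGap` borrows the name used in the guidance.
interface CoverageResult {
  seen: string[];            // services that emitted at least one span
  missing: string[];         // expected services with no span in the window
  observabilityGap: boolean; // true means "not observed", not "did not participate"
}

// Hypothetical helper: compares services seen in a trace against the expected set.
function checkTraceCoverage(spanServices: string[], expected: string[]): CoverageResult {
  const seen = new Set(spanServices);
  const missing = expected.filter((service) => !seen.has(service));
  return {
    seen: [...seen].sort(),
    missing,
    observabilityGap: missing.length > 0,
  };
}

// Example: the business trace only contains spans from hwlab-cloud-api.
const coverage = checkTraceCoverage(
  ["hwlab-cloud-api", "hwlab-cloud-api"],
  ["hwlab-cloud-api", "agentrun-manager", "agentrun-runner"],
);

if (coverage.observabilityGap) {
  // Record the gap and keep drilling down by runId/commandId/sessionId.
  console.log(`observability gap, no spans from: ${coverage.missing.join(", ")}`);
}
```

The result deliberately carries no success or failure verdict: a missing service only marks where observation ends.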
## When to read the reference