From 17e0be8580b41bca71aec229fc4f8c94b9803f97 Mon Sep 17 00:00:00 2001
From: Codex <codex@noreply.local>
Date: Thu, 4 Jun 2026 10:33:23 +0000
Subject: [PATCH] docs: clarify HWLAB Cloud Web validation

---
 docs/reference/g14.md | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/docs/reference/g14.md b/docs/reference/g14.md
index cb301583..d1dab830 100644
--- a/docs/reference/g14.md
+++ b/docs/reference/g14.md
@@ -143,11 +143,17 @@ Use these surfaces together:
 - Public API probes for both `/health/live` and `/v1/live-builds`. `/health/live` proves live service health/revision, but Cloud Web build time, image tag/digest, source metadata, and actual runtime commit/revision should be read from `/v1/live-builds`.
 - A bounded browser/DOM probe against `http://74.48.78.17:19666/` that asserts the deployed page state relevant to the issue.
 
+Cloud Web frontend regressions still use the two-layer validation rule. Deterministic client behavior, such as scroll-follow state machines, Markdown/HTML escaping, shared renderer output, persisted view mapping and DOM class/attribute decisions, should be reproduced first in source-level unit or contract tests; those tests may mock DOM nodes, API responses or renderer input because they are the fast regression guard. The deployed browser or Web-equivalent CLI layer must not mock the user entry point, and should prove only the live integration that unit tests cannot prove: the public bundle is deployed, the real page dispatch path creates the expected DOM state, and the user-visible control behaves correctly on the target lane. Do not move every frontend bug into CLI/browser smoke just because it is user-facing.
+
+Cloud Web message Markdown must go through a single shared React renderer component. Do not maintain a hand-written Markdown parser or a `dangerouslySetInnerHTML` message path for normal chat/workbench messages. The shared renderer's fast tests should cover at least GFM table rendering, inline/fenced code, emphasis/strong text and raw HTML escaping. Browser closeout should assert rendered DOM shape, such as `table`/`code`/`strong` counts and absence of injected `script` nodes or executed script flags, instead of comparing the full rendered HTML string.
+
 For Workbench status/build panels, the minimum DOM proof should check the topbar chip, absence of full status cards in the right sidebar, hidden collapsed lists actually absent from layout, bounded scroll ownership on the right content area, and a details dialog that contains environment image metadata, actual live commit/revision, and source/build-time fields when available.
 
 `/v1/live-builds.latest` is global across services and can legitimately point at `hwlab-cloud-api` when API rolled after Web. Inspect the `hwlab-cloud-web` service row before deciding whether a Web build field is missing or stale.
 
-Generic layout smoke can be used only when it is bounded in the current transport. A Playwright smoke that runs through `trans` with no output for the SSH idle timeout, leaves preview/browser processes behind, or never writes an exit/report file is not closure evidence. Run it as an async remote job with explicit report and cleanup, or use a smaller issue-specific DOM probe that emits one JSON result and exits.
+For `#workspace` or other scroll-owner fixes, closeout evidence should include numeric scroll metrics before and after the interaction: `scrollHeight`, `clientHeight`, `scrollTop`, `distanceFromBottom`, computed `overflowY`, and the page's follow/detached state attribute when one exists. Passing evidence for follow-tail behavior must show that new content keeps the view at bottom while already following, manual upward scroll detaches, and scrolling back to the bottom re-attaches. If the issue is specifically about final assistant response persistence or trace rendering, the browser/CLI proof must wait for the final agent/trace result as described above. If the issue is a frontend-only renderer or scroll-container regression and the same component/path renders user and agent messages, a real `#command-input` submission that creates a long user message is sufficient to exercise the deployed renderer/scroll path; do not block closure on an unrelated slow external model turn.
+
+Generic layout smoke can be used only when it is bounded in the current transport. A Playwright smoke that runs through `trans` with no output for the SSH idle timeout, leaves preview/browser processes behind, or never writes an exit/report file is not closure evidence. Run it as an async remote job with explicit report and cleanup, or use a smaller issue-specific DOM probe that emits one JSON result and exits. The stable remote-probe shape is: create a fresh Workbench session through the UI when prior session state may be failed, start the browser script as a target-side job, write PID, log, result JSON and screenshot files on G14, poll those files with short `trans` queries, and cancel any running live turn through the UI before exit when the probe submitted a real prompt. Missing Playwright-managed browser binaries are not a skip; use an approved system browser executable on G14 or install the declared browser dependency, and record the choice. When staging a Node probe outside the repo workspace, make package resolution explicit by running from the workspace or importing packages through the workspace's `node_modules`; do not treat `MODULE_NOT_FOUND` from a `/tmp` script as an application failure.
 
 ### v0.2 Cloud Web Button/JS Sync Rule (HWLAB #748)