feat: add marker-only sub2api sentinel reporting
This commit is contained in:
@@ -51,12 +51,32 @@ When Codex startup repeatedly reports WebSocket reconnects or HTTPS fallback, pr
|
||||
|
||||
Do not encode current availability assumptions in long-term reference prose. If an account needs a higher concurrency or load factor than the pool default, make that a deliberate YAML override and verify it with `codex-pool validate`; the reference document should describe the rule, not repeat the current numeric value.
|
||||
|
||||
Do not enable Sub2API `pool_mode` for UniDesk-managed Codex accounts. `pool_mode` retries the same selected account path, while UniDesk's desired failover behavior is to mark the failing account temporarily unschedulable and let Sub2API choose another account from the group. `codex-pool validate` reports each managed account's temporary-unschedulable runtime alignment and should be used after `codex-pool sync --confirm`. Generic 502/503/504 bodies such as `Recovered upstream error 502`, `Bad Gateway`, `Gateway Timeout`, Codex-facing `Upstream request failed`, `Unknown error`, context-deadline/canceled wrappers, stable 400 `invalid_encrypted_content` / unsupported-model wrappers, and stable `model_not_found` / "no available channel for model" wrappers must stay in the YAML cooldown policy so an intermittently bad account is cooled down instead of repeatedly adding latency at the next compact or Responses request. The Codex pool default error cooldown is severity-tiered: temporary signals can start at ten minutes, gateway/service/overload/model-routing failures should cool down longer, and credential, permission, quota, account-compatibility, or account-state failures should use the longest cooldown. Exact current values belong in YAML and runtime validation output.
|
||||
Do not enable Sub2API `pool_mode` for UniDesk-managed Codex accounts. `pool_mode` retries the same selected account path, while UniDesk's desired failover behavior is to mark the failing account temporarily unschedulable and let Sub2API choose another account from the group. `codex-pool validate` reports each managed account's temporary-unschedulable runtime alignment and should be used after `codex-pool sync --confirm`. Generic 502/503/504 bodies such as `Recovered upstream error 502`, `Bad Gateway`, `Gateway Timeout`, Codex-facing `Upstream request failed`, `Unknown error`, context-deadline/canceled wrappers, stable 400 `invalid_encrypted_content` / unsupported-model wrappers, and stable `model_not_found` / "no available channel for model" wrappers must stay in the YAML cooldown policy so an intermittently bad account is cooled down instead of repeatedly adding latency at the next compact or Responses request. The Codex pool default error cooldown is severity-tiered: temporary signals should use the shortest cooldown, gateway/service/overload/model-routing failures should cool down longer, and credential, permission, quota, account-compatibility, or account-state failures should use the longest cooldown. Exact current values belong in YAML and runtime validation output.
|
||||
|
||||
Sub2API temporary-unschedulable rules require both an HTTP status match and a response-body keyword match in the upstream failure/error path. Do not treat them as a general successful-response content filter. If an upstream returns a quota warning or maintenance prompt as normal HTTP 200 assistant content, do not add a YAML 200 cooldown rule, patch Sub2API in place, fork behavior in UniDesk, or bypass `codex-pool sync` to make the pool pretend that account cooling exists. Record the upstream capability gap in an issue when it matters operationally; until upstream Sub2API supports that behavior and `codex-pool validate` proves it, UniDesk should not implement or rely on it.
|
||||
Sub2API temporary-unschedulable rules require both an HTTP status match and a response-body keyword match in the upstream failure/error path. Do not treat them as a general successful-response content filter, and do not add a YAML 200 cooldown rule, patch Sub2API in place, fork Sub2API behavior in UniDesk, or bypass `codex-pool sync` to make the native pool pretend that HTTP 200 content cooling exists. HTTP 200 private content, maintenance text, quota prompts, ads, and similar semantic failures are handled by the external account-level sentinel when that sentinel is enabled, not by Sub2API native `temp_unschedulable_rules`.
|
||||
|
||||
If automatic cooling or same-request failover does not happen for an error that the YAML policy declares, treat that as a Sub2API capability or integration defect. The closeout must show the failing account being marked temporarily unschedulable and the next request or same request selecting another schedulable account; a manually disabled, deleted, or pruned account is not valid evidence for this class of fix.
|
||||
|
||||
## Sub2API Account Test Semantics
|
||||
|
||||
Sub2API v0.1.136 has a separate management-plane account connection test. The admin WebUI account modal calls `POST /api/v1/admin/accounts/:id/test` with `model_id` and, for the admin account table modal, no OpenAI `mode`; the backend binds this to `AccountTestService.TestAccountConnection`, which normalizes an empty mode to `default`.
|
||||
|
||||
For OpenAI API-key accounts in default mode, the test loads the account by id, applies `account.GetMappedModel(model_id)`, checks `openai_compat.ShouldUseResponsesAPI(account.Extra)`, and then builds an upstream URL from the account base URL with `/v1/responses`. It sends a direct upstream request through `httpUpstream.DoWithTLS` with `Content-Type: application/json` and `Authorization: Bearer <account-key>`. The request body is Responses API SSE, not a non-streaming JSON request: `model` is the mapped model, `input` is one user message whose text is `hi`, `stream` is `true`, and `instructions` is Sub2API's embedded OpenAI default instructions. For API-key accounts it does not set `store: false`, `max_output_tokens`, Codex CLI `User-Agent`, `OpenAI-Beta`, `Originator`, `Version`, `Session_ID`, or `Conversation_ID`; those Codex-like headers appear in other paths such as compact probing, not in the default account test.
|
||||
|
||||
The management test success criterion is transport and stream completion, not semantic content. A non-200 upstream response becomes an SSE error. A 200 response is considered successful when `processOpenAIStream` sees `response.completed` or `response.done`; `response.output_text.delta` chunks are forwarded to the WebUI as display text, while `response.failed`, `error`, or EOF before completion fails the test. Therefore a WebUI "hi" success proves that this direct account can complete a streaming `/v1/responses` request with Sub2API's default payload shape, but it does not prove that a non-streaming Responses request, marker prompt, `max_output_tokens`, `store: false`, Codex header set, compact path, WebSocket path, or normal pool-scheduled gateway request will behave identically.
|
||||
|
||||
This management-plane test is also outside the normal consumer gateway scheduler. It fetches the account by id instead of listing only schedulable accounts, so `status=active` in the modal and a successful account test can coexist with `schedulable=false` in scheduler state. Because the test performs its own outbound `DoWithTLS` call, regular gateway access logs and usage logs may not contain the upstream account id/path/status evidence expected from ordinary `/v1/responses` traffic. When diagnosing account tests, use the management route semantics above or Sub2API source, not gateway access-log absence or an unrelated pool request as proof.
|
||||
|
||||
An external account-level sentinel that wants parity with this WebUI path should reuse the same request shape as far as the standard OpenAI SDK allows: direct account credentials, Responses API, `stream=true`, no `store: false` for API-key accounts, no upstream `max_output_tokens` field, and success parsing based on the streaming events. A local stream delta collection limit is acceptable as a sentinel safety bound, but it should not change the upstream request body. The sentinel may replace the user text `hi` with a marker prompt, but it should not introduce extra request fields or Codex/compact headers merely for convenience. If a marker-only sentinel intentionally diverges from the management test shape, the divergence must be documented in probe output so a WebUI success and sentinel failure are not misread as operator error.
|
||||
|
||||
## Account Sentinel Marker Contract
|
||||
|
||||
The UniDesk account-level sentinel uses marker-only health semantics. A probe is healthy only when the upstream response satisfies the configured marker match. Every other result is unhealthy and must enter the same exponential freeze state machine, regardless of whether the immediate response is HTTP 200, 400, 403, 429, 500, 502, 503, 504, a streaming error event, malformed output, empty output, timeout, or any other transport/API failure. HTTP status, upstream error code, body hash, body preview, headers, and SDK exception class are diagnostics only; they must not become additional allow/deny criteria that bypass marker mismatch.
|
||||
|
||||
The sentinel must not maintain separate classifiers for "private content", "maintenance", "quota", "ads", or provider-specific body phrases as health gates. The only recovery condition is a later recovery probe that matches the marker. Freeze TTL expiry only schedules the next recovery probe; it does not restore an account by itself. Repeated non-marker results use exponential freeze backoff, and repeated marker-matching results use the configured success cadence backoff. This contract applies equally to OpenAI Responses `gpt-5.5` direct account probes and manual `codex-pool sentinel-probe --account ... --confirm` measurements.
|
||||
|
||||
Operational observation for this sentinel should use the read-only `codex-pool sentinel-report` table or its `--raw` form. It is the canonical low-noise view for per-account probe count, marker result, HTTP/error diagnostics, freeze TTL, success cadence, next probe time, and recent CronJob runs; raw ConfigMap dumps and ad hoc log scraping are fallback diagnostics, not the primary state surface.
|
||||
|
||||
The request path is:
|
||||
|
||||
1. A client sends an OpenAI-compatible request to the configured consumer base URL, normally `https://sub2api.74-48-78-17.nip.io/v1/...`, with the unified API key.
|
||||
|
||||
Reference in New Issue
Block a user