diff --git a/docs/reference/platform-infra.md b/docs/reference/platform-infra.md
index 2005ca48..a313abd4 100644
--- a/docs/reference/platform-infra.md
+++ b/docs/reference/platform-infra.md
@@ -76,11 +76,15 @@ The UniDesk account-level sentinel uses marker-only health semantics. A probe is
 
 The sentinel must not maintain separate classifiers for "private content", "maintenance", "quota", "ads", or provider-specific body phrases as health gates. The only recovery condition is a later recovery probe that matches the marker. Freeze TTL expiry only schedules the next recovery probe; it does not restore an account by itself. Repeated non-marker results use a short exponential freeze backoff because failed marker probes produce little or no useful output token usage; repeated marker-matching results use the configured success cadence backoff. This contract applies equally to OpenAI Responses `gpt-5.5` direct account probes and manual `codex-pool sentinel-probe --account ... --confirm` measurements.
 
-When `codex-pool sync --confirm` creates a YAML-managed account or changes direct-probe-relevant account inputs such as the profile mapping, upstream base URL, API key fingerprint, upstream User-Agent, or Responses WebSocket mode, only that account must be default-frozen before it can enter the scheduler. Sync first records a pending sentinel quality gate from the pre-mutation runtime state, then updates the account, then schedules the account probe immediately. This ordering prevents a new or changed account from being written to Sub2API without a matching sentinel quarantine record if sync fails midway. Passing the marker clears the quality gate and restores schedulability; any non-marker result continues the failure freeze backoff. Unchanged accounts must not have their existing success or failure backoff reset by unrelated YAML syncs.
+`profiles.entries[].trustUpstream` is the durable account-level trust marker for sentinel success cadence, and the absence of the field means untrusted. Trusted and untrusted accounts use separate YAML cadence maximums after marker-matching probes; the values belong only in `config/platform-infra/sub2api-codex-pool.yaml`. This field must not change Sub2API scheduler priority, capacity, load factor, membership, native temporary-unschedulable rules, or the marker-only health contract. Its purpose is to keep intermittently unreliable 200-success providers under more frequent direct probes without adding provider-specific content classifiers.
+
+When `codex-pool sync --confirm` creates a YAML-managed account or changes direct-probe-relevant account inputs such as the profile mapping, upstream base URL, API key fingerprint, upstream User-Agent, Responses WebSocket mode, or `trustUpstream`, only that account must be default-frozen before it can enter the scheduler. Sync first records a pending sentinel quality gate from the pre-mutation runtime state, then updates the account, then schedules the account probe immediately. This ordering prevents a new or changed account from being written to Sub2API without a matching sentinel quarantine record if sync fails midway. Passing the marker clears the quality gate and restores schedulability; any non-marker result continues the failure freeze backoff. Unchanged accounts must not have their existing success or failure backoff reset by unrelated YAML syncs.
 
 If the YAML failure freeze maximum is lowered, `codex-pool sync --confirm` may migrate only currently active sentinel quarantines whose stored interval or next recovery time exceeds the current maximum. The migration keeps the account frozen, marks the next recovery probe due immediately, and lets the next marker result decide restore versus the new shorter failure backoff. It must not clear quarantine or restore schedulability merely because an older TTL has expired.
 
-Operational observation for this sentinel should use the read-only `codex-pool sentinel-report` table or its `--raw` form. It is the canonical low-noise view for per-account probe count, marker result, HTTP/error diagnostics, freeze TTL, success cadence, next probe time, and recent CronJob runs; raw ConfigMap dumps and ad hoc log scraping are fallback diagnostics, not the primary state surface.
+If the YAML success cadence maximum is lowered or an account changes trust class, `codex-pool sync --confirm` may clamp existing successful account state so the next probe is due under the current YAML policy instead of waiting for an older, longer success window to expire. This clamp only affects sentinel state and probe timing; it does not by itself restore a quarantined account or bypass the next marker result.
+
+Operational observation for this sentinel should use the read-only `codex-pool sentinel-report` table or its `--raw` form. It is the canonical low-noise view for per-account probe count, trust class, marker result, HTTP/error diagnostics, freeze TTL, success cadence, success cadence maximum, next probe time, and recent CronJob runs; raw ConfigMap dumps and ad hoc log scraping are fallback diagnostics, not the primary state surface.
 
 The request path is:
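
Looking back at the hunk above rather than at the request path: the two state adjustments it documents, the success cadence clamp and the failure freeze migration, can be sketched in Python roughly as follows. This is an illustration only and is not part of the patch. `SentinelState`, its fields, and all three function names are hypothetical. The comparison used for "exceeds the current maximum" and the choice to store the lowered maximum as the new interval are assumptions, not the behavior of `codex-pool`.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class SentinelState:
    """Hypothetical per-account sentinel record; field names are illustrative."""
    quarantined: bool
    interval: timedelta      # current success cadence or failure freeze interval
    next_probe_at: datetime  # when the next probe (or recovery probe) is due


def success_cadence_max(trust_upstream: Optional[bool],
                        trusted_max: timedelta,
                        untrusted_max: timedelta) -> timedelta:
    # An absent trustUpstream field (None here) means untrusted.
    return trusted_max if trust_upstream is True else untrusted_max


def clamp_success_state(state: SentinelState,
                        cadence_max: timedelta,
                        now: datetime) -> SentinelState:
    """Pull a healthy account's next probe under the current cadence maximum."""
    if state.quarantined:
        # The clamp never restores or otherwise alters a quarantined account.
        return state
    return replace(
        state,
        interval=min(state.interval, cadence_max),
        next_probe_at=min(state.next_probe_at, now + cadence_max),
    )


def migrate_failure_freeze(state: SentinelState,
                           freeze_max: timedelta,
                           now: datetime) -> SentinelState:
    """Re-arm an active quarantine whose stored freeze exceeds a lowered maximum."""
    if not state.quarantined:
        return state
    # Assumed reading of "stored interval or next recovery time exceeds the maximum".
    exceeds = (state.interval > freeze_max
               or state.next_probe_at > now + freeze_max)
    if not exceeds:
        return state
    # The account stays frozen; only the recovery probe becomes due immediately.
    # The next marker result then decides restore versus the shorter backoff.
    return replace(state, interval=freeze_max, next_probe_at=now)
```

In this sketch, neither function can flip `quarantined` to `False`, which mirrors the rule that only a marker-matching recovery probe restores schedulability.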