docs: add k3s NetworkPolicy requirement for platform-infra

G14 k3s kube-router generates per-pod REJECT chains when any NetworkPolicy resource exists; without an allow-all policy in the namespace, cross-pod traffic (Sub2API→PostgreSQL, Sub2API→Redis) is silently blocked. Document the required allow-all NetworkPolicy and diagnostic symptoms for future reference. Ref: pikasTech/unidesk#254
2026-06-10 15:55:05 +00:00
parent 4ee3a67089
commit cdf7909f1c
1 changed files with 34 additions and 0 deletions
@@ -95,3 +95,37 @@ When an automatic availability probe is added, it should be YAML-controlled and
 4. Optional per-upstream account probes if Sub2API exposes a safe account selection or admin-health mechanism; otherwise document that group-level success does not prove every upstream account is healthy.

 Until continuous probing exists, closeout comments must state that validation was on-demand and include the exact CLI/API entrypoints used.
+
+## k3s Network Policy Requirements
+
+G14 k3s runs kube-router as its network policy controller. Once any NetworkPolicy resource exists in a namespace, kube-router replaces its default allow-all behavior with explicit iptables/ipset rules that permit only traffic matching the declared policies. If the generated rules miss or incorrectly evaluate a traffic path, pods in that namespace experience silent connection timeouts (REJECT with `icmp-port-unreachable`), even though `kubectl get networkpolicy` lists the policy and DNS/service resolution works.
+
+The `platform-infra` namespace **must** have a `NetworkPolicy` named `allow-all` (or equivalent) that explicitly permits all ingress and egress within the namespace. Without it, kube-router's default-deny iptables chains block cross-pod traffic including Sub2API → PostgreSQL and Sub2API → Redis connections, causing Sub2API init containers and background services to hang with `context deadline exceeded` or `no response` errors.
+
+Diagnostic symptoms:
+- Sub2API pod stuck `Init:0/2` with `wait-postgres` logging `sub2api-postgres:5432 - no response` perpetually
+- `pg_isready` succeeds inside the postgres pod itself but TCP from any other pod times out
+- `nc -zv` run via `kubectl exec` from a different pod against the postgres ClusterIP or pod IP returns `Operation timed out`
+- `iptables -L KUBE-ROUTER-INPUT -n | grep <namespace>` shows per-pod FW chains; each chain ends with `REJECT ... mark match ! 0x10000/0x10000`
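+
+The checks above can be sketched as a few shell helpers. This is an illustrative sketch, not tooling from this repository: the function names are hypothetical, the pod and chain names are supplied by the caller, and it assumes the probing pod's image ships an `nc` that supports `-z`, `-v`, and `-w`. The iptables helpers must run as root on the k3s node.
+
+```shell
+#!/bin/sh
+# Sketch only: function names are hypothetical; pod and chain names come from the caller.
+NS="${NS:-platform-infra}"
+
+# Succeeds when the last rule in an `iptables -L <chain> -n` listing (stdin) is a REJECT.
+chain_ends_with_reject() {
+  awk 'NF { last = $1 } END { exit (last == "REJECT" ? 0 : 1) }'
+}
+
+# Probe postgres from another pod in the namespace; $1 is that pod's name.
+probe_postgres_from() {
+  kubectl -n "$NS" exec "$1" -- nc -zv -w 5 sub2api-postgres 5432
+}
+
+# List the per-pod firewall chain references for the namespace (node, root).
+list_fw_chains() {
+  iptables -L KUBE-ROUTER-INPUT -n | grep "$NS"
+}
+
+# Check whether one per-pod chain ($1, taken from list_fw_chains output) ends in REJECT.
+check_chain() {
+  iptables -L "$1" -n | chain_ends_with_reject
+}
+```
+
+A timeout from `probe_postgres_from` combined with a zero exit status from `check_chain` for the postgres pod's chain matches the failure described in this section.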
+
+If kube-router's iptables rules become stale after a NetworkPolicy create/update cycle (e.g., an ipset still references old pod IPs, or the mark-bit logic fails to match), the fastest recovery is a temporary bypass: `iptables -I FORWARD 1 -s 10.42.0.0/16 -d 10.42.0.0/16 -j ACCEPT`. Here `10.42.0.0/16` is the default k3s pod CIDR; adjust it if the cluster overrides that default. Then recreate the NetworkPolicy, or restart kube-router/k3s, to force a full iptables sync. After recovery, remove the temporary rule: `iptables -D FORWARD -s 10.42.0.0/16 -d 10.42.0.0/16 -j ACCEPT`.
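+
+The recovery sequence can be wrapped in small helpers so the bypass is not inserted twice and is easy to remove. This is a sketch under assumptions: the function names and the `POD_CIDR` variable are hypothetical, `resync` assumes a systemd-managed server node whose unit is named `k3s`, and everything runs as root on the node.
+
+```shell
+#!/bin/sh
+# Sketch only: run as root on the k3s node. POD_CIDR defaults to the k3s default pod CIDR.
+POD_CIDR="${POD_CIDR:-10.42.0.0/16}"
+
+# Insert the temporary pod-to-pod ACCEPT at the top of FORWARD unless it already exists.
+bypass_on() {
+  iptables -C FORWARD -s "$POD_CIDR" -d "$POD_CIDR" -j ACCEPT 2>/dev/null ||
+    iptables -I FORWARD 1 -s "$POD_CIDR" -d "$POD_CIDR" -j ACCEPT
+}
+
+# Remove the temporary rule once kube-router has resynced.
+bypass_off() {
+  iptables -D FORWARD -s "$POD_CIDR" -d "$POD_CIDR" -j ACCEPT
+}
+
+# Force a full iptables sync by restarting k3s (assumes the systemd unit is named k3s).
+resync() {
+  systemctl restart k3s
+}
+```
+
+A typical order is `bypass_on`, then `resync` (or recreating the NetworkPolicy), then `bypass_off` after cross-pod connectivity has been confirmed.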
+
+The manifest for the required `allow-all` policy is:
+```yaml
+apiVersion: networking.k8s.io/v1
+kind: NetworkPolicy
+metadata:
+  name: allow-all
+  namespace: platform-infra
+spec:
+  podSelector: {}
+  policyTypes:
+  - Ingress
+  - Egress
+  ingress:
+  - {}
+  egress:
+  - {}
+```
+
+This policy must be included in the `sub2api plan` / `apply` manifest rendering so that it is created as part of the normal deployment flow, not maintained as a manual one-off.
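+
+A post-apply check along the following lines could catch a rendering regression early. It is a hypothetical sketch: the function name is invented for illustration, and how (or whether) such a check is wired into `sub2api plan` / `apply` is not specified here.
+
+```shell
+#!/bin/sh
+# Sketch only: returns non-zero if the allow-all policy is missing from the namespace.
+require_allow_all() {
+  kubectl -n "${NS:-platform-infra}" get networkpolicy allow-all >/dev/null
+}
+```
+
+For example, `require_allow_all || echo "allow-all NetworkPolicy missing in platform-infra" >&2` can run right after an apply.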