docs: add k3s NetworkPolicy requirement for platform-infra

G14 k3s kube-router generates per-pod REJECT chains when any
NetworkPolicy CRD exists; without an allow-all policy in the
namespace, cross-pod traffic (Sub2API→PostgreSQL, Sub2API→Redis)
is silently blocked. Document the required allow-all NetworkPolicy
and diagnostic symptoms for future reference.

Ref: pikasTech/unidesk#254
Codex
2026-06-10 15:55:05 +00:00
parent 4ee3a67089
commit cdf7909f1c
+34
@@ -95,3 +95,37 @@ When an automatic availability probe is added, it should be YAML-controlled and
4. Optional per-upstream account probes if Sub2API exposes a safe account selection or admin-health mechanism; otherwise document that group-level success does not prove every upstream account is healthy.
Until continuous probing exists, closeout comments must state that validation was on-demand and include the exact CLI/API entrypoints used.
## k3s Network Policy Requirements
G14 k3s runs kube-router as its network policy controller. When any NetworkPolicy resource exists in a namespace, kube-router replaces its default allow-all behavior with explicit iptables/ipset rules that permit only traffic matching the declared policies. If the generated rules miss or incorrectly evaluate a traffic path, pods in that namespace lose connectivity on that path: the per-pod chain ends in a REJECT with `icmp-port-unreachable`, and clients see connection timeouts. The failure is silent, because `kubectl get networkpolicy` still shows the policy and DNS/service resolution still works.
The `platform-infra` namespace **must** have a `NetworkPolicy` named `allow-all` (or equivalent) that explicitly permits all ingress and egress within the namespace. Without it, kube-router's default-deny iptables chains block cross-pod traffic including Sub2API → PostgreSQL and Sub2API → Redis connections, causing Sub2API init containers and background services to hang with `context deadline exceeded` or `no response` errors.
Diagnostic symptoms:
- Sub2API pod stuck `Init:0/2` with `wait-postgres` logging `sub2api-postgres:5432 - no response` perpetually
- `pg_isready` succeeds inside the postgres pod itself but TCP from any other pod times out
- `nc -zv` to the postgres ClusterIP or pod IP, run via `kubectl exec` from a different pod, returns `Operation timed out`
- `iptables -L KUBE-ROUTER-INPUT -n | grep <namespace>` shows per-pod FW chains; the chain ends with `REJECT ... mark match ! 0x10000/0x10000`
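The TCP symptom in the list above can also be checked from a script. The following is a minimal sketch, not part of the existing tooling: `probe_tcp` is a hypothetical helper name, and it only classifies a single connection attempt using the Python standard library. A `"timeout"` result corresponds to the `Operation timed out` symptom.

```python
import socket


def probe_tcp(host: str, port: int, timeout: float = 3.0) -> str:
    """Classify one TCP connection attempt.

    Returns "open", "refused", "timeout", or "error:<detail>".
    """
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The peer (or a firewall) actively rejected the connection.
        return "refused"
    except (socket.timeout, TimeoutError):
        # No answer within the deadline.
        return "timeout"
    except OSError as exc:
        # DNS failures, unreachable networks, and similar errors.
        return f"error:{exc}"
```

Run from a pod other than the postgres pod, `probe_tcp("sub2api-postgres", 5432)` would be expected to return `"open"` once the namespace policy is correct. The host and port here are taken from the `wait-postgres` log line quoted above.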
If kube-router iptables rules become stale after a NetworkPolicy create/update cycle (e.g., an ipset references old pod IPs or the mark-bit logic fails to match), the fastest recovery is a temporary bypass on the node: `iptables -I FORWARD 1 -s 10.42.0.0/16 -d 10.42.0.0/16 -j ACCEPT`. Here `10.42.0.0/16` is the k3s default pod CIDR; substitute the cluster's actual pod CIDR if it was changed. Then recreate the NetworkPolicy or restart kube-router/k3s to force a full iptables sync. After recovery, remove the temporary rule: `iptables -D FORWARD -s 10.42.0.0/16 -d 10.42.0.0/16 -j ACCEPT`.
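The delete command only removes the bypass if its rule specification matches the inserted rule exactly. One way to keep the two in sync is to build both from a single specification. This is a sketch under that assumption: `bypass_rule` and `POD_CIDR` are hypothetical names, and the function only builds argument lists without executing anything.

```python
from typing import List

# CIDR quoted in the recovery text above; adjust if the cluster differs.
POD_CIDR = "10.42.0.0/16"


def bypass_rule(action: str, cidr: str = POD_CIDR) -> List[str]:
    """Build the argv for inserting or deleting the temporary FORWARD rule."""
    # Shared rule specification, so insert and delete always match.
    spec = ["-s", cidr, "-d", cidr, "-j", "ACCEPT"]
    if action == "insert":
        # Insert at position 1 so the rule is evaluated before other FORWARD rules.
        return ["iptables", "-I", "FORWARD", "1", *spec]
    if action == "delete":
        return ["iptables", "-D", "FORWARD", *spec]
    raise ValueError(f"unknown action: {action!r}")
```

The result could be passed to `subprocess.run(bypass_rule("insert"), check=True)` as root on the node, with the matching `bypass_rule("delete")` call after kube-router has resynced.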
The manifest for the required `allow-all` policy is:
```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-all
  namespace: platform-infra
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - {}
  egress:
    - {}
```
This policy must be included in the `sub2api plan` / `apply` manifest rendering so that it is created as part of the normal deployment flow, not maintained as a manual one-off.
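A guard in the rendering step could assert that the policy is present before anything is applied. The sketch below assumes the rendered manifests are available as parsed dictionaries. The `sub2api plan` internals are not shown here, so `is_allow_all` and `has_allow_all` are hypothetical helpers, and the check is deliberately strict: it accepts only policies shaped like the manifest above.

```python
from typing import Any, Dict, Iterable


def is_allow_all(policy: Dict[str, Any], namespace: str) -> bool:
    """Return True if `policy` is an allow-all NetworkPolicy in `namespace`."""
    if policy.get("kind") != "NetworkPolicy":
        return False
    if (policy.get("metadata") or {}).get("namespace") != namespace:
        return False
    spec = policy.get("spec") or {}
    return (
        # An empty podSelector selects every pod in the namespace.
        spec.get("podSelector") == {}
        # Both directions must be declared explicitly.
        and set(spec.get("policyTypes") or []) == {"Ingress", "Egress"}
        # An empty rule ({}) matches all peers and all ports.
        and {} in (spec.get("ingress") or [])
        and {} in (spec.get("egress") or [])
    )


def has_allow_all(docs: Iterable[Any], namespace: str) -> bool:
    """Return True if any rendered document is an allow-all policy."""
    return any(isinstance(d, dict) and is_allow_all(d, namespace) for d in docs)
```

For example, a plan step could fail fast when `has_allow_all(rendered_docs, "platform-infra")` is false, where `rendered_docs` stands for whatever list of parsed manifests the renderer produces.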