docs: add k3s NetworkPolicy requirement for platform-infra
G14 k3s kube-router generates per-pod REJECT chains when any NetworkPolicy CRD exists; without an allow-all policy in the namespace, cross-pod traffic (Sub2API→PostgreSQL, Sub2API→Redis) is silently blocked. Document the required allow-all NetworkPolicy and diagnostic symptoms for future reference. Ref: pikasTech/unidesk#254
@@ -95,3 +95,37 @@ When an automatic availability probe is added, it should be YAML-controlled and
4. Optional per-upstream account probes if Sub2API exposes a safe account selection or admin-health mechanism; otherwise document that group-level success does not prove every upstream account is healthy.

Until continuous probing exists, closeout comments must state that validation was on-demand and include the exact CLI/API entrypoints used.

## k3s Network Policy Requirements
G14 k3s runs kube-router as its network policy controller. When any NetworkPolicy resource exists in a namespace, kube-router replaces its default allow-all behavior with explicit iptables/ipset rules that permit only traffic matching a declared policy. If the generated rules miss or incorrectly evaluate a traffic path, pods in that namespace experience silent connection timeouts (REJECT with `icmp-port-unreachable`), even though `kubectl get networkpolicy` lists the policy and DNS/service resolution works.

The `platform-infra` namespace **must** have a `NetworkPolicy` named `allow-all` (or equivalent) that explicitly permits all ingress and egress within the namespace. Without it, kube-router's default-deny iptables chains block cross-pod traffic, including Sub2API → PostgreSQL and Sub2API → Redis connections, so Sub2API init containers and background services hang with `context deadline exceeded` or `no response` errors.

Diagnostic symptoms:

- Sub2API pod stuck at `Init:0/2`, with `wait-postgres` logging `sub2api-postgres:5432 - no response` indefinitely
- `pg_isready` succeeds inside the postgres pod itself, but TCP connections from any other pod time out
- `nc -zv` to the postgres ClusterIP or pod IP, run via `kubectl exec` from a different pod, returns `Operation timed out`
- `iptables -L KUBE-ROUTER-INPUT -n | grep <namespace>` shows per-pod FW chains; each chain ends with `REJECT ... mark match ! 0x10000/0x10000`
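
The checks behind these symptoms can be scripted with a short sketch like the following. It is not part of the documented tooling: the function names, the throwaway pod name `netpol-probe`, and the `busybox:1.36` image are assumptions for illustration, and the probe assumes that image's `nc` accepts `-z`, `-v`, and `-w`.

```shell
# List NetworkPolicy objects in a namespace (default: platform-infra).
list_policies() {
  kubectl -n "${1:-platform-infra}" get networkpolicy
}

# Probe host:port from a throwaway pod in the given namespace.
# Pod name and image are illustrative assumptions, not project conventions.
# Arguments: namespace, host, port.
probe_tcp() {
  kubectl -n "$1" run netpol-probe --rm -i --restart=Never \
    --image=busybox:1.36 -- nc -zv -w 5 "$2" "$3"
}
```

For example, `probe_tcp platform-infra sub2api-postgres 5432` timing out while `pg_isready` succeeds inside the postgres pod would match the pattern described above.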

If kube-router iptables rules become stale after a NetworkPolicy create/update cycle (e.g., ipset entries reference old pod IPs or the mark-bit logic fails to match), the fastest recovery is to insert a temporary bypass with `iptables -I FORWARD 1 -s 10.42.0.0/16 -d 10.42.0.0/16 -j ACCEPT`, then recreate the NetworkPolicy or restart kube-router/k3s to force a full iptables sync. After recovery, remove the temporary rule with `iptables -D FORWARD -s 10.42.0.0/16 -d 10.42.0.0/16 -j ACCEPT`.
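
The bypass and its cleanup can be wrapped so that the temporary rule is neither stacked by repeated calls nor left behind. This is a sketch under assumptions: `bypass_on`, `bypass_off`, and the `POD_CIDR` variable are hypothetical names, and `10.42.0.0/16` is taken from the commands above. It has to run as root on the k3s node.

```shell
# Pod CIDR used by the temporary rule; override if the cluster differs.
POD_CIDR="${POD_CIDR:-10.42.0.0/16}"

# Insert the temporary pod-to-pod ACCEPT rule at the top of FORWARD.
# `iptables -C` checks whether the rule already exists, so calling this
# twice does not add a second copy.
bypass_on() {
  iptables -C FORWARD -s "$POD_CIDR" -d "$POD_CIDR" -j ACCEPT 2>/dev/null ||
    iptables -I FORWARD 1 -s "$POD_CIDR" -d "$POD_CIDR" -j ACCEPT
}

# Remove every copy of the temporary rule; the loop ends once
# `iptables -D` reports that no matching rule is left.
bypass_off() {
  while iptables -D FORWARD -s "$POD_CIDR" -d "$POD_CIDR" -j ACCEPT 2>/dev/null; do
    :
  done
}
```

A typical sequence would be `bypass_on`, recreate the NetworkPolicy or restart kube-router/k3s, confirm that cross-pod traffic works, then `bypass_off`.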

The manifest for the required `allow-all` policy is:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-all
  namespace: platform-infra
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - {}
  egress:
    - {}
```
This policy must be included in the `sub2api plan` / `apply` manifest rendering so that it is created as part of the normal deployment flow, not maintained as a manual one-off.
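
One way to guard this requirement is a check over the rendered manifests before they are applied. The sketch below assumes the rendered output is available as multi-document YAML on standard input; how `sub2api plan` exposes it is not specified here, and `manifest_has_allow_all` is a hypothetical name. The match is deliberately naive: it looks for `kind: NetworkPolicy` and a two-space-indented `name: allow-all` in the same document, and it does not check the namespace or the rule contents.

```shell
# Succeeds (exit 0) if some YAML document on stdin has both
# `kind: NetworkPolicy` and a metadata-level `name: allow-all`.
manifest_has_allow_all() {
  awk '
    # A document separator closes the current document: record a hit, reset.
    /^---/ { if (kind && name) found = 1; kind = 0; name = 0; next }
    /^kind:[[:space:]]*NetworkPolicy[[:space:]]*$/ { kind = 1 }
    /^  name:[[:space:]]*allow-all[[:space:]]*$/   { name = 1 }
    # The last document has no trailing separator, so check it here.
    END { if (kind && name) found = 1; exit !found }
  '
}
```

Running `manifest_has_allow_all < rendered.yaml` (with `rendered.yaml` a placeholder file name) exits non-zero when the policy is missing, which makes it usable as a pre-apply gate.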