Kubernetes Networking: Services, Ingress, Network Policies, and Service Mesh


Introduction

Kubernetes networking is the layer that most engineers understand just enough to debug obvious failures — and not enough to prevent subtle ones. Pod-to-pod communication works by default, but "works by default" means all pods can reach all other pods with no access controls. A compromised pod in your frontend namespace can reach your database service. A misconfigured Ingress routes production traffic to a test deployment. A missing network policy allows a compromised dependency to exfiltrate data.

This post covers Kubernetes networking at production depth: how Services work at the IP table level, why ClusterIP vs NodePort vs LoadBalancer matters beyond YAML syntax, Ingress controllers and their performance characteristics, Network Policies as the foundation of zero-trust networking in Kubernetes, the CNI layer (what Cilium actually does), service mesh trade-offs, and the automation stack (ExternalDNS, cert-manager) that makes production cluster management manageable. Each section includes the debugging approach for when things go wrong — because in production, they always do eventually.

The Kubernetes Network Model

Kubernetes mandates three properties of its network model:
1. Every pod gets a unique IP address
2. Pods on the same node communicate without NAT
3. Pods on different nodes communicate without NAT

How this is implemented depends on the CNI (Container Network Interface) plugin: Flannel (simple VXLAN overlay), Calico (BGP + iptables), Cilium (eBPF), or Weave. The API is uniform; the implementation varies significantly in performance and features.

Pod IPs are ephemeral — they change when pods restart. Services provide stable virtual IPs (ClusterIPs) that route to healthy pod endpoints. The kube-proxy component (or its replacement) implements this routing.


Services: ClusterIP, NodePort, and LoadBalancer

# ClusterIP: stable virtual IP, only reachable within cluster
apiVersion: v1
kind: Service
metadata:
  name: payment-api
  namespace: payments
spec:
  type: ClusterIP
  selector:
    app: payment-api        # routes to pods with this label
  ports:
  - port: 80               # service port (what callers use)
    targetPort: 8080       # pod port (what your app listens on)
    protocol: TCP
---
# NodePort: exposes service on each node's IP at a static port
# Use for local development, not production (bypasses Ingress/LoadBalancer)
apiVersion: v1
kind: Service
metadata:
  name: payment-api-nodeport
  namespace: payments
spec:
  type: NodePort
  selector:
    app: payment-api
  ports:
  - port: 80
    targetPort: 8080
    nodePort: 30080        # 30000-32767 range; if omitted, auto-assigned
---
# LoadBalancer: provisions a cloud load balancer (NLB/ELB)
# Each LoadBalancer service costs money (one cloud LB per service)
# Use Ingress to multiplex many services behind one LoadBalancer
apiVersion: v1
kind: Service
metadata:
  name: payment-api-lb
  namespace: payments
spec:
  type: LoadBalancer
  selector:
    app: payment-api
  ports:
  - port: 443
    targetPort: 8443

How ClusterIP routing works: kube-proxy (or eBPF in Cilium) watches the Endpoints object (list of healthy pod IPs) for each Service. It programs iptables rules that DNAT the ClusterIP:port to one of the healthy pod IPs using random selection. When a pod fails its readiness probe, it is removed from Endpoints, and the corresponding iptables rule disappears within seconds (a failing liveness probe, by contrast, restarts the container). The Service IP is stable; the underlying pod IPs rotate.

The iptables implementation has an O(n) lookup time — at 10,000 services, iptables rules become a performance bottleneck. Cilium's eBPF implementation uses hash tables for O(1) lookup and is the reason large clusters migrate away from kube-proxy.
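The DNAT mechanics described above can be inspected directly on a node. A sketch, assuming kube-proxy is running in iptables mode — the per-service chain name is a hash that differs per cluster, so the one below is a placeholder:

```shell
# KUBE-SERVICES is kube-proxy's entry point in the nat table;
# grep for the Service's ClusterIP or name to find its chain.
sudo iptables -t nat -L KUBE-SERVICES -n | grep payment-api

# Follow the per-service chain (hashed name, e.g. KUBE-SVC-ABC123...)
# to see one rule per healthy endpoint, selected via
# "-m statistic --mode random --probability" for load balancing.
sudo iptables -t nat -L KUBE-SVC-EXAMPLEHASH -n    # placeholder chain name
```

Counting the rules here after scaling a Deployment up or down is a direct way to watch Endpoints changes propagate into the dataplane.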

Ingress: HTTP Routing and TLS Termination

An Ingress resource defines HTTP routing rules: host-based and path-based routing to Services. One Ingress controller (typically one LoadBalancer service) multiplexes traffic to hundreds of backend services.

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: api-ingress
  namespace: production
  annotations:
    # nginx-specific annotations
    nginx.ingress.kubernetes.io/rate-limit: "100"
    nginx.ingress.kubernetes.io/rate-limit-window: "1m"
    nginx.ingress.kubernetes.io/proxy-body-size: "10m"
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
    nginx.ingress.kubernetes.io/use-regex: "true"
spec:
  ingressClassName: nginx
  tls:
  - hosts:
    - api.example.com
    secretName: api-tls-cert  # TLS cert stored as Secret
  rules:
  - host: api.example.com
    http:
      paths:
      - path: /api/v1/payments(/|$)(.*)
        pathType: ImplementationSpecific  # regex paths require this, not Prefix
        backend:
          service:
            name: payment-api
            port:
              number: 80
      - path: /api/v1/orders(/|$)(.*)
        pathType: ImplementationSpecific  # regex paths require this, not Prefix
        backend:
          service:
            name: order-api
            port:
              number: 80
      - path: /
        pathType: Prefix
        backend:
          service:
            name: frontend
            port:
              number: 80

Ingress controller choices in 2026:
- nginx-ingress: most widely deployed, extensive annotation-based configuration, well-documented
- Traefik: native Let's Encrypt integration, middleware chain, better UI
- Kong: API gateway features (auth plugins, rate limiting, request transformation)
- Gateway API (standard): the successor to Ingress, more expressive, better multi-team support

The Gateway API (GA since its v1.0 release in late 2023, shipped as CRDs rather than tied to a Kubernetes minor version) separates cluster-level infrastructure (GatewayClass, Gateway) from application-level routing (HTTPRoute), enabling better multi-tenant control. The Ingress API remains supported, but new features land in the Gateway API.

# Gateway API: replacing Ingress
apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: payment-route
spec:
  parentRefs:
  - name: prod-gateway           # references the Gateway (cluster-level)
  hostnames:
  - "api.example.com"
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /api/v1/payments
    backendRefs:
    - name: payment-api
      port: 80
      weight: 90               # 90% traffic to stable
    - name: payment-api-canary
      port: 80
      weight: 10               # 10% traffic to canary — traffic splitting

The Gateway API's traffic splitting capability enables canary deployments at the Ingress layer — weight-based routing without a service mesh.
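For completeness, a sketch of the cluster-level Gateway that the HTTPRoute's parentRefs points at. The names (prod-gateway, the gateway-infra namespace, the nginx gatewayClassName) are assumptions chosen to match the example above:

```yaml
# Cluster-level Gateway: owned by the platform team; app teams attach
# HTTPRoutes to it without touching listener or TLS configuration.
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: prod-gateway
  namespace: gateway-infra          # hypothetical infra namespace
spec:
  gatewayClassName: nginx           # assumes an installed Gateway implementation
  listeners:
  - name: https
    protocol: HTTPS
    port: 443
    hostname: "api.example.com"
    tls:
      mode: Terminate
      certificateRefs:
      - name: api-tls-cert          # TLS cert Secret, as in the Ingress example
    allowedRoutes:
      namespaces:
        from: All                   # controls which namespaces may attach routes
```

The allowedRoutes field is the multi-tenancy lever: a platform team can restrict attachment to namespaces with a given label, something the Ingress API cannot express.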

Network Policies: Zero-Trust Kubernetes Networking

Without Network Policies, all pods in a cluster can reach all other pods. A compromised pod in your CDN service can connect to your database. Network Policies define which pods can talk to which pods.

# Default deny: no ingress or egress from the payments namespace
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: payments
spec:
  podSelector: {}              # applies to all pods in the namespace
  policyTypes:
  - Ingress
  - Egress
---
# Allow: payments pods can receive traffic from API gateway only
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-api-gateway-ingress
  namespace: payments
spec:
  podSelector:
    matchLabels:
      app: payment-service
  policyTypes:
  - Ingress
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          name: api-gateway     # from the api-gateway namespace...
      podSelector:
        matchLabels:
          app: api-gateway      # ...AND only api-gateway pods
      # Both selectors in ONE list entry means AND. As two separate
      # entries ("- podSelector:") they would be OR'd — a common bug
      # that silently widens the policy.
    ports:
    - port: 8080
      protocol: TCP
---
# Allow: payments pods can reach the database and DNS only
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: payment-service-egress
  namespace: payments
spec:
  podSelector:
    matchLabels:
      app: payment-service
  policyTypes:
  - Egress
  egress:
  - to:
    - namespaceSelector:
        matchLabels:
          name: databases
    ports:
    - port: 5432               # PostgreSQL
  - to:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: kube-system
    ports:
    - port: 53                 # DNS — don't forget this or pod DNS breaks
      protocol: UDP
    - port: 53
      protocol: TCP

The default-deny-all then explicit-allow pattern is the correct approach for production. Start locked down; add permissions as needed. The common mistake is writing an empty allow-all rule (an ingress or egress list containing a bare {} element) because it "just works" — then you have no network segmentation at all. Note that podSelector: {} itself just means "applies to every pod in the namespace"; whether the policy denies or allows depends on the rules.

Network Policy enforcement requires a CNI that supports it. Flannel does not. Calico, Cilium, and Weave do. Verify your CNI supports Network Policies before applying them — on Flannel clusters, the policies are accepted by the API server but silently not enforced.
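A quick enforcement check — a sketch with illustrative names, using the default-deny-all policy above (saved here as a hypothetical default-deny-all.yaml): run a connection that works, apply the policy, and confirm the same connection now times out:

```shell
# Before the policy: should print the health response
kubectl run np-test --rm -it --restart=Never -n payments \
    --image=curlimages/curl -- curl -s -m 3 http://payment-api.payments/health

# Apply default-deny, then repeat — curl should now time out (exit code 28)
kubectl apply -f default-deny-all.yaml
kubectl run np-test --rm -it --restart=Never -n payments \
    --image=curlimages/curl -- curl -s -m 3 http://payment-api.payments/health
# If the second attempt still succeeds, your CNI is not enforcing policies
```

This two-step check is worth running once per cluster — it is the only reliable way to catch the Flannel silent-acceptance failure mode described above.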

Comparison visual

EndpointSlices and Large-Scale Service Routing

The Endpoints API stores every pod IP for a Service in a single object, which runs into etcd's object size limit (~1.5MB) for very large Services — Kubernetes truncates Endpoints at 1,000 addresses. EndpointSlices (stable since Kubernetes 1.21) shard the endpoint data: each EndpointSlice holds up to 100 endpoints by default, and kube-proxy watches all EndpointSlices for a Service.

EndpointSlices also reduce kube-proxy CPU load: when a single pod is added to a Service, only the affected EndpointSlice is updated — not the entire endpoint list. At 1,000-pod Services, this reduces the update payload by ~99%.
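Running kubectl get endpointslices -l kubernetes.io/service-name=payment-api -o yaml shows the sharded structure. Roughly this shape — addresses, node, and zone values are illustrative:

```yaml
apiVersion: discovery.k8s.io/v1
kind: EndpointSlice
metadata:
  name: payment-api-abc12            # generated name, one of possibly many slices
  namespace: payments
  labels:
    kubernetes.io/service-name: payment-api   # links the slice to its Service
addressType: IPv4
ports:
- name: http
  port: 8080
  protocol: TCP
endpoints:
- addresses: ["10.244.1.5"]
  conditions:
    ready: true                      # mirrors the pod's readiness state
  nodeName: node-a
  zone: us-east-1a                   # consumed by topology-aware routing
```

The per-endpoint zone field is what makes the topology-aware routing described next possible without any extra bookkeeping.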

Topology-aware routing (also called topology hints) adds zone hints to EndpointSlices so that kube-proxy prefers endpoints in the same zone as the requesting pod. This reduces cross-zone data transfer costs (in cloud providers, cross-zone traffic is billed at ~$0.01/GB) and latency for services with geographically distributed nodes:

# Enable topology-aware routing on a Service
apiVersion: v1
kind: Service
metadata:
  name: recommendation-api
  annotations:
    service.kubernetes.io/topology-mode: "auto"  # prefer same-zone endpoints
spec:
  selector:
    app: recommendation-api
  ports:
  - port: 80
    targetPort: 8080

With topology-mode: auto, kube-proxy (or Cilium) preferentially routes requests to endpoints in the same availability zone. If no in-zone endpoints are available, it falls back to any endpoint. The annotation works best when endpoints are roughly evenly distributed across zones — severely imbalanced distributions disable the optimization automatically.

Cilium: eBPF-Based Networking

Cilium replaces kube-proxy and extends Network Policies with capabilities not possible in iptables:

# Cilium NetworkPolicy: L7-aware policies (iptables only does L3/L4)
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: payment-l7-policy
spec:
  endpointSelector:
    matchLabels:
      app: payment-service
  ingress:
  - fromEndpoints:
    - matchLabels:
        app: api-gateway
    toPorts:
    - ports:
      - port: "8080"
        protocol: TCP
      rules:
        http:
        - method: "POST"
          path: "/api/v1/charge"   # only allow this specific endpoint
        - method: "GET"
          path: "/api/v1/status"   # and this one
        # All other paths blocked at the network layer

L7-aware policies block specific HTTP paths or gRPC methods at the network layer — not the application layer. A compromised internal service cannot call DELETE /api/v1/all_orders even if it can reach the payment service's pod IP.

Cilium's eBPF implementation also provides:
- Network performance: O(1) service lookup vs O(n) iptables, measurable improvement at >1000 services
- Hubble observability: real-time flow visibility, service dependency maps, network policy verification
- Transparent encryption: WireGuard-based pod-to-pod encryption without application changes
- Bandwidth management: per-pod egress bandwidth limits
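The bandwidth management item, for example, is driven by a single pod annotation. A sketch, assuming Cilium's bandwidth manager is enabled; the workload name and image are placeholders:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: batch-exporter                        # hypothetical workload
  annotations:
    kubernetes.io/egress-bandwidth: "50M"     # cap this pod's egress rate
spec:
  containers:
  - name: exporter
    image: example.com/exporter:latest        # placeholder image
```

This prevents a single batch job from saturating a node's uplink and starving latency-sensitive neighbors — without any changes to the application.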

The operational trade-off: Cilium requires a more recent Linux kernel (5.4+ for most features, 5.10+ for advanced features) and a more complex installation than vanilla kube-proxy. In 2026, most managed Kubernetes offerings (EKS, GKE, AKS) support or default to Cilium.

ExternalDNS: Automating DNS Record Management

Manually updating DNS records when LoadBalancer IPs change is error-prone. ExternalDNS watches Services and Ingresses and automatically updates Route 53, Cloud DNS, or Cloudflare when load balancer IPs are assigned:

# ExternalDNS deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: external-dns
spec:
  template:
    spec:
      containers:
      - name: external-dns
        image: registry.k8s.io/external-dns/external-dns:v0.14.0
        args:
        - --source=service
        - --source=ingress
        - --domain-filter=example.com      # only manage example.com records
        - --provider=aws                   # Route 53
        - --policy=upsert-only             # never delete records (safer)
        - --aws-zone-type=public
        - --log-level=info

When you create an Ingress with host: api.example.com, ExternalDNS automatically creates a Route 53 A record pointing to the Ingress controller's LoadBalancer IP. When the LoadBalancer IP changes (during cluster migration, for example), ExternalDNS updates the record automatically.
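Services without an Ingress can also opt in explicitly via an annotation. A sketch using the standard external-dns annotations — the service name and hostname are illustrative:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: grpc-gateway                 # hypothetical LoadBalancer service
  annotations:
    external-dns.alpha.kubernetes.io/hostname: grpc.example.com  # record to manage
    external-dns.alpha.kubernetes.io/ttl: "60"                   # optional record TTL
spec:
  type: LoadBalancer
  selector:
    app: grpc-gateway
  ports:
  - port: 443
    targetPort: 8443
```

This is useful for non-HTTP traffic (gRPC, databases, MQTT) that bypasses the Ingress controller but still needs a stable DNS name.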

Combined with cert-manager (automatic TLS certificate provisioning from Let's Encrypt), you get fully automated HTTPS endpoint management:

# cert-manager ClusterIssuer for Let's Encrypt
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: ops@example.com
    privateKeySecretRef:
      name: letsencrypt-prod-key
    solvers:
    - http01:
        ingress:
          class: nginx

# Ingress with cert-manager annotation: auto-provisions certificate
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod  # triggers cert provisioning
spec:
  tls:
  - hosts:
    - api.example.com
    secretName: api-example-com-tls   # cert-manager stores cert here
  rules:
  - host: api.example.com
    ...

The combination of ExternalDNS + cert-manager + nginx-ingress is the standard production Kubernetes HTTP stack: deploy an application, annotate the Ingress, and within 2 minutes it's live with a valid HTTPS certificate and DNS record — fully automated.

Service Mesh: When You Need It

A service mesh (Istio, Linkerd, Cilium Service Mesh) adds a sidecar proxy (or eBPF) to each pod that handles:
- Mutual TLS (mTLS) between all services
- Traffic management (canary deployments, circuit breakers, retries)
- Observability (distributed traces, metrics, per-service dashboards)

The value is real. The cost is also real: Istio adds 7-15MB memory per sidecar, 1-3ms latency per hop, and significant operational complexity (CRDs, control plane management, certificate rotation).

Use a service mesh when:
- You require mTLS for compliance (SOC 2, PCI DSS) and can't add TLS to each service individually
- You need circuit breaking, retries, and timeout policies applied consistently across all services
- You want distributed tracing without application-level instrumentation (though OpenTelemetry auto-instrumentation is usually sufficient, making this a weak reason on its own)
- You have 20+ services that need consistent traffic management

Don't use a service mesh when:
- You have fewer than 10 services (overhead exceeds benefit)
- Your team doesn't have operational experience with the service mesh control plane
- You can achieve the same goals with Cilium Network Policies + OpenTelemetry + Gateway API traffic splitting

Linkerd has a much simpler operational model than Istio (no complex VirtualService/DestinationRule CRDs, lower resource overhead, automatic mTLS with near-zero config). If mTLS compliance is the primary driver, Linkerd is the better choice. If advanced traffic management is needed, Istio's capabilities justify its complexity.
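Linkerd's near-zero-config mTLS comes down to one annotation. A sketch — the namespace name matches the running example:

```yaml
# Annotating a namespace causes Linkerd to inject its sidecar proxy into
# every new pod; traffic between meshed pods is then mTLS automatically.
apiVersion: v1
kind: Namespace
metadata:
  name: payments
  annotations:
    linkerd.io/inject: enabled
```

Existing pods pick up the sidecar on their next restart (for example, kubectl rollout restart deploy -n payments), so adoption can proceed one namespace at a time.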

Note that Cilium Service Mesh (available when Cilium is your CNI) provides mTLS and basic traffic management via eBPF without sidecars — zero sidecar overhead. For clusters already running Cilium, this is worth evaluating before adopting Istio or Linkerd, since the infrastructure is already in place.

Debugging Kubernetes Network Issues

A systematic approach to the most common networking failures:

Pod can't reach another pod by Service name:

# Step 1: Verify DNS resolves
kubectl exec -it debug-pod -- nslookup payment-api.payments.svc.cluster.local
# If fails: CoreDNS problem or pod DNS config issue

# Step 2: Verify Service has endpoints
kubectl get endpoints payment-api -n payments
# If ADDRESS column is empty: selector doesn't match pod labels

# Step 3: Test direct pod-to-pod connectivity (bypass Service)
kubectl get pods -n payments -o wide          # get pod IP
kubectl exec -it debug-pod -- curl http://10.244.1.5:8080/health
# If succeeds but Service fails: iptables/kube-proxy issue

# Step 4: Check Network Policy isn't blocking
kubectl exec -n payments debug-pod -- curl http://payment-api:80
# If denied: check NetworkPolicy objects in both namespaces

# Cilium policy troubleshooting with Hubble
hubble observe --namespace payments --follow
# Shows dropped packets with reason: network policy rule, etc.

Ingress not routing correctly:

# Check Ingress controller logs
kubectl logs -n ingress-nginx deploy/ingress-nginx-controller | grep payment-api

# Verify Ingress backend resolves
kubectl describe ingress api-ingress -n production
# Look for: "Default backend: default-http-backend:80 (<error>)"

# Check that the backend Service and its pods are healthy
kubectl get svc payment-api -n payments
kubectl get endpoints payment-api -n payments  # must have addresses

# Test with curl from inside the cluster (bypasses Ingress, tests Service)
kubectl run test --image=curlimages/curl --rm -it -- \
    curl http://payment-api.payments.svc.cluster.local/health

Connection timeout to external services:

# Check egress Network Policy allows the external IP/port
kubectl get networkpolicy -n payments

# Test DNS resolution of external service
kubectl exec -it pod -- nslookup api.stripe.com

# Check if traffic is being NATted correctly
kubectl exec -it pod -- curl -v https://api.stripe.com/v1/balance \
    -H "Authorization: Bearer sk_test_..."

Intermittent connection failures under load:
This is often a conntrack table overflow (the kernel's connection tracking table is full). The symptom is "nf_conntrack: table full, dropping packet" in node kernel logs. Mitigation:

# On each node
sysctl net.netfilter.nf_conntrack_max
# Increase if close to limit under load
# Also check: sysctl net.netfilter.nf_conntrack_count

With Cilium (eBPF), conntrack is handled in eBPF maps with much higher limits than the kernel conntrack table — this is one of the scaling advantages.

DNS in Kubernetes: CoreDNS and Service Discovery

CoreDNS is the default DNS server in Kubernetes. Services are resolvable via DNS within the cluster:

# DNS name format: <service>.<namespace>.svc.cluster.local
# Short forms also work within the same namespace:
# - <service>                            (same namespace)
# - <service>.<namespace>                (any namespace)
# - <service>.<namespace>.svc            (any namespace)
# - <service>.<namespace>.svc.cluster.local (full FQDN)

# From within payments namespace:
# payment-api              → resolves to ClusterIP
# payment-api.payments     → same
# postgres.databases       → cross-namespace

# Headless services (ClusterIP: None): DNS returns pod IPs
# Used for StatefulSets where you need to reach specific pods

CoreDNS performance issues at scale: with many pods making many DNS lookups, CoreDNS can become a bottleneck. Mitigations:
- The default ndots:5 causes any name with fewer than five dots to be tried against each search-domain suffix before being resolved as an absolute name — multiplying query volume severalfold. Set ndots:2 (or use trailing-dot FQDNs) for workloads that mostly resolve external names
- NodeLocal DNSCache: DNS cache on each node, reduces CoreDNS load by ~70%
- Increase CoreDNS replicas: HPA on CoreDNS based on requests/second
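The ndots change is a per-pod spec field. A minimal sketch — the pod name and image are placeholders:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: external-caller            # hypothetical pod that mostly hits external FQDNs
spec:
  dnsConfig:
    options:
    - name: ndots
      value: "2"                   # names with 2+ dots skip the search-domain walk
  containers:
  - name: app
    image: example.com/app:latest  # placeholder image
```

Verify the effect with kubectl exec into the pod and cat /etc/resolv.conf — the options line should show ndots:2.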

Production Kubernetes Networking Stack: Recommendations by Scale

Small clusters (1-20 nodes, <100 services):
- CNI: Flannel or Calico (simple, well-understood)
- Service routing: kube-proxy (iptables overhead not visible at this scale)
- Ingress: nginx-ingress (most documentation, easiest debugging)
- Network Policies: Calico can provide enforcement even alongside a Flannel overlay (the Canal combination)
- Service mesh: skip (overhead exceeds benefit)
- DNS automation: ExternalDNS + cert-manager

Medium clusters (20-200 nodes, 100-1000 services):
- CNI: Cilium (eBPF performance advantage becomes measurable, Hubble for visibility)
- Service routing: Cilium replaces kube-proxy
- Ingress: nginx-ingress or Gateway API
- Network Policies: Cilium NetworkPolicy (L7-aware)
- Service mesh: Linkerd if mTLS compliance required; skip otherwise
- Topology-aware routing for cross-zone cost reduction

Large clusters (200+ nodes, 1000+ services):
- CNI: Cilium (iptables simply doesn't scale here)
- EndpointSlices: enabled and monitored
- Ingress: Gateway API (mature at this cluster size)
- Service mesh: likely required (traffic management at this scale benefits from centralized control)
- NodeLocal DNSCache: required (CoreDNS becomes bottleneck without it)

Conclusion

Kubernetes networking has multiple layers, each with distinct concerns. Services provide stable virtual IPs for ephemeral pods. Ingress (and increasingly, Gateway API) multiplexes HTTP traffic with host and path routing. Network Policies provide the access controls that make multi-tenant clusters safe. Cilium's eBPF implementation delivers L7-aware policies and O(1) service lookup at scale. Service meshes add mTLS and advanced traffic management — valuable for large deployments, over-engineered for small ones.

The security posture that production clusters should target: default-deny Network Policies per namespace, explicit-allow ingress and egress for every service, CoreDNS always permitted. Combined with Cilium for enforcement and Hubble for visibility, you have network segmentation comparable to a traditional firewall — but dynamic and programmable through Kubernetes manifests.

The automation stack of ExternalDNS plus cert-manager plus an Ingress controller eliminates manual DNS and certificate management — the operational overhead that previously made Kubernetes networking painful to manage. In 2026, a new service can be deployed, DNS-assigned, and HTTPS-terminated in minutes with no manual intervention. When things go wrong, the debugging workflow follows the layers: DNS resolution, Service endpoints, Network Policy rules, Ingress configuration — in that order. Cilium's Hubble CLI makes network policy debugging particularly tractable, showing dropped packets with the exact policy rule that blocked them.
