Container Security in 2026: Multi-Stage Builds, Distroless Images, and Supply Chain Security

Introduction

Container security is not a checkbox. It is a layered discipline that spans your build pipeline, base image choices, runtime configuration, secrets handling, and software supply chain. Most teams get some of this right some of the time — but the gaps between layers are where breaches happen.

The threat landscape in 2026 looks different than it did in 2020. Supply chain attacks are now the dominant vector for container compromises. The SolarWinds pattern — compromise a build tool or base image rather than the target directly — has been replicated across dozens of incidents. Dependency confusion attacks, malicious packages injected into public registries, and tampered base images are all confirmed, documented attack paths. Meanwhile, misconfigured runtime permissions remain the most common root cause of container escapes in post-incident reports.

This post covers the full container security stack for production engineering teams:

  • Multi-stage builds that eliminate build-time bloat and attack surface
  • Distroless and minimal base images with near-zero CVE counts
  • Non-root user enforcement at both the Docker and Kubernetes layers
  • CI-integrated image scanning with SBOM generation
  • Runtime security profiles that limit syscall exposure
  • Secrets management patterns that keep credentials out of image layers and environment variables
  • Supply chain security tooling (cosign, syft, Sigstore) that lets you cryptographically verify what you're running

Each section includes concrete, runnable code you can adapt directly.

The goal is a hardened container that is small, scannable, signed, secrets-free, and running with the minimum privilege it needs to do its job.


1. Multi-Stage Builds: Ship the Artifact, Not the Toolchain

The most impactful single change most teams can make to container security is adopting multi-stage builds. The principle is simple: you need a fully equipped build environment to compile and package your software, but you do not need that environment at runtime. Every tool you ship — compilers, build systems, package managers, debugging utilities — is attack surface that can be exploited after a container is compromised.

Multi-stage builds let you define a builder stage with everything needed to compile, then copy only the final artifact into a minimal runtime image.

Go application: 250MB → 8MB

# syntax=docker/dockerfile:1.7

# Stage 1: Builder
FROM golang:1.22-alpine AS builder

WORKDIR /src

# Copy dependency manifests first (cache layer)
COPY go.mod go.sum ./
RUN go mod download

# Copy source and build a statically linked binary
COPY . .
RUN CGO_ENABLED=0 GOOS=linux GOARCH=amd64 \
    go build \
    -ldflags="-w -s -extldflags=-static" \
    -trimpath \
    -o /out/server \
    ./cmd/server

# Stage 2: Distroless runtime (no shell, no package manager)
FROM gcr.io/distroless/static-debian12:nonroot

# Copy only the compiled binary
COPY --from=builder /out/server /server

EXPOSE 8080
ENTRYPOINT ["/server"]

The -ldflags="-w -s" flags strip debug info and symbol tables (smaller binary, nothing for attackers to symbolize). -trimpath removes local filesystem paths from the compiled binary, so panics and stack traces don't leak your build machine's directory layout. CGO_ENABLED=0 removes the C runtime dependency — the static binary runs on any Linux system without libc.

Result: the builder image is roughly 250MB with the Go toolchain. The final runtime image is ~8MB (the distroless/static base is ~2MB; the binary adds the rest). The runtime image contains no shell, no package manager, no compiler — only the binary and the minimal system files it needs.

Python application with dependency isolation

# syntax=docker/dockerfile:1.7

# Stage 1: Dependency builder
FROM python:3.12-slim AS builder

WORKDIR /app

# Install build tools only in builder
RUN apt-get update && apt-get install -y --no-install-recommends \
    build-essential \
    libffi-dev \
    && rm -rf /var/lib/apt/lists/*

# Install Python dependencies into a prefix we can copy
COPY requirements.txt .
RUN pip install --no-cache-dir --prefix=/install -r requirements.txt

# Stage 2: Runtime
FROM python:3.12-slim AS runtime

# Create non-root user
RUN useradd --system --no-create-home --shell /sbin/nologin appuser

WORKDIR /app

# Copy installed packages from builder
COPY --from=builder /install /usr/local

# Copy application source (no build tools present)
COPY --chown=appuser:appuser src/ ./src/

USER appuser

EXPOSE 8000
CMD ["python", "-m", "uvicorn", "src.main:app", "--host", "0.0.0.0", "--port", "8000"]

The key discipline here: build-essential and libffi-dev appear only in the builder stage. They are needed to compile native extensions like cryptography or uvloop. The runtime stage gets a fresh python:3.12-slim and only receives the pre-built packages via COPY --from=builder. No compiler, no build headers — an attacker who achieves code execution cannot install new compiled tools.

Layer caching discipline: always copy go.mod/requirements.txt before source code. Docker caches layers by content hash. If you copy source first, a single source line change invalidates the entire dependency installation cache. Structuring for cache locality can cut CI build times by 60-80% on large projects.

flowchart TD
    A[Source Code] --> B[Builder Stage\nFull toolchain + deps]
    B --> C[Compile / Package\ngo build / pip install]
    C --> D{Copy artifact only}
    D --> E[Runtime Stage\nMinimal base image]
    E --> F[Final Image\n~8MB, no compiler]

    style B fill:#ff6b6b,color:#fff
    style E fill:#51cf66,color:#fff
    style F fill:#339af0,color:#fff

2. Distroless and Minimal Base Images: Eliminate the Attack Surface

A container image is a filesystem. Every binary, library, and configuration file in that filesystem is a potential exploit path. The traditional approach — start with ubuntu:22.04 because it is familiar — ships a complete operating system with hundreds of packages, most of which your application never touches. Each of those packages can carry CVEs.

Google's distroless images strip this down to the absolute minimum: only the language runtime and direct system dependencies your application needs. No shell (/bin/sh, /bin/bash), no package manager (apt, apk), no coreutils (ls, chmod, curl). The attack surface collapses.

Image comparison by CVE count (2026 data)

| Base Image | Size | Typical CVE Count | Has Shell | Has Package Manager |
|---|---|---|---|---|
| ubuntu:22.04 | ~70MB | 150-200 | Yes | Yes (apt) |
| debian:bookworm-slim | ~75MB | 100-150 | Yes | Yes (apt) |
| alpine:3.19 | ~7MB | 5-20 | Yes (ash) | Yes (apk) |
| gcr.io/distroless/base-debian12 | ~20MB | 0-5 | No | No |
| gcr.io/distroless/static-debian12 | ~2MB | 0 | No | No |
| scratch | 0MB | 0 | No | No |

When to use each:

  • scratch: statically compiled binaries with zero external dependencies (Go CGO_ENABLED=0, Rust with musl target). Nothing else will work — no DNS resolver, no TLS certs. Must bundle /etc/ssl/certs/ca-certificates.crt and /etc/passwd if your app needs them.
  • distroless/static: statically compiled binaries that need TLS certs and basic system files. Google includes these. Best for Go, Rust.
  • distroless/base: dynamically linked binaries that need glibc. Includes OpenSSL, glibc, libssl. Best for applications with C extensions.
  • distroless/python3, distroless/nodejs: pre-built distroless variants for interpreted runtimes. Google maintains these.
  • alpine: when you need a shell for debugging or entrypoint scripts. Vastly smaller than debian variants. Acceptable for development, avoid for production where possible.

The no-shell constraint eliminates RCE pivot

If an attacker exploits a vulnerability in your application and achieves command execution, their next move is almost always to run additional commands: download a reverse shell, enumerate the filesystem, escalate privileges. No shell means no shell commands. They are limited to what your application binary can do. Combined with a read-only root filesystem (covered in section 5), the attacker's ability to establish persistence collapses.

# Scratch example: Go binary with bundled TLS certs
FROM golang:1.22-alpine AS builder
RUN apk add --no-cache ca-certificates
WORKDIR /src
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 go build -ldflags="-w -s" -o /out/server ./cmd/server

FROM scratch
# Copy TLS certs from builder
COPY --from=builder /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/
# Copy passwd for non-root UID (see section 3)
COPY --from=builder /etc/passwd /etc/passwd
COPY --from=builder /out/server /server
USER nobody
ENTRYPOINT ["/server"]

Alpine trade-offs: Alpine uses musl libc instead of glibc. This can cause subtle behavioral differences in applications compiled against glibc (memory allocation patterns, DNS resolution, locale handling). Test Alpine compatibility explicitly. Alpine is excellent for intermediate build stages. Distroless is preferable for final runtime stages when you want glibc compatibility with zero CVE count.

flowchart LR
    subgraph ubuntu["ubuntu:22.04"]
        direction TB
        U1[Shell + Coreutils]
        U2[Package Manager]
        U3[System Libraries ~200]
        U4[Your App]
    end
    subgraph distroless["distroless/static"]
        direction TB
        D1[TLS Certs]
        D2[Timezone Data]
        D3[Your App]
    end

    ubuntu -- "CVEs: 150-200" --> Vuln[/Attack Surface\]
    distroless -- "CVEs: 0" --> Safe[/Minimal Surface\]

    style ubuntu fill:#ff6b6b,color:#fff
    style distroless fill:#51cf66,color:#fff
    style Vuln fill:#ff6b6b,color:#fff
    style Safe fill:#51cf66,color:#fff

3. Non-Root User Enforcement: Never Run as UID 0

Running a container process as root (UID 0) is the single most common container misconfiguration. When a container runs as root and achieves a container escape via a kernel vulnerability, the attacker arrives on the host as root. Even within the container, a root process can read any file, write to any path, and load kernel modules if capabilities are not explicitly dropped.

Dockerfile: create and use a non-root user

# Distroless has no useradd; create the user in a throwaway setup stage
FROM debian:bookworm-slim AS setup
RUN groupadd --gid 10001 appgroup && \
    useradd \
      --uid 10001 \
      --gid appgroup \
      --no-create-home \
      --shell /sbin/nologin \
      appuser

FROM gcr.io/distroless/base-debian12
# Carry over the passwd/group entries from setup stage
COPY --from=setup /etc/passwd /etc/passwd
COPY --from=setup /etc/group /etc/group

# "builder" is a compile stage like the ones in section 1
COPY --chown=10001:10001 --from=builder /out/server /server

USER 10001

ENTRYPOINT ["/server"]

Using a numeric UID (USER 10001) rather than a username (USER appuser) is more robust: Kubernetes admission controllers and OPA policies can check numeric UIDs reliably. Username resolution depends on /etc/passwd being present in the image.
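That convention is easy to gate in CI by inspecting the image configuration. A minimal sketch, assuming the JSON shape emitted by docker inspect (an array whose first element carries Config.User); the gate itself is a hypothetical helper, not a standard tool:

```python
import json

def runs_as_numeric_nonroot(inspect_json: str) -> bool:
    """True only if the image's User field is a numeric, non-zero UID."""
    config = json.loads(inspect_json)[0]["Config"]
    user = (config.get("User") or "").split(":")[0]  # drop optional ":gid" suffix
    return user.isdigit() and int(user) != 0

# Usage: pipe `docker inspect IMAGE` into this function and fail the
# pipeline when it returns False.
```

The same check is what an OPA policy expresses declaratively; running it in CI catches the problem before an admission controller ever sees the pod.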

Kubernetes securityContext: enforce at the pod level

Even if an image is built to run as non-root, nothing prevents someone from overriding it with --user root in a docker run command or a Kubernetes pod spec. Kubernetes securityContext enforces the constraint at the scheduler level:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-server
spec:
  template:
    spec:
      # Pod-level: applies to all containers
      securityContext:
        runAsNonRoot: true
        runAsUser: 10001
        runAsGroup: 10001
        fsGroup: 10001
        seccompProfile:
          type: RuntimeDefault   # see section 5
      containers:
        - name: api
          image: registry.example.com/api-server:v1.2.3@sha256:abc123...
          securityContext:
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: true
            capabilities:
              drop:
                - ALL
              add:
                - NET_BIND_SERVICE  # only if port < 1024
          volumeMounts:
            - name: tmp
              mountPath: /tmp
            - name: cache
              mountPath: /var/cache/app
      volumes:
        - name: tmp
          emptyDir: {}
        - name: cache
          emptyDir: {}

runAsNonRoot: true causes the kubelet to refuse to start any container whose image is configured to run as root — even if the Dockerfile doesn't specify a USER directive (no USER means root). allowPrivilegeEscalation: false prevents the process from gaining new privileges via setuid binaries or file capabilities.

The secrets-as-root problem: when secrets are mounted and the container runs as root, the mounted secret files are readable by anyone who can exec into the container. With a non-root user and fsGroup set, Kubernetes mounts secret volumes with the correct group ownership so only the application user can read them. This is the difference between "attacker reads your database credentials" and "attacker gets a permission denied error."


4. Image Scanning in CI: Catch CVEs Before They Ship

Scanning at build time is table stakes. Effective scanning also runs on a schedule against deployed images (new CVEs are published daily; an image clean today may have critical vulnerabilities tomorrow) and generates SBOMs for downstream audit.

Trivy in GitHub Actions: fail on critical CVEs

# .github/workflows/container-security.yml
name: Container Security Scan

on:
  push:
    branches: [main]
  pull_request:
  schedule:
    # Daily scan of deployed images
    - cron: '0 6 * * *'

jobs:
  build-and-scan:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      security-events: write   # for GitHub Security tab upload
      packages: write          # for GHCR push
      id-token: write          # for Sigstore keyless attestation
      attestations: write      # for actions/attest-sbom

    steps:
      - uses: actions/checkout@v4

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3

      - name: Build image (do not push yet)
        uses: docker/build-push-action@v5
        with:
          context: .
          push: false
          tags: ${{ github.repository }}:${{ github.sha }}
          cache-from: type=gha
          cache-to: type=gha,mode=max
          outputs: type=docker,dest=/tmp/image.tar

      - name: Run Trivy vulnerability scan
        uses: aquasecurity/trivy-action@master  # pin to a release tag in production
        with:
          input: /tmp/image.tar
          format: 'sarif'
          output: 'trivy-results.sarif'
          severity: 'CRITICAL,HIGH'
          exit-code: '1'           # fail the build
          ignore-unfixed: true     # skip CVEs with no fix available
          vuln-type: 'os,library'

      - name: Upload scan results to GitHub Security tab
        uses: github/codeql-action/upload-sarif@v3
        if: always()
        with:
          sarif_file: 'trivy-results.sarif'

      - name: Generate SBOM with Syft
        uses: anchore/sbom-action@v0
        with:
          image: ${{ github.repository }}:${{ github.sha }}
          format: spdx-json
          output-file: sbom.spdx.json

      - name: Push to registry (only if scan passes)
        id: push
        uses: docker/build-push-action@v5
        with:
          context: .
          push: true
          tags: ghcr.io/${{ github.repository }}:${{ github.sha }}
          cache-from: type=gha

      - name: Attest SBOM (Sigstore)
        uses: actions/attest-sbom@v1
        with:
          subject-name: ghcr.io/${{ github.repository }}
          subject-digest: ${{ steps.push.outputs.digest }}
          sbom-path: sbom.spdx.json

The workflow deliberately builds twice: once to a local tar for scanning, then a push only if the scan passes. A vulnerable image never reaches the registry, and the second build is nearly free because it hits the shared layer cache.

Grype as an alternative: Anchore Grype is lighter weight and integrates tightly with Syft for SBOM-driven scanning:

# Install
curl -sSfL https://raw.githubusercontent.com/anchore/grype/main/install.sh | sh -s -- -b /usr/local/bin

# Scan an image
grype docker:myapp:latest --fail-on critical

# Scan an SBOM (faster for scheduled rescans — no need to pull image)
grype sbom:sbom.spdx.json --fail-on high

# Output JSON for pipeline integration
grype docker:myapp:latest -o json > scan-results.json
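That JSON output is easy to post-process when you need gating logic the CLI flags don't express (say, different thresholds per team). A sketch, assuming grype's JSON layout of a top-level matches array whose entries carry vulnerability.id and vulnerability.severity:

```python
import json

SEVERITY_ORDER = ["negligible", "low", "medium", "high", "critical"]

def failing_cves(report_json: str, fail_on: str = "critical") -> list[str]:
    """Return the IDs of matches at or above the threshold severity."""
    threshold = SEVERITY_ORDER.index(fail_on.lower())
    failing = []
    for match in json.loads(report_json).get("matches", []):
        severity = match["vulnerability"]["severity"].lower()
        if severity in SEVERITY_ORDER and SEVERITY_ORDER.index(severity) >= threshold:
            failing.append(match["vulnerability"]["id"])
    return failing

# if failing_cves(open("scan-results.json").read(), "high"):
#     raise SystemExit("blocking: unresolved HIGH/CRITICAL CVEs")
```

Severities the tool reports as Unknown are deliberately skipped here; tighten that if your policy treats unknowns as failures.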

Scheduled base image rescanning: your CI only scans when code changes. New CVEs are published against base images you've already deployed. Add a scheduled job that pulls deployed image digests from your registry and rescans against the current vulnerability database:

#!/bin/bash
# scan-deployed.sh — run daily via cron or CI schedule

REGISTRY="ghcr.io/myorg"
IMAGES=("api-server" "worker" "scheduler")

for IMAGE in "${IMAGES[@]}"; do
  DIGEST=$(crane digest "${REGISTRY}/${IMAGE}:latest")
  echo "Scanning ${IMAGE}@${DIGEST}"
  trivy image \
    --severity CRITICAL \
    --exit-code 1 \
    --ignore-unfixed \
    "${REGISTRY}/${IMAGE}@${DIGEST}" || \
    notify-slack "CRITICAL CVE in deployed image: ${IMAGE}"
done

Image signing with cosign

# Install cosign
brew install cosign  # or: go install github.com/sigstore/cosign/v2/cmd/cosign@latest

# Generate a key pair (or use keyless via OIDC in CI)
cosign generate-key-pair

# Sign an image after push
cosign sign --key cosign.key \
  ghcr.io/myorg/api-server:latest

# Verify before deployment
cosign verify --key cosign.pub \
  ghcr.io/myorg/api-server:latest

# Keyless signing in GitHub Actions (uses Fulcio CA + OIDC)
cosign sign \
  --yes \
  --rekor-url https://rekor.sigstore.dev \
  ghcr.io/myorg/api-server@sha256:abc123...

Keyless signing in CI uses the GitHub OIDC token to prove the image was built by a specific GitHub Actions workflow. The signature is recorded in the Rekor transparency log — publicly auditable, tamper-evident. No key management required.


5. Runtime Security: Contain the Blast Radius

Image security determines what enters the runtime. Runtime security determines what the running process can do. The two layers are independent — a perfectly hardened image can still be exploited at runtime if it runs with excessive capabilities.

Seccomp profiles: restrict syscalls

The Linux kernel exposes ~350 syscalls. A typical web server needs perhaps 40. Every additional syscall is a potential exploitation vector (kernel vulnerabilities are often syscall-triggered). Seccomp (Secure Computing Mode) lets you define an allowlist:

{
  "defaultAction": "SCMP_ACT_ERRNO",
  "architectures": ["SCMP_ARCH_X86_64"],
  "syscalls": [
    {
      "names": [
        "read", "write", "open", "close", "stat", "fstat",
        "mmap", "mprotect", "munmap", "brk", "access",
        "execve", "exit", "wait4", "getpid", "gettid",
        "socket", "connect", "accept", "sendto", "recvfrom",
        "bind", "listen", "getsockname", "setsockopt", "getsockopt",
        "clone", "fork", "futex", "nanosleep", "clock_gettime",
        "epoll_create1", "epoll_ctl", "epoll_wait",
        "signalfd4", "timerfd_create", "eventfd2"
      ],
      "action": "SCMP_ACT_ALLOW"
    }
  ]
}

Apply in Kubernetes:

securityContext:
  seccompProfile:
    type: Localhost
    localhostProfile: profiles/api-server.json  # path under /var/lib/kubelet/seccomp/

RuntimeDefault applies the container runtime's default seccomp profile (containerd's or CRI-O's). It blocks dangerous syscalls such as kexec_load, open_by_handle_at, and mount without requiring a custom profile. Use it as a minimum baseline; add a custom profile for defense-in-depth.
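Custom profiles are tedious to hand-edit and easy to get subtly wrong. One option is generating them from an allowlist; the sketch below emits the JSON structure shown above (the syscall set you pass in must come from profiling your own workload, e.g. with strace — the example names are illustrative):

```python
import json

def seccomp_profile(allowed_syscalls, arches=("SCMP_ARCH_X86_64",)):
    """Build an allowlist profile: any syscall not listed fails with EPERM."""
    return {
        "defaultAction": "SCMP_ACT_ERRNO",
        "architectures": list(arches),
        "syscalls": [
            {"names": sorted(set(allowed_syscalls)), "action": "SCMP_ACT_ALLOW"}
        ],
    }

# Write where the kubelet expects Localhost profiles (path illustrative):
# with open("/var/lib/kubelet/seccomp/profiles/api-server.json", "w") as f:
#     json.dump(seccomp_profile(["read", "write", "openat", "close"]), f, indent=2)
```

Generating the file keeps the allowlist in version control as a plain list, which reviews far more easily than raw profile JSON.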

Linux capabilities: drop ALL, add back only what's needed

Linux capabilities divide root's omnipotence into ~40 distinct privileges. The security principle: drop all capabilities, add back only the specific ones your application requires.

# In Kubernetes securityContext
capabilities:
  drop:
    - ALL
  add:
    - NET_BIND_SERVICE   # bind to port < 1024 (if needed)
    # Common additions:
    # - CHOWN            # change file ownership (avoid if possible)
    # - SETUID/SETGID    # only if app needs to drop privileges post-start

Most web servers and APIs need zero capabilities if they run on port 8080 or higher and the filesystem is owned correctly. NET_BIND_SERVICE is only needed for port 80/443 — use a reverse proxy (nginx, Envoy) to terminate on 80/443 and forward to 8080 internally.

Read-only root filesystem

securityContext:
  readOnlyRootFilesystem: true

# Mount writable volumes only for paths that need them
volumeMounts:
  - name: tmp
    mountPath: /tmp
  - name: app-cache
    mountPath: /var/cache/myapp
  - name: logs
    mountPath: /var/log/myapp

volumes:
  - name: tmp
    emptyDir: {}
  - name: app-cache
    emptyDir:
      sizeLimit: 500Mi
  - name: logs
    emptyDir: {}

With a read-only filesystem, an attacker who achieves code execution cannot write malware, modify application binaries, or create persistent backdoors. The filesystem state is immutable — identical to the image layer on every restart. emptyDir volumes provide writable scratch space without compromising this.

flowchart TD
    A[Container Starts] --> B{seccomp profile\nloaded?}
    B -- Yes --> C[Syscall filter active\n~40 allowed of ~350]
    B -- No --> X1[ALL syscalls allowed\nKernel exploit surface exposed]

    C --> D{Drop ALL\ncapabilities?}
    D -- Yes --> E[No root powers\nNET_BIND_SERVICE only]
    D -- No --> X2[Root capabilities active\nCHOWN, KILL, SYS_ADMIN etc.]

    E --> F{Read-only\nrootfs?}
    F -- Yes --> G[Immutable filesystem\nNo persistence possible]
    F -- No --> X3[Writable rootfs\nMalware can persist]

    G --> H[Hardened Runtime\nBlast radius contained]

    style X1 fill:#ff6b6b,color:#fff
    style X2 fill:#ff6b6b,color:#fff
    style X3 fill:#ff6b6b,color:#fff
    style H fill:#51cf66,color:#fff

6. Secrets Management: Keep Credentials Out of Image Layers

The three most common ways secrets end up in container images — all of them wrong:

Wrong #1: Environment variables in Dockerfile

# NEVER DO THIS
ENV DATABASE_URL="postgresql://user:password@prod-db:5432/app"
ENV API_KEY="sk-live-abc123..."

Environment variables are baked into the image configuration. They appear in docker inspect <container>, docker history <image>, and in Kubernetes pod specs visible to anyone with kubectl get pod -o yaml. Even after the container is stopped, the credentials persist in the image indefinitely.

Wrong #2: COPY or ADD credentials into the image

# ALSO NEVER DO THIS — even with a subsequent RUN rm
COPY .env /app/.env
RUN pip install -r requirements.txt
RUN rm /app/.env   # THIS DOES NOT HELP

Docker layers are content-addressed and immutable. RUN rm /app/.env creates a new layer that hides the file but does not delete it from the underlying layer. docker history --no-trunc and layer extraction tools will retrieve the credentials from the earlier layer. This has been exploited against real registries.
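A cheap pre-push check is to grep the layer history for secret-shaped commands. A rough sketch — the patterns are illustrative, catch only the obvious cases, and are no substitute for a dedicated secret scanner such as trufflehog or gitleaks:

```python
import re

# Secret-shaped layer commands (illustrative, not exhaustive)
SUSPICIOUS = [
    re.compile(r"ENV\s+\w*(PASSWORD|SECRET|TOKEN|API_KEY)\w*=", re.I),
    re.compile(r"COPY\s+\S*\.env\b", re.I),
    re.compile(r"(ADD|COPY)\s+\S*(id_rsa|\.pem|credentials)", re.I),
]

def flag_layers(history_lines):
    """Return the layer-creating commands that match any pattern above."""
    return [line for line in history_lines
            if any(p.search(line) for p in SUSPICIOUS)]

# Feed it: docker history --no-trunc --format '{{.CreatedBy}}' IMAGE
```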

Right approach #1: Docker secrets (Swarm / BuildKit)

# syntax=docker/dockerfile:1.7

FROM python:3.12-slim AS builder

WORKDIR /app
COPY requirements.txt .

# Mount a secret during build — never written to any layer
RUN --mount=type=secret,id=pip_config \
    pip install \
    --index-url "$(cat /run/secrets/pip_config)" \
    --no-cache-dir \
    -r requirements.txt

# Pass the secret at build time via BuildKit (shell, not Dockerfile):
docker buildx build \
  --secret id=pip_config,src=./private-pip.conf \
  .

The secret is available only during the RUN step as a tmpfs mount. It never appears in any image layer. docker history shows no trace of it.

Right approach #2: Kubernetes Secrets mounted as files

# Create the secret
kubectl create secret generic db-credentials \
  --from-literal=url='postgresql://user:pass@db:5432/app' \
  --from-literal=password='s3cr3t'

# Mount in pod spec
spec:
  containers:
    - name: api
      volumeMounts:
        - name: db-creds
          mountPath: /run/secrets/db
          readOnly: true
  volumes:
    - name: db-creds
      secret:
        secretName: db-credentials
        defaultMode: 0400   # owner read-only

Kubernetes mounts secrets as tmpfs — memory-only, not written to node disk. With defaultMode: 0400 and runAsUser: 10001 plus fsGroup: 10001, only the application user can read the files.
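It is worth asserting those permissions at application startup rather than trusting every pod spec to be correct. A sketch of a fail-fast check; the mount path in the comment is illustrative:

```python
import os
import stat
import sys

def assert_private(path: str) -> None:
    """Refuse to start if a mounted secret file is group- or world-accessible."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    if mode & (stat.S_IRWXG | stat.S_IRWXO):
        sys.exit(f"refusing to start: {path} has loose permissions {oct(mode)}")

# assert_private("/run/secrets/db/password")  # illustrative mount path
```

Failing at startup turns a silent misconfiguration into a crash-looping pod that alerting catches immediately.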

Right approach #3: HashiCorp Vault agent injection

For secrets rotation and audit logging, Vault agent injection is the production standard:

# Vault agent injects secrets as init container, writes to shared tmpfs
annotations:
  vault.hashicorp.com/agent-inject: "true"
  vault.hashicorp.com/agent-inject-secret-db-creds: "secret/data/myapp/db"
  vault.hashicorp.com/agent-inject-template-db-creds: |
    {{- with secret "secret/data/myapp/db" -}}
    DATABASE_URL=postgresql://{{ .Data.data.username }}:{{ .Data.data.password }}@db:5432/app
    {{- end }}
  vault.hashicorp.com/role: "myapp"

Vault injects an init container that authenticates via Kubernetes service account, fetches the secret, and writes it to a shared in-memory volume at /vault/secrets/. Your application reads it as a file. Vault agent sidecar handles rotation — when the secret expires, the file is rewritten without restarting your pod.

Right approach #4: AWS Secrets Manager via CSI driver

apiVersion: secrets-store.csi.x-k8s.io/v1
kind: SecretProviderClass
metadata:
  name: aws-secrets
spec:
  provider: aws
  parameters:
    objects: |
      - objectName: "prod/myapp/db-credentials"
        objectType: "secretsmanager"
        jmesPath:
          - path: "password"
            objectAlias: "db-password"
          - path: "username"
            objectAlias: "db-username"

The CSI driver mounts AWS Secrets Manager values directly as files without ever storing them in a Kubernetes Secret object. This avoids etcd storage entirely.


7. Supply Chain Security: Sign, Verify, and Audit Everything

Supply chain attacks target the gap between "the code you wrote" and "the binary running in production." This gap includes every dependency, every base image, every build tool, and every CI step. Closing that gap requires cryptographic attestation at each stage.

Syft: generate SBOMs

A Software Bill of Materials (SBOM) is a machine-readable inventory of every package and library in your image. It enables downstream CVE scanning, license compliance checks, and incident response (when a new vulnerability is published, you can immediately query which of your images contain the affected package).
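The incident-response query is then a few lines of JSON filtering. A sketch, assuming the SPDX JSON layout syft produces (a top-level packages array with name and versionInfo fields):

```python
import json

def find_package(sbom_json: str, package_name: str) -> list[str]:
    """Return name@version for every SBOM package matching the given name."""
    sbom = json.loads(sbom_json)
    return [
        f'{p["name"]}@{p.get("versionInfo", "unknown")}'
        for p in sbom.get("packages", [])
        if p.get("name", "").lower() == package_name.lower()
    ]

# Run this across the attested SBOM of every deployed image to answer
# "which services ship the affected library?" within minutes of a CVE drop.
```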

# Install syft
curl -sSfL https://raw.githubusercontent.com/anchore/syft/main/install.sh | sh -s -- -b /usr/local/bin

# Generate SBOM in SPDX format
syft ghcr.io/myorg/api-server:latest -o spdx-json > sbom.spdx.json

# Generate in CycloneDX format (better tool support)
syft ghcr.io/myorg/api-server:latest -o cyclonedx-json > sbom.cdx.json

# Scan the SBOM for vulnerabilities (faster than scanning the image)
grype sbom:sbom.spdx.json --fail-on critical

# Attest the SBOM to the image (stored in registry alongside image)
cosign attest \
  --predicate sbom.spdx.json \
  --type spdxjson \
  ghcr.io/myorg/api-server@sha256:abc123...

Cosign and Sigstore: cryptographic image signing

# Full signing workflow in CI (keyless, using OIDC)
# 1. Build and push the image
docker buildx build --push \
  -t ghcr.io/myorg/api-server:v1.2.3 .

# 2. Get the digest of what was pushed
DIGEST=$(crane digest ghcr.io/myorg/api-server:v1.2.3)

# 3. Sign (records to Rekor transparency log)
cosign sign --yes \
  ghcr.io/myorg/api-server@${DIGEST}

# 4. At deploy time, verify before running
cosign verify \
  --certificate-identity-regexp "^https://github.com/myorg/myrepo/.github/workflows/.*" \
  --certificate-oidc-issuer "https://token.actions.githubusercontent.com" \
  ghcr.io/myorg/api-server@${DIGEST}

The Rekor transparency log (rekor.sigstore.dev) is a public, append-only, cryptographically verifiable ledger. Every signature is recorded with the signing identity, timestamp, and image digest. You can audit exactly which CI run signed which image.

OPA/Gatekeeper: enforce trusted base image policy

# OPA ConstraintTemplate: require signed images from approved registries
apiVersion: templates.gatekeeper.sh/v1
kind: ConstraintTemplate
metadata:
  name: requiresignedimages
spec:
  crd:
    spec:
      names:
        kind: RequireSignedImages
  targets:
    - target: admission.k8s.gatekeeper.sh
      rego: |
        package requiresignedimages

        violation[{"msg": msg}] {
          container := input.review.object.spec.containers[_]
          not startswith(container.image, "ghcr.io/myorg/")
          msg := sprintf("Image %v is not from the approved registry", [container.image])
        }

        violation[{"msg": msg}] {
          container := input.review.object.spec.containers[_]
          not regex.match(".*@sha256:[a-f0-9]{64}$", container.image)
          msg := sprintf("Image %v must be pinned to a digest, not a tag", [container.image])
        }

The digest-pinning rule is critical: image tags are mutable. myapp:latest can be silently overwritten by an attacker who gains registry access. Referencing by digest (@sha256:abc...) is immutable — the content is cryptographically bound to the identifier.
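The same digest rule is worth enforcing in CI, before manifests ever reach the cluster. A sketch mirroring the rego pattern above (the manifest-walking helper in the comment is hypothetical):

```python
import re

# An immutable reference ends in a full sha256 digest
DIGEST_RE = re.compile(r"@sha256:[a-f0-9]{64}$")

def is_digest_pinned(image_ref: str) -> bool:
    """True only if the image reference is pinned to a sha256 digest."""
    return bool(DIGEST_RE.search(image_ref))

# for ref in refs_from_manifests("k8s/"):   # hypothetical helper
#     assert is_digest_pinned(ref), f"{ref} must be pinned to a digest"
```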


8. Production Container Hardening Checklist

A reference checklist for production deployments. Every item should be verifiable in CI or via admission controller policy.

| Category | Check | Tool/Method |
|---|---|---|
| Build | Multi-stage build: runtime image has no compiler/build tools | docker history |
| Build | No secrets in ENV or COPY — use BuildKit --mount=type=secret | docker history --no-trunc |
| Build | Base image pinned to digest, not tag | Dockerfile inspection |
| Build | .dockerignore excludes .git, .env, credentials, test data | .dockerignore review |
| Image | Distroless or Alpine base (not full ubuntu/debian) | Image scan |
| Image | Final image < 50MB (ideally < 15MB for Go/Rust) | docker images |
| Image | Trivy/Grype scan passes with no unpatched CRITICAL CVEs | CI gate |
| Image | SBOM generated and attested | cosign attest |
| Image | Image signed with cosign | cosign verify |
| Image | Tagged with the git SHA and deployed by digest, not latest | Registry policy |
| Runtime | USER directive sets non-root UID in Dockerfile | Dockerfile inspection |
| Runtime | runAsNonRoot: true in Kubernetes securityContext | OPA policy |
| Runtime | readOnlyRootFilesystem: true | OPA policy |
| Runtime | allowPrivilegeEscalation: false | OPA policy |
| Runtime | capabilities: drop: [ALL] | OPA policy |
| Runtime | seccompProfile: RuntimeDefault or custom profile | OPA policy |
| Runtime | Resource limits set (CPU + memory) | OPA policy |
| Runtime | No hostPID, hostNetwork, hostIPC | OPA policy |
| Secrets | No secrets in environment variables | kubectl get pod -o yaml audit |
| Secrets | Secrets mounted as files via Kubernetes Secret or Vault | Pod spec review |
| Secrets | Secret volumes mounted with readOnly: true | Pod spec review |
| Secrets | Vault/CSI driver used for rotation-capable secrets | Vault audit log |
| Network | NetworkPolicy restricts ingress/egress to required paths | kubectl get networkpolicy |
| Scanning | Base images rescanned daily (not just at build time) | Scheduled CI job |
| Audit | Falco or similar runtime threat detection enabled | Falco rules active |

Conclusion

Container security is most effective when it is automated and enforced by policy — not when it depends on individual developers remembering to do the right thing. The patterns in this post compose into a layered defense: multi-stage builds eliminate build-time bloat, distroless images reduce the CVE surface to near zero, non-root enforcement removes the most common privilege escalation path, image scanning catches known vulnerabilities before they reach production, runtime security profiles contain the blast radius if something does get exploited, secrets management ensures credentials never appear in image layers or environment variables, and supply chain tooling provides cryptographic proof of what you're actually running.

None of these layers is sufficient alone. A perfectly hardened image is worthless if the running container has allowPrivilegeEscalation: true. A well-enforced runtime policy is undermined if secrets are stored in environment variables visible to docker inspect. The checklist in section 8 is a dependency graph as much as a checklist — each item strengthens the others.

Start with the high-impact items: multi-stage builds and distroless base images reduce your attack surface by the largest margin for the least engineering effort. Add non-root enforcement and readOnlyRootFilesystem next — both are single-line changes in a Dockerfile and Kubernetes spec. Then layer in CI scanning, secrets management, and supply chain attestation as your team's capacity allows. The goal is a container that is immutable, minimal, scannable, signed, and running as close to zero privilege as its workload requires.

