Container Security in 2026: Multi-Stage Builds, Distroless Images, and Supply Chain Security

Introduction
Container security is not a checkbox. It is a layered discipline that spans your build pipeline, base image choices, runtime configuration, secrets handling, and software supply chain. Most teams get some of this right some of the time — but the gaps between layers are where breaches happen.
The threat landscape in 2026 looks different than it did in 2020. Supply chain attacks are now the dominant vector for container compromises. The SolarWinds pattern — compromise a build tool or base image rather than the target directly — has been replicated across dozens of incidents. Dependency confusion attacks, malicious packages injected into public registries, and tampered base images are all confirmed, documented attack paths. Meanwhile, misconfigured runtime permissions remain the most common root cause of container escapes in post-incident reports.
This post covers the full container security stack for production engineering teams: multi-stage builds that eliminate build-time bloat and attack surface, distroless and minimal base images that have near-zero CVE counts, non-root user enforcement at both the Docker and Kubernetes layers, CI-integrated image scanning with SBOM generation, runtime security profiles that limit syscall exposure, secrets management patterns that keep credentials out of image layers and environment variables, and supply chain security tooling (cosign, syft, Sigstore) that lets you cryptographically verify what you're running. Each section includes concrete, runnable code you can adapt directly.
The goal is a hardened container that is small, scannable, signed, secrets-free, and running with the minimum privilege it needs to do its job.
1. Multi-Stage Builds: Ship the Artifact, Not the Toolchain
The most impactful single change most teams can make to container security is adopting multi-stage builds. The principle is simple: you need a fully equipped build environment to compile and package your software, but you do not need that environment at runtime. Every tool you ship — compilers, build systems, package managers, debugging utilities — is attack surface that can be exploited after a container is compromised.
Multi-stage builds let you define a builder stage with everything needed to compile, then copy only the final artifact into a minimal runtime image.
Go application: 250MB → 8MB
# syntax=docker/dockerfile:1.7
# Stage 1: Builder
FROM golang:1.22-alpine AS builder
WORKDIR /src
# Copy dependency manifests first (cache layer)
COPY go.mod go.sum ./
RUN go mod download
# Copy source and build a statically linked binary
COPY . .
RUN CGO_ENABLED=0 GOOS=linux GOARCH=amd64 \
go build \
-ldflags="-w -s -extldflags=-static" \
-trimpath \
-o /out/server \
./cmd/server
# Stage 2: Distroless runtime (no shell, no package manager)
FROM gcr.io/distroless/static-debian12:nonroot
# Copy only the compiled binary
COPY --from=builder /out/server /server
EXPOSE 8080
ENTRYPOINT ["/server"]
The -ldflags="-w -s" flags strip debug info and symbol tables (smaller binary, nothing for attackers to symbolize). -trimpath removes local build paths from stack traces (privacy). CGO_ENABLED=0 ensures no C runtime dependency — the binary runs on any Linux kernel without libc.
Result: the builder image is ~350MB with the Go toolchain. The final runtime image is ~8MB (distroless/static base is ~2MB, binary adds the rest). The runtime image contains no shell, no package manager, no compiler — only the binary and the minimal system libraries it needs.
Python application with dependency isolation
# syntax=docker/dockerfile:1.7
# Stage 1: Dependency builder
FROM python:3.12-slim AS builder
WORKDIR /app
# Install build tools only in builder
RUN apt-get update && apt-get install -y --no-install-recommends \
build-essential \
libffi-dev \
&& rm -rf /var/lib/apt/lists/*
# Install Python dependencies into a prefix we can copy
COPY requirements.txt .
RUN pip install --no-cache-dir --prefix=/install -r requirements.txt
# Stage 2: Runtime
FROM python:3.12-slim AS runtime
# Create non-root user
RUN useradd --system --no-create-home --shell /sbin/nologin appuser
WORKDIR /app
# Copy installed packages from builder
COPY --from=builder /install /usr/local
# Copy application source (no build tools present)
COPY --chown=appuser:appuser src/ ./src/
USER appuser
EXPOSE 8000
CMD ["python", "-m", "uvicorn", "src.main:app", "--host", "0.0.0.0", "--port", "8000"]
The key discipline here: build-essential and libffi-dev appear only in the builder stage. They are needed to compile native extensions like cryptography or uvloop. The runtime stage gets a fresh python:3.12-slim and only receives the pre-built packages via COPY --from=builder. No compiler, no build headers — an attacker who achieves code execution cannot install new compiled tools.
Layer caching discipline: always copy go.mod/requirements.txt before source code. Docker caches layers by content hash. If you copy source first, a single source line change invalidates the entire dependency installation cache. Structuring for cache locality can cut CI build times by 60-80% on large projects.
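Why this ordering matters can be sketched with a toy model of content-addressed layer caching. This is an illustration in Python, not Docker's actual cache implementation: each layer's cache key is a hash of its parent key, the instruction, and the files it copies.

```python
import hashlib

def layer_key(parent_key: str, instruction: str, file_contents: str = "") -> str:
    """Toy model of Docker's layer cache: key = hash(parent, instruction, inputs)."""
    h = hashlib.sha256()
    h.update(parent_key.encode())
    h.update(instruction.encode())
    h.update(file_contents.encode())
    return h.hexdigest()[:12]

def build(manifest: str, source: str) -> list[str]:
    """Manifests-first ordering: COPY go.mod/go.sum, RUN download, then COPY source."""
    k1 = layer_key("base", "COPY go.mod go.sum ./", manifest)
    k2 = layer_key(k1, "RUN go mod download")  # depends only on k1, not on source
    k3 = layer_key(k2, "COPY . .", source)
    return [k1, k2, k3]

a = build(manifest="module app", source="v1")
b = build(manifest="module app", source="v2")  # source changed, manifests unchanged
# The dependency layers keep identical keys (cache hits); only the final COPY rebuilds.
assert a[0] == b[0] and a[1] == b[1]
assert a[2] != b[2]
```

Because the dependency layers' keys depend only on the manifests, a source-only change leaves the expensive download/install layers cached.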
flowchart TD
A[Source Code] --> B[Builder Stage\nFull toolchain + deps]
B --> C[Compile / Package\ngo build / pip install]
C --> D{Copy artifact only}
D --> E[Runtime Stage\nMinimal base image]
E --> F[Final Image\n~8MB, no compiler]
style B fill:#ff6b6b,color:#fff
style E fill:#51cf66,color:#fff
style F fill:#339af0,color:#fff

2. Distroless and Minimal Base Images: Eliminate the Attack Surface
A container image is a filesystem. Every binary, library, and configuration file in that filesystem is a potential exploit path. The traditional approach — start with ubuntu:22.04 because it is familiar — ships a complete operating system with hundreds of packages, most of which your application never touches. Each of those packages can carry CVEs.
Google's distroless images strip this down to the absolute minimum: only the language runtime and direct system dependencies your application needs. No shell (/bin/sh, /bin/bash), no package manager (apt, apk), no coreutils (ls, chmod, curl). The attack surface collapses.
Image comparison by CVE count (2026 data)
| Base Image | Size | Typical CVE Count | Has Shell | Has Package Manager |
|---|---|---|---|---|
| ubuntu:22.04 | ~70MB | 150-200 CVEs | Yes | Yes (apt) |
| debian:bookworm-slim | ~75MB | 100-150 CVEs | Yes | Yes (apt) |
| alpine:3.19 | ~7MB | 5-20 CVEs | Yes (ash) | Yes (apk) |
| gcr.io/distroless/base-debian12 | ~20MB | 0-5 CVEs | No | No |
| gcr.io/distroless/static-debian12 | ~2MB | 0 CVEs | No | No |
| scratch | 0MB | 0 CVEs | No | No |
When to use each:
- scratch: statically compiled binaries with zero external dependencies (Go CGO_ENABLED=0, Rust with a musl target). Nothing else will work — no DNS resolver, no TLS certs. Must bundle /etc/ssl/certs/ca-certificates.crt and /etc/passwd if your app needs them.
- distroless/static: statically compiled binaries that need TLS certs and basic system files (Google includes these). Best for Go and Rust.
- distroless/base: dynamically linked binaries that need glibc. Includes glibc and OpenSSL. Best for applications with C extensions.
- distroless/python3, distroless/nodejs: Google-maintained distroless variants for interpreted runtimes.
- alpine: when you need a shell for debugging or entrypoint scripts. Vastly smaller than the Debian variants. Acceptable for development; avoid in production where possible.
The no-shell constraint eliminates RCE pivot
If an attacker exploits a vulnerability in your application and achieves command execution, their next move is always to run additional commands: download a reverse shell, enumerate the filesystem, escalate privileges. No shell means no shell commands. They are limited to what your application binary can do. Combined with a read-only root filesystem (covered in section 5), the attacker's ability to establish persistence collapses.
# Scratch example: Go binary with bundled TLS certs
FROM golang:1.22-alpine AS builder
RUN apk add --no-cache ca-certificates
WORKDIR /src
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 go build -ldflags="-w -s" -o /out/server ./cmd/server
FROM scratch
# Copy TLS certs from builder
COPY --from=builder /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/
# Copy passwd for non-root UID (see section 3)
COPY --from=builder /etc/passwd /etc/passwd
COPY --from=builder /out/server /server
USER nobody
ENTRYPOINT ["/server"]
Alpine trade-offs: Alpine uses musl libc instead of glibc. This can cause subtle behavioral differences in applications compiled against glibc (memory allocation patterns, DNS resolution, locale handling). Test Alpine compatibility explicitly. Alpine is excellent for intermediate build stages. Distroless is preferable for final runtime stages when you want glibc compatibility with zero CVE count.
flowchart LR
subgraph ubuntu["ubuntu:22.04"]
direction TB
U1[Shell + Coreutils]
U2[Package Manager]
U3[System Libraries ~200]
U4[Your App]
end
subgraph distroless["distroless/static"]
direction TB
D1[TLS Certs]
D2[Timezone Data]
D3[Your App]
end
ubuntu -- "CVEs: 150-200" --> Vuln[/Attack Surface\]
distroless -- "CVEs: 0" --> Safe[/Minimal Surface\]
style ubuntu fill:#ff6b6b,color:#fff
style distroless fill:#51cf66,color:#fff
style Vuln fill:#ff6b6b,color:#fff
style Safe fill:#51cf66,color:#fff

3. Non-Root User Enforcement: Never Run as UID 0
Running a container process as root (UID 0) is the single most common container misconfiguration. When a container runs as root and achieves a container escape via a kernel vulnerability, the attacker arrives on the host as root. Even within the container, a root process can read any file, write to any path, and load kernel modules if capabilities are not explicitly dropped.
Dockerfile: create and use a non-root user
# Distroless has no useradd, so create the user in a throwaway setup stage
FROM debian:bookworm-slim AS setup
RUN groupadd --gid 10001 appgroup && \
useradd \
--uid 10001 \
--gid appgroup \
--no-create-home \
--shell /sbin/nologin \
appuser
FROM gcr.io/distroless/base-debian12
# Carry over the passwd/group entries from setup stage
COPY --from=setup /etc/passwd /etc/passwd
COPY --from=setup /etc/group /etc/group
# Binary produced by a builder stage (as in section 1)
COPY --chown=10001:10001 --from=builder /out/server /server
USER 10001
ENTRYPOINT ["/server"]
Using a numeric UID (USER 10001) rather than a username (USER appuser) is more robust: Kubernetes admission controllers and OPA policies can check numeric UIDs reliably. Username resolution depends on /etc/passwd being present in the image.
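For illustration, here is a minimal Python sketch of the kind of numeric-UID check an admission policy performs. The helper name is hypothetical; this is not OPA or Kubernetes code, just the logic that makes numeric UIDs reliably checkable:

```python
def violates_run_as_nonroot(pod_spec: dict) -> list[str]:
    """Flag containers that may run as root: runAsUser missing or equal to 0.
    A username like 'appuser' cannot be checked this way, since resolving it
    requires the image's /etc/passwd; a numeric UID can."""
    pod_ctx = pod_spec.get("securityContext", {})
    problems = []
    for c in pod_spec.get("containers", []):
        # Container-level securityContext overrides pod-level settings.
        ctx = {**pod_ctx, **c.get("securityContext", {})}
        uid = ctx.get("runAsUser")
        if uid is None or uid == 0:
            problems.append(c["name"])
    return problems

good = {"securityContext": {"runAsUser": 10001},
        "containers": [{"name": "api"}]}
bad = {"containers": [{"name": "api", "securityContext": {"runAsUser": 0}}]}
assert violates_run_as_nonroot(good) == []
assert violates_run_as_nonroot(bad) == ["api"]
```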
Kubernetes securityContext: enforce at the pod level
Even if an image is built to run as non-root, nothing prevents someone from overriding it with --user root in a docker run command or a Kubernetes pod spec. Kubernetes securityContext enforces the constraint at the scheduler level:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-server
spec:
  template:
    spec:
      # Pod-level: applies to all containers
      securityContext:
        runAsNonRoot: true
        runAsUser: 10001
        runAsGroup: 10001
        fsGroup: 10001
        seccompProfile:
          type: RuntimeDefault # see section 5
      containers:
        - name: api
          image: registry.example.com/api-server:v1.2.3@sha256:abc123...
          securityContext:
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: true
            capabilities:
              drop:
                - ALL
              add:
                - NET_BIND_SERVICE # only if port < 1024
          volumeMounts:
            - name: tmp
              mountPath: /tmp
            - name: cache
              mountPath: /var/cache/app
      volumes:
        - name: tmp
          emptyDir: {}
        - name: cache
          emptyDir: {}
runAsNonRoot: true causes the kubelet to refuse to start any container that would run as root; the check is applied against the image's configured user at container start, even if the Dockerfile doesn't specify a USER directive. allowPrivilegeEscalation: false prevents the process from gaining new privileges via setuid binaries or file capabilities.
The secrets-as-root problem: when secrets are mounted and the container runs as root, the mounted secret files are readable by anyone who can exec into the container. With a non-root user and fsGroup set, Kubernetes mounts secret volumes with the correct group ownership so only the application user can read them. This is the difference between "attacker reads your database credentials" and "attacker gets a permission denied error."
4. Image Scanning in CI: Catch CVEs Before They Ship
Scanning at build time is table stakes. Effective scanning also runs on a schedule against deployed images (new CVEs are published daily; an image clean today may have critical vulnerabilities tomorrow) and generates SBOMs for downstream audit.
Trivy in GitHub Actions: fail on critical CVEs
# .github/workflows/container-security.yml
name: Container Security Scan

on:
  push:
    branches: [main]
  pull_request:
  schedule:
    # Daily scan of deployed images
    - cron: '0 6 * * *'

jobs:
  build-and-scan:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      security-events: write # for GitHub Security tab upload
      packages: write        # for GHCR push
      id-token: write        # for Sigstore keyless attestation
      attestations: write    # for actions/attest-sbom

    steps:
      - uses: actions/checkout@v4

      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v3

      - name: Build image (do not push yet)
        id: build
        uses: docker/build-push-action@v5
        with:
          context: .
          push: false
          load: true # load into the local Docker daemon for scanning
          tags: ${{ github.repository }}:${{ github.sha }}
          cache-from: type=gha
          cache-to: type=gha,mode=max

      - name: Run Trivy vulnerability scan
        uses: aquasecurity/trivy-action@master
        with:
          image-ref: ${{ github.repository }}:${{ github.sha }}
          format: 'sarif'
          output: 'trivy-results.sarif'
          severity: 'CRITICAL,HIGH'
          exit-code: '1'       # fail the build
          ignore-unfixed: true # skip CVEs with no fix available
          vuln-type: 'os,library'

      - name: Upload scan results to GitHub Security tab
        uses: github/codeql-action/upload-sarif@v3
        if: always()
        with:
          sarif_file: 'trivy-results.sarif'

      - name: Generate SBOM with Syft
        uses: anchore/sbom-action@v0
        with:
          image: ${{ github.repository }}:${{ github.sha }}
          format: spdx-json
          output-file: sbom.spdx.json

      - name: Push to registry (only if scan passes)
        id: push
        uses: docker/build-push-action@v5
        with:
          context: .
          push: true
          tags: ghcr.io/${{ github.repository }}:${{ github.sha }}
          cache-from: type=gha

      - name: Attest SBOM (Sigstore)
        uses: actions/attest-sbom@v1
        with:
          subject-name: ghcr.io/${{ github.repository }}
          subject-digest: ${{ steps.push.outputs.digest }}
          sbom-path: sbom.spdx.json
The workflow deliberately builds the image twice: once locally for scanning, then again to push only if the scan passes (the second build is a cache hit, so it is nearly free). This keeps vulnerable images out of the registry entirely, rather than pushing first and gating later.
Grype as an alternative: Anchore Grype is lighter weight and integrates tightly with Syft for SBOM-driven scanning:
# Install
curl -sSfL https://raw.githubusercontent.com/anchore/grype/main/install.sh | sh -s -- -b /usr/local/bin
# Scan an image
grype docker:myapp:latest --fail-on critical
# Scan an SBOM (faster for scheduled rescans — no need to pull image)
grype sbom:sbom.spdx.json --fail-on high
# Output JSON for pipeline integration
grype docker:myapp:latest -o json > scan-results.json
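A pipeline consumer for that JSON output might look like the following Python sketch. The matches/vulnerability/severity field layout reflects grype's JSON report format, but verify it against the grype version you run; the helper names are hypothetical:

```python
def count_by_severity(report: dict) -> dict:
    """Tally grype matches by severity level."""
    counts: dict = {}
    for m in report.get("matches", []):
        sev = m.get("vulnerability", {}).get("severity", "Unknown")
        counts[sev] = counts.get(sev, 0) + 1
    return counts

def gate(report: dict, fail_on: str = "Critical") -> int:
    """Return a nonzero exit code if any match is at the failing severity,
    mirroring grype's --fail-on behavior for custom pipelines."""
    return 1 if count_by_severity(report).get(fail_on, 0) > 0 else 0

# Synthetic report for illustration.
sample = {"matches": [
    {"vulnerability": {"id": "CVE-2026-0001", "severity": "Critical"}},
    {"vulnerability": {"id": "CVE-2026-0002", "severity": "High"}},
]}
assert count_by_severity(sample) == {"Critical": 1, "High": 1}
assert gate(sample) == 1
```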
Scheduled base image rescanning: your CI only scans when code changes. New CVEs are published against base images you've already deployed. Add a scheduled job that pulls deployed image digests from your registry and rescans against the current vulnerability database:
#!/bin/bash
# scan-deployed.sh — run daily via cron or CI schedule
REGISTRY="ghcr.io/myorg"
IMAGES=("api-server" "worker" "scheduler")
for IMAGE in "${IMAGES[@]}"; do
DIGEST=$(crane digest "${REGISTRY}/${IMAGE}:latest")
echo "Scanning ${IMAGE}@${DIGEST}"
trivy image \
--severity CRITICAL \
--exit-code 1 \
--ignore-unfixed \
"${REGISTRY}/${IMAGE}@${DIGEST}" || \
notify-slack "CRITICAL CVE in deployed image: ${IMAGE}"
done
Image signing with cosign
# Install cosign
brew install cosign # or: go install github.com/sigstore/cosign/v2/cmd/cosign@latest
# Generate a key pair (or use keyless via OIDC in CI)
cosign generate-key-pair
# Sign an image after push
cosign sign --key cosign.key \
ghcr.io/myorg/api-server:latest
# Verify before deployment
cosign verify --key cosign.pub \
ghcr.io/myorg/api-server:latest
# Keyless signing in GitHub Actions (uses Fulcio CA + OIDC)
cosign sign \
--rekor-url https://rekor.sigstore.dev \
ghcr.io/myorg/api-server@sha256:abc123...
Keyless signing in CI uses the GitHub OIDC token to prove the image was built by a specific GitHub Actions workflow. The signature is recorded in the Rekor transparency log — publicly auditable, tamper-evident. No key management required.
5. Runtime Security: Contain the Blast Radius
Image security determines what enters the runtime. Runtime security determines what the running process can do. The two layers are independent — a perfectly hardened image can still be exploited at runtime if it runs with excessive capabilities.
Seccomp profiles: restrict syscalls
The Linux kernel exposes ~350 syscalls. A typical web server needs perhaps 40. Every additional syscall is a potential exploitation vector (kernel vulnerabilities are often syscall-triggered). Seccomp (Secure Computing Mode) lets you define an allowlist:
{
"defaultAction": "SCMP_ACT_ERRNO",
"architectures": ["SCMP_ARCH_X86_64"],
"syscalls": [
{
"names": [
"read", "write", "open", "close", "stat", "fstat",
"mmap", "mprotect", "munmap", "brk", "access",
"execve", "exit", "wait4", "getpid", "gettid",
"socket", "connect", "accept", "sendto", "recvfrom",
"bind", "listen", "getsockname", "setsockopt", "getsockopt",
"clone", "fork", "futex", "nanosleep", "clock_gettime",
"epoll_create1", "epoll_ctl", "epoll_wait",
"signalfd4", "timerfd_create", "eventfd2"
],
"action": "SCMP_ACT_ALLOW"
}
]
}
Apply in Kubernetes:
securityContext:
  seccompProfile:
    type: Localhost
    localhostProfile: profiles/api-server.json # path under /var/lib/kubelet/seccomp/
RuntimeDefault is the Kubernetes-maintained default seccomp profile. It blocks the most dangerous syscalls (ptrace, kexec_load, open_by_handle_at) without requiring a custom profile. Use it as a minimum baseline; add a custom profile for defense-in-depth.
Linux capabilities: drop ALL, add back only what's needed
Linux capabilities divide root's omnipotence into ~40 distinct privileges. The security principle: drop all capabilities, add back only the specific ones your application requires.
# In Kubernetes securityContext
capabilities:
  drop:
    - ALL
  add:
    - NET_BIND_SERVICE # bind to port < 1024 (if needed)
  # Common additions:
  # - CHOWN          # change file ownership (avoid if possible)
  # - SETUID/SETGID  # only if app needs to drop privileges post-start
Most web servers and APIs need zero capabilities if they run on port 8080 or higher and the filesystem is owned correctly. NET_BIND_SERVICE is only needed for port 80/443 — use a reverse proxy (nginx, Envoy) to terminate on 80/443 and forward to 8080 internally.
Read-only root filesystem
securityContext:
  readOnlyRootFilesystem: true
# Mount writable volumes only for paths that need them
volumeMounts:
  - name: tmp
    mountPath: /tmp
  - name: app-cache
    mountPath: /var/cache/myapp
  - name: logs
    mountPath: /var/log/myapp
volumes:
  - name: tmp
    emptyDir: {}
  - name: app-cache
    emptyDir:
      sizeLimit: 500Mi
  - name: logs
    emptyDir: {}
With a read-only filesystem, an attacker who achieves code execution cannot write malware, modify application binaries, or create persistent backdoors. The filesystem state is immutable — identical to the image layer on every restart. emptyDir volumes provide writable scratch space without compromising this.
flowchart TD
A[Container Starts] --> B{seccomp profile\nloaded?}
B -- Yes --> C[Syscall filter active\n~40 allowed of ~350]
B -- No --> X1[ALL syscalls allowed\nKernel exploit surface exposed]
C --> D{Drop ALL\ncapabilities?}
D -- Yes --> E[No root powers\nNET_BIND_SERVICE only]
D -- No --> X2[Root capabilities active\nCHOWN, KILL, SYS_ADMIN etc.]
E --> F{Read-only\nrootfs?}
F -- Yes --> G[Immutable filesystem\nNo persistence possible]
F -- No --> X3[Writable rootfs\nMalware can persist]
G --> H[Hardened Runtime\nBlast radius contained]
style X1 fill:#ff6b6b,color:#fff
style X2 fill:#ff6b6b,color:#fff
style X3 fill:#ff6b6b,color:#fff
style H fill:#51cf66,color:#fff
6. Secrets Management: Keep Credentials Out of Image Layers
The three most common ways secrets end up in container images — all of them wrong:
Wrong #1: Environment variables in Dockerfile
# NEVER DO THIS
ENV DATABASE_URL="postgresql://user:password@prod-db:5432/app"
ENV API_KEY="sk-live-abc123..."
Environment variables are stored in the image manifest. They appear in docker inspect <container>, docker history <image>, and in Kubernetes pod specs visible to anyone with kubectl get pod -o yaml. Even if the container is stopped, the credentials persist in the image layer indefinitely.
Wrong #2: COPY or ADD credentials into the image
# ALSO NEVER DO THIS — even with a subsequent RUN rm
COPY .env /app/.env
RUN pip install -r requirements.txt
RUN rm /app/.env # THIS DOES NOT HELP
Docker layers are content-addressed and immutable. RUN rm /app/.env creates a new layer that hides the file but does not delete it from the underlying layer. docker history --no-trunc and layer extraction tools will retrieve the credentials from the earlier layer. This has been exploited against real registries.
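The layer mechanics can be demonstrated in a few lines of Python with the standard tarfile module. This is a toy model of OCI layers and whiteout files, not a real image parser, but it shows exactly why the "deleted" secret survives:

```python
import io
import tarfile

def make_layer(files: dict) -> bytes:
    """Build an uncompressed tar layer in memory."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()

def extract(layer: bytes, name: str):
    """Pull one file out of a layer tar, or None if absent."""
    with tarfile.open(fileobj=io.BytesIO(layer)) as tar:
        try:
            return tar.extractfile(name).read()
        except KeyError:
            return None

# Layer 1: COPY .env -- the secret is committed to this layer forever.
layer1 = make_layer({"app/.env": b"API_KEY=sk-live-abc123"})
# Layer 2: RUN rm .env -- OCI records a ".wh." whiteout marker that hides
# the file in the merged view without touching layer 1 at all.
layer2 = make_layer({"app/.wh..env": b""})

# An attacker who pulls the image gets every layer; the secret is intact.
assert extract(layer1, "app/.env") == b"API_KEY=sk-live-abc123"
assert extract(layer2, "app/.env") is None  # hidden, not deleted
```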
Right approach #1: Docker secrets (Swarm / BuildKit)
# syntax=docker/dockerfile:1.7
FROM python:3.12-slim AS builder
# Mount a secret during build — never written to any layer
RUN --mount=type=secret,id=pip_config \
pip install \
--index-url "$(cat /run/secrets/pip_config)" \
--no-cache-dir \
-r requirements.txt
# Pass secret at build time via BuildKit
docker buildx build \
--secret id=pip_config,src=./private-pip.conf \
.
The secret is available only during the RUN step as a tmpfs mount. It never appears in any image layer. docker history shows no trace of it.
Right approach #2: Kubernetes Secrets mounted as files
# Create the secret
kubectl create secret generic db-credentials \
  --from-literal=url='postgresql://user:pass@db:5432/app' \
  --from-literal=password='s3cr3t'

# Mount in pod spec
spec:
  containers:
    - name: api
      volumeMounts:
        - name: db-creds
          mountPath: /run/secrets/db
          readOnly: true
  volumes:
    - name: db-creds
      secret:
        secretName: db-credentials
        defaultMode: 0400 # owner read-only
Kubernetes mounts secrets as tmpfs — memory-only, not written to node disk. With defaultMode: 0400 and runAsUser: 10001 plus fsGroup: 10001, only the application user can read the files.
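An application can also verify this at startup and fail fast on a misconfigured mount. A Python sketch of such a self-check (check_secret_mode is a hypothetical helper, and the temporary file only simulates a mounted secret):

```python
import os
import stat
import tempfile

def check_secret_mode(path: str) -> None:
    """Raise if a mounted secret file is readable by group or other.
    Intended for files mounted with defaultMode: 0400."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    if mode & 0o077:  # any group/other permission bit set
        raise PermissionError(f"{path} has mode {oct(mode)}; expected 0400")

# Simulate a mounted secret file for illustration.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"s3cr3t")
    path = f.name

os.chmod(path, 0o400)
check_secret_mode(path)  # owner-only: passes

os.chmod(path, 0o644)
try:
    check_secret_mode(path)
    raise AssertionError("expected PermissionError for world-readable secret")
except PermissionError:
    pass
finally:
    os.remove(path)
```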
Right approach #3: HashiCorp Vault agent injection
For secrets rotation and audit logging, Vault agent injection is the production standard:
# Vault agent injects secrets via an init container, writes to shared tmpfs
annotations:
  vault.hashicorp.com/agent-inject: "true"
  vault.hashicorp.com/agent-inject-secret-db-creds: "secret/data/myapp/db"
  vault.hashicorp.com/agent-inject-template-db-creds: |
    {{- with secret "secret/data/myapp/db" -}}
    DATABASE_URL=postgresql://{{ .Data.data.username }}:{{ .Data.data.password }}@db:5432/app
    {{- end }}
  vault.hashicorp.com/role: "myapp"
Vault injects an init container that authenticates via Kubernetes service account, fetches the secret, and writes it to a shared in-memory volume at /vault/secrets/. Your application reads it as a file. Vault agent sidecar handles rotation — when the secret expires, the file is rewritten without restarting your pod.
Right approach #4: AWS Secrets Manager via CSI driver
apiVersion: secrets-store.csi.x-k8s.io/v1
kind: SecretProviderClass
metadata:
  name: aws-secrets
spec:
  provider: aws
  parameters:
    objects: |
      - objectName: "prod/myapp/db-credentials"
        objectType: "secretsmanager"
        jmesPath:
          - path: "password"
            objectAlias: "db-password"
          - path: "username"
            objectAlias: "db-username"
The CSI driver mounts AWS Secrets Manager values directly as files without ever storing them in a Kubernetes Secret object. This avoids etcd storage entirely.
7. Supply Chain Security: Sign, Verify, and Audit Everything
Supply chain attacks target the gap between "the code you wrote" and "the binary running in production." This gap includes every dependency, every base image, every build tool, and every CI step. Closing that gap requires cryptographic attestation at each stage.
Syft: generate SBOMs
A Software Bill of Materials (SBOM) is a machine-readable inventory of every package and library in your image. It enables downstream CVE scanning, license compliance checks, and incident response (when a new vulnerability is published, you can immediately query which of your images contain the affected package).
# Install syft
curl -sSfL https://raw.githubusercontent.com/anchore/syft/main/install.sh | sh -s -- -b /usr/local/bin
# Generate SBOM in SPDX format
syft ghcr.io/myorg/api-server:latest -o spdx-json > sbom.spdx.json
# Generate in CycloneDX format (better tool support)
syft ghcr.io/myorg/api-server:latest -o cyclonedx-json > sbom.cdx.json
# Scan the SBOM for vulnerabilities (faster than scanning the image)
grype sbom:sbom.spdx.json --fail-on critical
# Attest the SBOM to the image (stored in registry alongside image)
cosign attest \
--predicate sbom.spdx.json \
--type spdxjson \
ghcr.io/myorg/api-server@sha256:abc123...
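Once SBOMs are collected, the incident-response query ("which of our images contain the affected package?") becomes a simple lookup. A Python sketch over SPDX-JSON documents, using the packages[].name field; the data here is synthetic and far smaller than a real SBOM:

```python
def images_with_package(sboms: dict, pkg: str) -> list:
    """Given {image_name: spdx_sbom_dict}, return images containing pkg."""
    return [img for img, sbom in sboms.items()
            if any(p.get("name") == pkg for p in sbom.get("packages", []))]

# Synthetic SBOMs for illustration (real ones come from `syft ... -o spdx-json`).
sboms = {
    "api-server": {"packages": [{"name": "openssl", "versionInfo": "3.0.13"}]},
    "worker":     {"packages": [{"name": "zlib",    "versionInfo": "1.3.1"}]},
}

# A new OpenSSL CVE is published: which deployed images are affected?
assert images_with_package(sboms, "openssl") == ["api-server"]
assert images_with_package(sboms, "left-pad") == []
```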
Cosign and Sigstore: cryptographic image signing
# Full signing workflow in CI (keyless, using OIDC)
# 1. Build and push the image
docker buildx build --push \
-t ghcr.io/myorg/api-server:v1.2.3 .
# 2. Get the digest of what was pushed
DIGEST=$(crane digest ghcr.io/myorg/api-server:v1.2.3)
# 3. Sign (records to Rekor transparency log)
cosign sign \
ghcr.io/myorg/api-server@${DIGEST}
# 4. At deploy time, verify before running
cosign verify \
--certificate-identity-regexp "^https://github.com/myorg/myrepo/.github/workflows/.*" \
--certificate-oidc-issuer "https://token.actions.githubusercontent.com" \
ghcr.io/myorg/api-server@${DIGEST}
The Rekor transparency log (rekor.sigstore.dev) is a public, append-only, cryptographically verifiable ledger. Every signature is recorded with the signing identity, timestamp, and image digest. You can audit exactly which CI run signed which image.
OPA/Gatekeeper: enforce trusted base image policy
# OPA ConstraintTemplate: require signed images from approved registries
apiVersion: templates.gatekeeper.sh/v1
kind: ConstraintTemplate
metadata:
name: requiresignedimages
spec:
crd:
spec:
names:
kind: RequireSignedImages
targets:
- target: admission.k8s.gatekeeper.sh
rego: |
package requiresignedimages
violation[{"msg": msg}] {
container := input.review.object.spec.containers[_]
not startswith(container.image, "ghcr.io/myorg/")
msg := sprintf("Image %v is not from the approved registry", [container.image])
}
violation[{"msg": msg}] {
container := input.review.object.spec.containers[_]
not regex.match(".*@sha256:[a-f0-9]{64}$", container.image)
msg := sprintf("Image %v must be pinned to a digest, not a tag", [container.image])
}
The digest-pinning rule is critical: image tags are mutable. myapp:latest can be silently overwritten by an attacker who gains registry access. Referencing by digest (@sha256:abc...) is immutable — the content is cryptographically bound to the identifier.
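The same digest rule is easy to enforce in any pipeline script, not just at admission time. A Python sketch using the pattern from the Rego policy above:

```python
import re

# Matches refs that end in an immutable content digest.
DIGEST_RE = re.compile(r".*@sha256:[a-f0-9]{64}$")

def pinned_to_digest(image_ref: str) -> bool:
    """True only if the image ref is pinned to a sha256 digest."""
    return bool(DIGEST_RE.match(image_ref))

assert pinned_to_digest("ghcr.io/myorg/api-server@sha256:" + "a" * 64)
assert not pinned_to_digest("ghcr.io/myorg/api-server:latest")
assert not pinned_to_digest("ghcr.io/myorg/api-server:v1.2.3")
```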
8. Production Container Hardening Checklist
A reference checklist for production deployments. Every item should be verifiable in CI or via admission controller policy.
| Category | Check | Tool/Method |
|---|---|---|
| Build | Multi-stage build: runtime image has no compiler/build tools | docker history |
| Build | No secrets in ENV or COPY — use BuildKit --mount=type=secret | docker history --no-trunc |
| Build | Base image pinned to digest, not tag | Dockerfile inspection |
| Build | .dockerignore excludes .git, .env, credentials, test data | .dockerignore review |
| Image | Distroless or Alpine base (not ubuntu/debian full) | Image scan |
| Image | Final image < 50MB (ideally < 15MB for Go/Rust) | docker images |
| Image | Trivy/Grype scan passes with no unpatched CRITICAL CVEs | CI gate |
| Image | SBOM generated and attested | cosign attest |
| Image | Image signed with cosign | cosign verify |
| Image | Tagged with git SHA or digest, not latest | Registry policy |
| Runtime | USER directive sets non-root UID in Dockerfile | Dockerfile inspection |
| Runtime | runAsNonRoot: true in Kubernetes securityContext | OPA policy |
| Runtime | readOnlyRootFilesystem: true | OPA policy |
| Runtime | allowPrivilegeEscalation: false | OPA policy |
| Runtime | capabilities: drop: [ALL] | OPA policy |
| Runtime | seccompProfile: RuntimeDefault or custom profile | OPA policy |
| Runtime | Resource limits set (CPU + memory) | OPA policy |
| Runtime | No hostPID, hostNetwork, hostIPC | OPA policy |
| Secrets | No secrets in environment variables | kubectl get pod -o yaml audit |
| Secrets | Secrets mounted as files via Kubernetes Secret or Vault | Pod spec review |
| Secrets | Secret volumes mounted with readOnly: true | Pod spec review |
| Secrets | Vault/CSI driver used for rotation-capable secrets | Vault audit log |
| Network | NetworkPolicy restricts ingress/egress to required paths | kubectl get networkpolicy |
| Scanning | Base images rescanned daily (not just at build time) | Scheduled CI job |
| Audit | Falco or similar runtime threat detection enabled | Falco rules active |
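Several of the runtime rows can be machine-checked directly from a pod spec. A simplified Python sketch, for illustration only; real enforcement belongs in an admission controller such as Gatekeeper:

```python
def audit_pod(spec: dict) -> list:
    """Check a pod spec dict against a few of the runtime checklist items."""
    findings = []
    if spec.get("hostNetwork") or spec.get("hostPID") or spec.get("hostIPC"):
        findings.append("host namespace sharing enabled")
    if not spec.get("securityContext", {}).get("runAsNonRoot"):
        findings.append("runAsNonRoot not set")
    for c in spec.get("containers", []):
        ctx = c.get("securityContext", {})
        if ctx.get("allowPrivilegeEscalation") is not False:
            findings.append(f"{c['name']}: allowPrivilegeEscalation not disabled")
        if not ctx.get("readOnlyRootFilesystem"):
            findings.append(f"{c['name']}: rootfs is writable")
        if "ALL" not in ctx.get("capabilities", {}).get("drop", []):
            findings.append(f"{c['name']}: capabilities not dropped")
        if not c.get("resources", {}).get("limits"):
            findings.append(f"{c['name']}: no resource limits")
    return findings

hardened = {
    "securityContext": {"runAsNonRoot": True},
    "containers": [{
        "name": "api",
        "securityContext": {"allowPrivilegeEscalation": False,
                            "readOnlyRootFilesystem": True,
                            "capabilities": {"drop": ["ALL"]}},
        "resources": {"limits": {"cpu": "500m", "memory": "256Mi"}},
    }],
}
assert audit_pod(hardened) == []
assert "runAsNonRoot not set" in audit_pod({"containers": []})
```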
Conclusion
Container security is most effective when it is automated and enforced by policy — not when it depends on individual developers remembering to do the right thing. The patterns in this post compose into a layered defense: multi-stage builds eliminate build-time bloat, distroless images reduce the CVE surface to near zero, non-root enforcement removes the most common privilege escalation path, image scanning catches known vulnerabilities before they reach production, runtime security profiles contain the blast radius if something does get exploited, secrets management ensures credentials never appear in image layers or environment variables, and supply chain tooling provides cryptographic proof of what you're actually running.
None of these layers is sufficient alone. A perfectly hardened image is worthless if the running container has allowPrivilegeEscalation: true. A well-enforced runtime policy is undermined if secrets are stored in environment variables visible to docker inspect. The checklist in section 8 is a dependency graph as much as a checklist — each item strengthens the others.
Start with the high-impact items: multi-stage builds and distroless base images reduce your attack surface by the largest margin for the least engineering effort. Add non-root enforcement and readOnlyRootFilesystem next — both are single-line changes in a Dockerfile and Kubernetes spec. Then layer in CI scanning, secrets management, and supply chain attestation as your team's capacity allows. The goal is a container that is immutable, minimal, scannable, signed, and running as close to zero privilege as its workload requires.
Sources
- Google Distroless Images — official distroless base image repository
- Trivy Documentation — container vulnerability scanner by Aqua Security
- Grype by Anchore — vulnerability scanner for container images and filesystems
- Syft SBOM Generator — SBOM generator supporting SPDX and CycloneDX formats
- Cosign / Sigstore — keyless container image signing and verification
- Kubernetes Security Context — official Kubernetes documentation
- OPA Gatekeeper — policy enforcement for Kubernetes
- Docker BuildKit Secrets — official BuildKit secret mount documentation
- HashiCorp Vault Agent Injection — Vault agent sidecar for Kubernetes
- NIST SP 800-190: Application Container Security Guide — NIST container security guidance