Platform Engineering in 2026: Internal Developer Platforms, Backstage, and the Golden Path

There's a pattern that repeats across every organization that scales past 50 engineers: infrastructure becomes a full-time job for product developers. Kubernetes YAML sprawls across repositories. Every team builds their own deployment pipeline. New engineers spend their first month asking "where do I find the runbook for X?" instead of writing features.
Platform engineering is the discipline that solves this. It treats the developer experience as a product — building internal tools, golden paths, and self-service infrastructure so that application developers can deploy, monitor, and operate their services without becoming Kubernetes experts. In 2026, platform engineering has moved from a Google- and Netflix-scale concern to the standard approach for any organization running more than 10 microservices.
The Problem: Infrastructure as a Blocker
The anti-pattern looks like this: a DevOps team manages Kubernetes, Terraform, CI/CD pipelines, and observability. Every new service requires a ticket to that team. The DevOps team becomes a bottleneck — they're constantly firefighting and can't keep up with service requests. Developers wait days for environment provisioning. Senior engineers spend 20% of their time on infrastructure they shouldn't need to touch.
The cognitive load compounds. A developer who wants to ship a feature must understand: Docker build, Kubernetes manifests, Helm charts, ArgoCD sync, Prometheus alerts, PagerDuty routing, VPC networking, IAM policies. Each of these is a separate expertise domain.
```mermaid
graph LR
    subgraph "Without Platform Engineering"
        D1[Dev Team A] -->|"ticket: new service"| O[Ops/DevOps Team]
        D2[Dev Team B] -->|"ticket: env provision"| O
        D3[Dev Team C] -->|"ticket: alert setup"| O
        O --> I[Infrastructure]
        O -.->|bottleneck| O
    end
```
Platform engineering inverts this: the platform team builds self-service capabilities, and application teams use them.
```mermaid
graph LR
    subgraph "With Platform Engineering"
        D1[Dev Team A] --> P[Internal Developer Platform]
        D2[Dev Team B] --> P
        D3[Dev Team C] --> P
        P --> I[Infrastructure\nKubernetes, Cloud, CI/CD]
        PT[Platform Team] -->|builds + operates| P
    end
```
The platform team builds once; application teams move fast.
How It Works: The Three Layers
A mature IDP has three layers:
1. Infrastructure Layer: The actual compute, networking, and storage. Kubernetes clusters, cloud accounts, databases. The platform team owns this.
2. Platform Services Layer: Standardized abstractions over infrastructure. Deployment pipelines, secrets management, observability stack, service mesh. Application teams don't configure these directly — they use them through the platform.
3. Developer Interface Layer: The self-service portal, CLI, and documentation that application teams interact with. Backstage is the most common implementation of this layer.
```mermaid
graph TD
    A[Developer Interface\nBackstage, CLI, Docs] --> B[Platform Services\nCI/CD, Secrets, Observability, Service Mesh]
    B --> C[Infrastructure\nKubernetes, Cloud, Databases]
    D[Application Developer] -->|self-service| A
    E[Platform Team] -->|builds + operates| B
    E -->|manages| C
    style A fill:#3b82f6,color:#fff
    style B fill:#8b5cf6,color:#fff
    style C fill:#6b7280,color:#fff
```
Implementation: Building with Backstage
Backstage, open-sourced by Spotify and now a CNCF project, is the most widely adopted IDP frontend. It provides a software catalog, scaffolding templates, and plugin framework.
Setting Up Backstage
```bash
# Scaffold a new Backstage app
npx @backstage/create-app@latest --skip-install
cd my-backstage-app
yarn install

# Start dev mode
yarn dev   # → http://localhost:3000
```
The core concept is the Software Catalog — a centralized registry of all services, APIs, libraries, and infrastructure components. Each component is described by a catalog-info.yaml:
```yaml
# catalog-info.yaml (checked into each service repo)
apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  name: payments-service
  description: Processes payment transactions via Stripe
  annotations:
    github.com/project-slug: myorg/payments-service
    backstage.io/techdocs-ref: dir:.
    pagerduty.com/service-id: P123ABC
    prometheus.io/rule: sum(rate(http_requests_total{service="payments"}[5m]))
  tags:
    - payments
    - critical
    - python
  links:
    - url: https://grafana.internal/d/payments
      title: Grafana Dashboard
    - url: https://runbooks.internal/payments
      title: Runbook
spec:
  type: service
  lifecycle: production
  owner: team-payments
  system: checkout-platform
  dependsOn:
    - component:postgres-payments
    - component:redis-sessions
  providesApis:
    - payments-api
```
When every service has this file, Backstage aggregates them into a searchable catalog. Engineers can find any service, see its owner, dependencies, runbooks, and live health status — all in one place.
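To make the aggregation concrete, here is a minimal sketch of what the catalog does conceptually: collect per-repo entries and index them by owner so "who owns this service?" becomes a lookup instead of a Slack thread. The entries below are illustrative stand-ins for parsed catalog-info.yaml files, not real Backstage API objects.

```python
from collections import defaultdict

# Illustrative stand-ins for parsed catalog-info.yaml files
entries = [
    {"name": "payments-service", "owner": "team-payments", "system": "checkout-platform"},
    {"name": "refunds-service", "owner": "team-payments", "system": "checkout-platform"},
    {"name": "cart-service", "owner": "team-checkout", "system": "checkout-platform"},
]

def build_owner_index(entries):
    """Group components by owning team, like the catalog's owner filter."""
    index = defaultdict(list)
    for entry in entries:
        index[entry["owner"]].append(entry["name"])
    return dict(index)

index = build_owner_index(entries)
print(index["team-payments"])  # → ['payments-service', 'refunds-service']
```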
Service Templates: The Golden Path
The golden path is the opinionated, pre-approved way to create new services. Instead of copy-pasting Kubernetes YAML and Dockerfile from existing services (with inevitable drift), teams use Backstage templates to scaffold new services with all standards pre-baked:
```yaml
# Template definition (stored in Backstage)
apiVersion: scaffolder.backstage.io/v1beta3
kind: Template
metadata:
  name: python-microservice
  title: Python Microservice
  description: Creates a production-ready Python service with FastAPI, Docker, CI/CD, and Kubernetes manifests
spec:
  owner: platform-team
  type: service
  parameters:
    - title: Service Information
      properties:
        name:
          type: string
          title: Service Name
          pattern: "^[a-z][a-z0-9-]{2,30}$"
        description:
          type: string
          title: Service Description
        owner:
          type: string
          title: Owning Team
          ui:field: OwnerPicker
    - title: Infrastructure
      properties:
        namespace:
          type: string
          title: Kubernetes Namespace
          enum: [production, staging, development]
        replicas:
          type: integer
          title: Initial Replica Count
          default: 2
          minimum: 1
          maximum: 10
        memory_limit:
          type: string
          title: Memory Limit
          default: "512Mi"
  steps:
    - id: fetch-template
      name: Fetch Base Template
      action: fetch:template
      input:
        url: ./skeleton
        values:
          name: ${{ parameters.name }}
          owner: ${{ parameters.owner }}
          namespace: ${{ parameters.namespace }}
          replicas: ${{ parameters.replicas }}
    - id: create-github-repo
      name: Create GitHub Repository
      action: github:repo:create
      input:
        repoUrl: github.com?repo=${{ parameters.name }}&owner=myorg
        description: ${{ parameters.description }}
    - id: push-to-github
      name: Push Template to GitHub
      action: github:repo:push
      input:
        repoUrl: github.com?repo=${{ parameters.name }}&owner=myorg
    - id: register-in-catalog
      name: Register in Catalog
      action: catalog:register
      input:
        repoContentsUrl: ${{ steps['create-github-repo'].output.repoContentsUrl }}
        catalogInfoPath: /catalog-info.yaml
    - id: create-github-environments
      name: Setup Environments
      action: github:environment:create
      input:
        repoUrl: github.com?repo=${{ parameters.name }}&owner=myorg
        environments: [development, staging, production]
  output:
    links:
      - title: Repository
        url: ${{ steps['create-github-repo'].output.remoteUrl }}
      - title: Open in Catalog
        url: ${{ steps['register-in-catalog'].output.entityRef }}
```
The template skeleton (in ./skeleton/) contains the actual files — Dockerfile, FastAPI app structure, GitHub Actions workflow, Kubernetes Helm values, Prometheus alert rules — all templated with the values from the form above.
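The form constraints declared in the template (pattern, enum, integer bounds) are ordinary JSON-Schema rules, which Backstage's UI enforces before the scaffolder runs. A hedged sketch of the same validation in plain Python, useful for understanding what the form will and won't accept:

```python
import re

# Mirrors the constraints from the template above; this is an illustrative
# re-implementation, not Backstage's actual validator.
NAME_PATTERN = re.compile(r"^[a-z][a-z0-9-]{2,30}$")
ALLOWED_NAMESPACES = {"production", "staging", "development"}

def validate_params(name, namespace, replicas):
    """Return a list of validation errors (empty list means valid)."""
    errors = []
    if not NAME_PATTERN.match(name):
        errors.append(f"invalid service name: {name!r}")
    if namespace not in ALLOWED_NAMESPACES:
        errors.append(f"unknown namespace: {namespace!r}")
    if not 1 <= replicas <= 10:
        errors.append(f"replicas must be between 1 and 10, got {replicas}")
    return errors

print(validate_params("payments-service", "staging", 2))  # → []
print(len(validate_params("X", "qa", 50)))                # → 3
```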
A developer fills out a form in the Backstage UI, clicks "Create," and in 30 seconds has a GitHub repo with:
- Production-ready Dockerfile with multi-stage build
- FastAPI app with health endpoints
- GitHub Actions CI/CD pipeline deploying to Kubernetes
- Helm chart with resource limits and HPA configured
- Prometheus alerts for error rate and latency
- A catalog-info.yaml registering the service in Backstage
This is the golden path. Not "here's the documentation," but "here's the working thing, already configured correctly."
TechDocs: Documentation as Code
Backstage's TechDocs plugin renders Markdown documentation from service repositories directly in the catalog. Documentation lives next to the code, versioned in Git, and is discoverable through Backstage search:
```yaml
# mkdocs.yml in each service repo
site_name: Payments Service
nav:
  - Home: index.md
  - Architecture: architecture.md
  - API Reference: api.md
  - Runbook: runbook.md
  - On-Call Guide: oncall.md
plugins:
  - techdocs-core
```
```markdown
<!-- docs/runbook.md -->
# Payments Service Runbook

## High Error Rate Alert

**Symptom:** `PaymentsHighErrorRate` alert firing
**Threshold:** Error rate > 5% for 5 minutes

### Immediate Steps

1. Check recent deployments: `kubectl rollout history deploy/payments-service -n production`
2. Check error logs: `kubectl logs -l app=payments-service -n production --tail=100`
3. Check Stripe API status: https://status.stripe.com
...
```
Engineers find runbooks from the Backstage catalog, not by asking "where's the runbook for X?" in Slack.
The Platform Team's Operating Model
A platform team of 3-5 engineers can support 50-150 application developers when built correctly. The key is operating like a product team, not a shared services team:
Product management: The platform has a roadmap, a backlog prioritized by developer impact, and regular user research with application teams. "What slows you down?" is the core question.
SLOs for the platform: The platform itself has service level objectives: deployment pipeline P99 runtime under 10 minutes, Backstage availability above 99.5%, provisioning requests fulfilled within 2 minutes. When the platform publishes SLOs, developers can treat it as a dependable product and plan around it.
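Checking an SLO like "pipeline P99 under 10 minutes" is a one-liner over recent run data. A small sketch with made-up runtimes, using the nearest-rank percentile method:

```python
import math

def p99(samples):
    """Nearest-rank 99th percentile of a list of runtimes (minutes)."""
    ordered = sorted(samples)
    rank = math.ceil(0.99 * len(ordered))  # nearest-rank method
    return ordered[rank - 1]

# 100 illustrative pipeline runs: most take ~6 minutes, a few slow outliers
runtimes = [6.0] * 97 + [9.5, 11.0, 14.0]

print(p99(runtimes))          # → 11.0
print(p99(runtimes) < 10.0)   # → False: SLO breached, investigate the pipeline
```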
Self-service by default: If a developer must file a ticket for a common task, that's a product gap. Ticket-worthy tasks should become self-service templates within 2 sprints of being identified.
```mermaid
graph TD
    A[Identify developer friction] --> B{Ticket volume > 5/week?}
    B -- Yes --> C[Build self-service template or automation]
    B -- No --> D[Document workaround]
    C --> E[Measure adoption]
    E --> F{Adoption > 80%?}
    F -- Yes --> G[Retire old process]
    F -- No --> H[Improve UX or documentation]
    H --> E
```
Crossplane: Infrastructure as Kubernetes Resources
Backstage handles the developer interface. Crossplane handles the infrastructure provisioning. Together they form a complete self-service layer.
Crossplane extends Kubernetes with custom resource definitions (CRDs) that represent cloud resources. An application team creates a Kubernetes YAML file to request a database — Crossplane provisions the actual RDS instance in AWS.
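Under the hood this is the standard Kubernetes reconciliation model: a controller compares the desired state (the claim YAML) against the actual cloud state and acts on the difference. A toy sketch of that loop; the create/update "actions" here are hypothetical stand-ins, not a real AWS client:

```python
from typing import Optional

def reconcile(desired: dict, actual: Optional[dict]) -> str:
    """Decide what action closes the gap between desired and actual state."""
    if actual is None:
        # Resource doesn't exist yet: provision it
        return f"create RDS instance {desired['name']} ({desired['instanceClass']})"
    drift = {k: v for k, v in desired.items() if actual.get(k) != v}
    if drift:
        # Resource exists but has drifted from the claim
        return f"update {desired['name']}: {sorted(drift)}"
    return "in sync"

desired = {"name": "payments-db", "instanceClass": "db.r6g.xlarge", "storageGB": 100}

print(reconcile(desired, None))                         # → create ...
print(reconcile(desired, {**desired, "storageGB": 50})) # → update ... ['storageGB']
print(reconcile(desired, desired))                      # → in sync
```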
```yaml
# Developer submits this YAML to create a production PostgreSQL database
# No AWS console, no Terraform, no ticket to the platform team
apiVersion: database.example.com/v1alpha1
kind: PostgreSQLInstance
metadata:
  name: payments-db
  namespace: payments-prod
spec:
  parameters:
    storageGB: 100
    engineVersion: "16"
    instanceClass: db.r6g.xlarge
    multiAZ: true
    backupRetentionDays: 30
  compositionRef:
    name: postgresql-aws-production
  writeConnectionSecretToRef:
    name: payments-db-credentials  # Automatically written to K8s Secret
```
The platform team defines Compositions — the Crossplane resources that translate this high-level request into AWS RDS, security groups, parameter groups, and subnet groups. Application teams only see the high-level API. They can't accidentally provision an unencrypted database or skip backups — the platform composition enforces the defaults.
```yaml
# Platform team's Composition (defined once, used by all teams)
apiVersion: apiextensions.crossplane.io/v1
kind: Composition
metadata:
  name: postgresql-aws-production
spec:
  compositeTypeRef:
    apiVersion: database.example.com/v1alpha1
    kind: PostgreSQLInstance
  resources:
    - name: rds-instance
      base:
        apiVersion: rds.aws.upbound.io/v1beta1
        kind: Instance
        spec:
          forProvider:
            region: us-east-1
            encrypted: true  # Always enforced
            iamDatabaseAuthenticationEnabled: true
            deletionProtection: true  # Platform enforces this
      patches:
        - fromFieldPath: spec.parameters.storageGB
          toFieldPath: spec.forProvider.allocatedStorage
        - fromFieldPath: spec.parameters.instanceClass
          toFieldPath: spec.forProvider.instanceClass
```
This pattern — platform defines the opinionated "what's allowed," teams configure within that envelope — is the essence of platform engineering applied to infrastructure.
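The patch mechanics are simple enough to sketch: copy values from the claim's field paths onto the managed resource's field paths, leaving platform-enforced defaults untouched. This is a simplified illustration; real Crossplane patches also support transforms, defaults, and richer path syntax:

```python
def get_path(obj, path):
    """Read a dotted field path like 'spec.parameters.storageGB'."""
    for key in path.split("."):
        obj = obj[key]
    return obj

def set_path(obj, path, value):
    """Write a dotted field path, creating intermediate dicts as needed."""
    keys = path.split(".")
    for key in keys[:-1]:
        obj = obj.setdefault(key, {})
    obj[keys[-1]] = value

def apply_patches(claim, base, patches):
    """Map claim fields onto the managed resource (fromFieldPath → toFieldPath)."""
    for p in patches:
        set_path(base, p["toFieldPath"], get_path(claim, p["fromFieldPath"]))
    return base

claim = {"spec": {"parameters": {"storageGB": 100, "instanceClass": "db.r6g.xlarge"}}}
base = {"spec": {"forProvider": {"region": "us-east-1", "encrypted": True}}}
patches = [
    {"fromFieldPath": "spec.parameters.storageGB", "toFieldPath": "spec.forProvider.allocatedStorage"},
    {"fromFieldPath": "spec.parameters.instanceClass", "toFieldPath": "spec.forProvider.instanceClass"},
]

out = apply_patches(claim, base, patches)
print(out["spec"]["forProvider"]["allocatedStorage"])  # → 100
print(out["spec"]["forProvider"]["encrypted"])         # → True (platform default untouched)
```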
Platform Engineering Anti-Patterns
Understanding what platform engineering looks like when done wrong saves months of rework:
Anti-pattern 1: Platform as gatekeeping. The platform team creates a "self-service" portal that still requires a human to approve requests. This is just a ticket system with a UI. Self-service means automated provisioning, not form submission.
Anti-pattern 2: Building everything from scratch. Teams sometimes build custom CI/CD engines, secret managers, and service meshes instead of configuring existing solutions. The result: underdocumented custom tooling that breaks when the original author leaves. Use open-source standards; add value with opinionated configuration.
Anti-pattern 3: No feedback loop. Platform teams that don't regularly talk to developers build tools nobody uses. Run a monthly "paper cuts" session asking what slowed developers down this sprint, and prioritize accordingly.
Anti-pattern 4: Mandating the golden path without exceptions. Every large organization has legacy services that can't immediately adopt the new platform. Forcing migration causes conflict and backlash. Offer a path that makes new services easy, without blocking teams on legacy systems.
Anti-pattern 5: Platform team as a cost center. Platform engineering has clear ROI — measure it. Time saved per developer per week × number of developers × engineer cost = platform value. Deployment frequency and DORA metrics tell the story quantitatively. A platform team that can't show ROI will be cut in the next budget cycle.
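The ROI formula from anti-pattern 5 is worth working through once. All numbers below are illustrative assumptions, not benchmarks:

```python
# Back-of-the-envelope platform ROI. Every figure here is an assumption
# to be replaced with your organization's own data.
hours_saved_per_dev_per_week = 3
developers = 100
loaded_cost_per_hour = 90                  # fully loaded engineer cost, USD
platform_team_annual_cost = 5 * 200_000    # 5 engineers at an assumed loaded cost
working_weeks_per_year = 48

annual_value = (hours_saved_per_dev_per_week * working_weeks_per_year
                * developers * loaded_cost_per_hour)
roi = annual_value / platform_team_annual_cost

print(f"annual value: ${annual_value:,}")  # → annual value: $1,296,000
print(f"ROI multiple: {roi:.2f}x")         # → ROI multiple: 1.30x
```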
```mermaid
flowchart TD
    A[Platform team approach]
    A --> B[Self-service automation\n✅ Anti-pattern 1 fix]
    A --> C[Configure open-source tooling\n✅ Anti-pattern 2 fix]
    A --> D[Regular developer feedback\n✅ Anti-pattern 3 fix]
    A --> E[Opt-in golden path\n✅ Anti-pattern 4 fix]
    A --> F[Measure DORA + ROI\n✅ Anti-pattern 5 fix]
    style B fill:#22c55e,color:#fff
    style C fill:#22c55e,color:#fff
    style D fill:#22c55e,color:#fff
    style E fill:#22c55e,color:#fff
    style F fill:#22c55e,color:#fff
```
Production Considerations
What Not to Build
Platform teams that try to build everything burn out and produce tools nobody uses. The most common mistake is building custom CI/CD from scratch. Use GitHub Actions, GitLab CI, or Tekton — your value-add is the opinionated workflows on top, not the engine itself.
Similarly, don't build custom secret managers, custom monitoring agents, or custom service mesh implementations. Vault, the OpenTelemetry Collector, and Istio/Cilium are mature. Your job is to configure them correctly and wrap them in self-service abstractions.
Measuring Platform Success
Metrics that matter:
- DORA metrics: Deployment frequency, lead time for changes, change failure rate, mean time to recovery. The platform should improve all four.
- Onboarding time: How long until a new engineer ships their first feature? Platform teams track this.
- Self-service ratio: What percentage of infrastructure requests are fulfilled through self-service vs. tickets?
- Platform adoption: Are teams using the golden paths? Deviations are technical debt.
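The self-service ratio in particular is cheap to compute from request records. A sketch with illustrative events; real data would come from the ticketing system and the platform's provisioning logs:

```python
# Illustrative infrastructure-request records; "via" is either
# "self-service" (fulfilled by the platform) or "ticket" (human-handled).
requests = [
    {"kind": "database", "via": "self-service"},
    {"kind": "namespace", "via": "self-service"},
    {"kind": "dns-record", "via": "ticket"},
    {"kind": "database", "via": "self-service"},
    {"kind": "vpn-access", "via": "ticket"},
]

self_service = sum(1 for r in requests if r["via"] == "self-service")
ratio = self_service / len(requests)

print(f"self-service ratio: {ratio:.0%}")  # → self-service ratio: 60%
```

Tracking which request kinds still arrive as tickets also doubles as the backlog for the next self-service template.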
Multi-Tenancy and Guardrails
The platform enforces standards without blocking innovation. Use Open Policy Agent (OPA) admission controllers to enforce security policies — no privileged containers, no latest image tags, required resource limits — at deploy time rather than in code review.
```yaml
# OPA/Gatekeeper constraint: require resource limits on all containers
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sRequiredResources
metadata:
  name: require-resource-limits
spec:
  match:
    kinds:
      - apiGroups: ["apps"]
        kinds: ["Deployment"]
  parameters:
    required: ["limits.memory", "limits.cpu", "requests.memory", "requests.cpu"]
```
Teams can still customize — but they can't accidentally ship a container without resource limits.
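The rule the constraint enforces is easy to replicate as a local pre-deploy check, so developers catch violations before the admission controller rejects them. A hedged Python sketch of the same check over a container spec:

```python
# Mirrors the Gatekeeper constraint's "required" list; an illustrative
# local check, not OPA's actual Rego evaluation.
REQUIRED = ["limits.memory", "limits.cpu", "requests.memory", "requests.cpu"]

def missing_resources(container: dict) -> list:
    """Return the resource fields a container spec fails to declare."""
    resources = container.get("resources", {})
    missing = []
    for field in REQUIRED:
        section, key = field.split(".")
        if key not in resources.get(section, {}):
            missing.append(field)
    return missing

good = {"name": "app", "resources": {
    "limits": {"memory": "512Mi", "cpu": "500m"},
    "requests": {"memory": "256Mi", "cpu": "250m"},
}}
bad = {"name": "app", "resources": {"limits": {"memory": "512Mi"}}}

print(missing_resources(good))  # → []
print(missing_resources(bad))   # → ['limits.cpu', 'requests.memory', 'requests.cpu']
```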
The Paved Road vs the Off-Road
Platform engineering doesn't mean mandating one way to do everything. The "paved road" metaphor is more accurate than "golden path": a paved road is smooth, fast, and well-maintained. You can drive off it, but you're aware you're doing so — and you take on more responsibility.
For a Python microservice deploying to Kubernetes, the paved road means:
- FastAPI (not Flask, not Django — one framework, well-supported by the platform)
- Dockerfile from the standard base image (pre-baked security scanning, non-root user)
- GitHub Actions pipeline from the template (not custom pipelines in Jenkins)
- Helm chart from the platform's chart library (not custom Kubernetes YAML)
- Prometheus client pre-integrated (not optional — metrics are mandatory)
Teams that need to go off-road (legacy services, specialized requirements) can — but they own the maintenance. The platform team doesn't guarantee support for custom configurations.
This creates a natural incentive: new services take the paved road because it's genuinely faster. The effort to maintain a custom configuration isn't worth it compared to using the templated, already-working setup.
Operationally, paved-road services benefit from platform improvements automatically. When the platform team upgrades the base Docker image for a security vulnerability, all paved-road services get the fix in their next build — without the service team doing anything. Off-road services have to handle it manually.
The ratio matters: if 80% of services are on the paved road, platform improvements have leverage. If only 20% are, the platform team's work has limited impact.
Developer Portals: Search, Discover, Understand
The unsexy part of platform engineering is documentation and discoverability. Developers spend significant time finding: who owns this service? Where's the runbook? What APIs does it expose? How do I get access to it?
Backstage's search indexes the entire software catalog — services, APIs, documentation, and owners. But the value multiplies when every service has quality catalog-info.yaml and TechDocs. This requires a culture shift: documentation is part of the definition of done.
A practical forcing function: the platform team's deployment pipeline validates that catalog-info.yaml exists and contains required fields before a service can deploy to production.
```yaml
# GitHub Actions check — runs on every PR
name: Platform Compliance Check
on: [pull_request]
jobs:
  catalog-check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Validate catalog-info.yaml exists
        run: |
          if [ ! -f "catalog-info.yaml" ]; then
            echo "❌ catalog-info.yaml is required for all services"
            exit 1
          fi
      - name: Validate required fields
        run: |
          python3 -c "
          import yaml, sys
          with open('catalog-info.yaml') as f:
              catalog = yaml.safe_load(f)
          required = ['metadata.name', 'metadata.description', 'spec.owner', 'spec.lifecycle']
          missing = []
          for field in required:
              keys = field.split('.')
              obj = catalog
              for key in keys:
                  if key not in obj:
                      missing.append(field)
                      break
                  obj = obj[key]
          if missing:
              print(f'❌ Missing required fields: {missing}')
              sys.exit(1)
          print('✅ catalog-info.yaml valid')
          "
      - name: Check TechDocs directory exists
        run: |
          if [ ! -d "docs" ] || [ ! -f "docs/index.md" ]; then
            echo "⚠️ docs/index.md is recommended for all services (see platform wiki)"
          fi
```
This kind of automated compliance — not blocking deployments for missing docs, but flagging it visibly — moves cultural change faster than documentation mandates alone.
Conclusion
Platform engineering has proven its ROI: organizations that invest in it report 40-60% reduction in time-to-production for new services and significant improvements in DORA metrics. The key insights:
- Build products, not shared services: Treat your IDP as a product with a roadmap, metrics, and user research
- Golden paths are opinionated: Offer one well-maintained path rather than infinite flexibility that becomes everyone's problem
- Self-service or bust: Every ticket-based workflow is a candidate for automation
- Measure what matters: DORA metrics, onboarding time, and self-service ratio tell you if the platform is working
- Backstage is the catalog, not the whole platform — the real work is the pipelines, templates, and integrations behind it
The alternative — every team maintaining their own infrastructure — doesn't scale. Platform engineering is the way engineering organizations maintain velocity as they grow.
The most important mindset shift for a platform team: you're not a help desk, you're a product team. Your customers are internal developers. Your product metrics are DORA improvements and developer satisfaction scores. Run user research, maintain a public roadmap, and deprecate unused tools the same way a product team deprecates features. Platform engineering done right is invisible — developers don't notice the infrastructure because it just works.
Getting started doesn't require Backstage on day one. Start with a standardized Dockerfile, a shared GitHub Actions workflow library, and a wiki page listing every service and its owner. That's already more than most teams have. Add Backstage when the catalog needs to be searchable, not before. The principles matter more than the tooling: self-service, golden paths, and measuring developer experience as a first-class metric.
Sources
- Spotify Engineering: "What is Backstage?"
- CNCF Platforms Working Group White Paper
- DORA State of DevOps Report 2025
- Puppet State of DevOps Report 2024
- "Team Topologies" by Skelton and Pais