Terraform Advanced Patterns: Modules, Remote State, and Managing Infrastructure Drift

Most teams start Terraform with a single main.tf file and a handful of resources. Six months later, that file is 2,000 lines. Three environments (dev, staging, prod) have separate copies that have gradually diverged. The state file lives on someone's laptop. Applying requires knowing the "correct order" of operations. This is infrastructure as code in name only.
Terraform's advanced patterns — modular architecture, remote state, workspaces, drift detection, and policy-as-code — transform it from a "run once and pray" tool to a reliable, auditable infrastructure platform. This guide covers all of them with production-grade examples.
The Problem: Flat Terraform That Doesn't Scale
The anti-patterns that compound over time:
project/
├── main.tf # 2,000+ lines of everything
├── variables.tf # 50 variables, minimal documentation
├── outputs.tf # What outputs exist? Nobody knows.
└── terraform.tfstate # State on someone's laptop — NOT in git
When another team member runs terraform apply, they don't have the state file. When you add a new environment, you duplicate the entire directory. When you want to share the VPC configuration with another project, you copy-paste it.
The structured alternative:
infrastructure/
├── modules/
│ ├── vpc/ # Reusable VPC module
│ ├── eks/ # Reusable EKS cluster module
│ ├── rds/ # Reusable RDS module
│ └── service/ # Reusable service-level module
├── environments/
│ ├── dev/ # Dev environment composition
│ ├── staging/ # Staging environment composition
│ └── prod/ # Prod environment composition
└── global/ # Cross-environment resources (DNS, IAM)
How It Works: Module Architecture
A Terraform module is any directory with .tf files. Modules encapsulate resources, expose a clean interface (variables/outputs), and hide implementation details. The calling configuration provides inputs; the module returns outputs.
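As a minimal sketch (module path and attribute names here are illustrative, not from this guide's RDS module), a caller wires inputs and consumes outputs like this:

```hcl
# Hypothetical caller: the module's variables are its inputs,
# its outputs are its return values
module "network" {
  source     = "./modules/vpc"   # any directory containing .tf files
  cidr_block = "10.0.0.0/16"     # a variable declared inside the module
}

resource "aws_instance" "app" {
  ami           = "ami-12345678"
  instance_type = "t3.micro"
  subnet_id     = module.network.public_subnet_id  # a module output
}
```

The caller never touches the module's internal resources directly; everything flows through the variable/output interface.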
Writing a Reusable Module
# modules/rds/variables.tf
variable "identifier" {
description = "Unique identifier for this RDS instance"
type = string
validation {
condition = can(regex("^[a-z][a-z0-9-]{1,60}$", var.identifier))
error_message = "Identifier must be 2-61 characters: lowercase letters, digits, or hyphens, starting with a letter."
}
}
variable "engine_version" {
description = "PostgreSQL engine version"
type = string
default = "16.2"
}
variable "instance_class" {
description = "RDS instance type"
type = string
default = "db.t3.medium"
}
variable "storage_gb" {
description = "Allocated storage in gigabytes"
type = number
default = 20
validation {
condition = var.storage_gb >= 20 && var.storage_gb <= 65536
error_message = "Storage must be between 20 and 65536 GB."
}
}
variable "subnet_ids" {
description = "List of subnet IDs for the DB subnet group"
type = list(string)
}
variable "vpc_id" {
description = "VPC ID for security group placement"
type = string
}
variable "allowed_security_group_ids" {
description = "Security group IDs allowed to connect to this RDS instance"
type = list(string)
default = []
}
variable "tags" {
description = "Additional tags to apply to resources"
type = map(string)
default = {}
}
# modules/rds/main.tf
locals {
common_tags = merge(var.tags, {
Module = "rds"
Managed = "terraform"
})
}
resource "aws_db_subnet_group" "this" {
name = "${var.identifier}-subnet-group"
subnet_ids = var.subnet_ids
tags = local.common_tags
}
resource "aws_security_group" "rds" {
name = "${var.identifier}-rds-sg"
vpc_id = var.vpc_id
description = "Security group for ${var.identifier} RDS instance"
tags = local.common_tags
ingress {
from_port = 5432
to_port = 5432
protocol = "tcp"
security_groups = var.allowed_security_group_ids
description = "PostgreSQL from allowed security groups"
}
egress {
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
}
}
resource "random_password" "master" {
length = 32
special = true
override_special = "!#$%^&*()-_=+[]{}|"
}
resource "aws_secretsmanager_secret" "rds_password" {
name = "/${var.identifier}/rds/master-password"
recovery_window_in_days = 7
tags = local.common_tags
}
resource "aws_secretsmanager_secret_version" "rds_password" {
secret_id = aws_secretsmanager_secret.rds_password.id
secret_string = random_password.master.result
}
resource "aws_db_instance" "this" {
identifier = var.identifier
engine = "postgres"
engine_version = var.engine_version
instance_class = var.instance_class
allocated_storage = var.storage_gb
max_allocated_storage = var.storage_gb * 2 # Auto-scaling up to 2×
storage_encrypted = true # Always encrypt
db_subnet_group_name = aws_db_subnet_group.this.name
vpc_security_group_ids = [aws_security_group.rds.id]
username = "postgres"
password = random_password.master.result
multi_az = var.instance_class != "db.t3.micro" # No multi-AZ for dev
deletion_protection = true
skip_final_snapshot = false
final_snapshot_identifier = "${var.identifier}-final"
backup_retention_period = 14
backup_window = "03:00-04:00"
maintenance_window = "Mon:04:00-Mon:05:00"
performance_insights_enabled = true
monitoring_interval = 60
tags = local.common_tags
}
# modules/rds/outputs.tf
output "endpoint" {
description = "RDS instance endpoint"
value = aws_db_instance.this.endpoint
}
output "port" {
description = "RDS instance port"
value = aws_db_instance.this.port
}
output "security_group_id" {
description = "Security group ID attached to this RDS instance"
value = aws_security_group.rds.id
}
output "secret_arn" {
description = "ARN of the Secrets Manager secret containing the master password"
value = aws_secretsmanager_secret.rds_password.arn
sensitive = true
}
Calling the Module from an Environment
# environments/prod/main.tf
terraform {
required_version = ">= 1.7"
required_providers {
aws = {
source = "hashicorp/aws"
version = "~> 5.0" # Allows 5.x minor and patch updates, never 6.0
}
}
backend "s3" {
bucket = "myorg-terraform-state"
key = "prod/terraform.tfstate"
region = "us-east-1"
encrypt = true
dynamodb_table = "terraform-state-lock" # Prevents concurrent applies
}
}
# Reference the VPC from a separate state
data "terraform_remote_state" "vpc" {
backend = "s3"
config = {
bucket = "myorg-terraform-state"
key = "prod/vpc/terraform.tfstate"
region = "us-east-1"
}
}
module "payments_db" {
source = "../../modules/rds"
identifier = "payments-prod"
instance_class = "db.r6g.xlarge"
storage_gb = 100
subnet_ids = data.terraform_remote_state.vpc.outputs.private_subnet_ids
vpc_id = data.terraform_remote_state.vpc.outputs.vpc_id
allowed_security_group_ids = [
module.payments_service.security_group_id
]
tags = {
Environment = "prod"
Team = "payments"
CostCenter = "engineering"
}
}
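One benefit of exporting secret_arn rather than the password itself: services fetch the credential from Secrets Manager at runtime instead of receiving it as a Terraform output. A hypothetical consumer-side sketch (the IAM role name is assumed, not part of the module above):

```hcl
# Hypothetical consumer: allow a service role to read the DB secret at runtime
data "aws_iam_policy_document" "read_db_secret" {
  statement {
    actions   = ["secretsmanager:GetSecretValue"]
    resources = [module.payments_db.secret_arn]
  }
}

resource "aws_iam_role_policy" "read_db_secret" {
  name   = "read-payments-db-secret"
  role   = aws_iam_role.payments_service.id # assumed to exist elsewhere
  policy = data.aws_iam_policy_document.read_db_secret.json
}
```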
Terragrunt: DRY Terraform Configurations
Repeating the same backend configuration and provider setup across dozens of module instantiations violates DRY. Terragrunt wraps Terraform to eliminate this repetition:
# terragrunt.hcl (at repository root)
remote_state {
backend = "s3"
generate = {
path = "backend.tf"
if_exists = "overwrite"
}
config = {
bucket = "myorg-terraform-state"
key = "${path_relative_to_include()}/terraform.tfstate"
region = "us-east-1"
encrypt = true
dynamodb_table = "terraform-state-lock"
}
}
generate "provider" {
path = "provider.tf"
if_exists = "overwrite"
contents = <<EOF
provider "aws" {
region = "us-east-1"
default_tags {
tags = {
ManagedBy = "terraform"
Repository = "myorg/infrastructure"
}
}
}
EOF
}
# environments/prod/payments-db/terragrunt.hcl
include "root" {
path = find_in_parent_folders() # Inherits root terragrunt.hcl
}
terraform {
source = "../../../modules//rds" # Double // = module root
}
# Pass inputs to the module
inputs = {
identifier = "payments-prod"
instance_class = "db.r6g.xlarge"
storage_gb = 100
subnet_ids = dependency.vpc.outputs.private_subnet_ids
vpc_id = dependency.vpc.outputs.vpc_id
}
# Declare dependency on VPC module output
dependency "vpc" {
config_path = "../vpc"
mock_outputs = {
private_subnet_ids = ["subnet-mock"]
vpc_id = "vpc-mock"
}
mock_outputs_allowed_terraform_commands = ["validate", "plan"]
}
With Terragrunt, terragrunt run-all plan plans every module in the directory tree in one command, running modules concurrently where their dependencies allow. terragrunt run-all apply applies them in dependency order.
The directory structure mirrors the account/region/environment hierarchy naturally:
infrastructure/
├── terragrunt.hcl ← root config, shared by all
├── prod/
│ ├── us-east-1/
│ │ ├── vpc/
│ │ │ └── terragrunt.hcl
│ │ ├── eks/
│ │ │ └── terragrunt.hcl
│ │ └── payments-db/
│ │ └── terragrunt.hcl ← uses root config, declares vpc dep
Remote State and State Locking
State stored locally is a collaboration and reliability problem. Remote state solves both:
# Bootstrap: create the S3 bucket and DynamoDB lock table first
# (typically in a separate "bootstrap" Terraform config or done manually)
resource "aws_s3_bucket" "terraform_state" {
bucket = "myorg-terraform-state"
}
resource "aws_s3_bucket_versioning" "terraform_state" {
bucket = aws_s3_bucket.terraform_state.id
versioning_configuration {
status = "Enabled" # Keep all state file versions — essential for recovery
}
}
resource "aws_s3_bucket_server_side_encryption_configuration" "terraform_state" {
bucket = aws_s3_bucket.terraform_state.id
rule {
apply_server_side_encryption_by_default {
sse_algorithm = "aws:kms"
}
}
}
resource "aws_dynamodb_table" "terraform_lock" {
name = "terraform-state-lock"
billing_mode = "PAY_PER_REQUEST"
hash_key = "LockID"
attribute {
name = "LockID"
type = "S"
}
}
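One hardening step commonly added to the same bootstrap config: block all public access on the state bucket, since state files can contain secrets.

```hcl
# Recommended hardening: the state bucket must never be publicly readable
resource "aws_s3_bucket_public_access_block" "terraform_state" {
  bucket                  = aws_s3_bucket.terraform_state.id
  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}
```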
With remote state, terraform apply acquires the DynamoDB lock first. If another apply holds the lock, the second one fails with a lock error instead of both processes writing to the state file concurrently.
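Note that newer Terraform releases (1.10 introduced it; it became generally available in 1.11) support S3-native state locking, which removes the need for the DynamoDB table entirely. A sketch, assuming the same bucket as above:

```hcl
terraform {
  backend "s3" {
    bucket       = "myorg-terraform-state"
    key          = "prod/terraform.tfstate"
    region       = "us-east-1"
    encrypt      = true
    use_lockfile = true # S3-native locking; no DynamoDB table needed
  }
}
```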
Testing Terraform with Terratest
Infrastructure code should be tested like application code. Terratest deploys real infrastructure in a test AWS account, runs assertions, and tears it down:
// test/rds_module_test.go
package test

import (
	"strings"
	"testing"

	"github.com/gruntwork-io/terratest/modules/random"
	"github.com/gruntwork-io/terratest/modules/terraform"
	"github.com/stretchr/testify/assert"
)

func TestRDSModule(t *testing.T) {
	t.Parallel()

	// Generate the unique suffix once and reuse it; calling the
	// generator twice would produce two different identifiers
	dbID := "test-db-" + strings.ToLower(random.UniqueId())

	terraformOptions := &terraform.Options{
		TerraformDir: "../modules/rds",
		Vars: map[string]interface{}{
			"identifier":     dbID,
			"instance_class": "db.t3.micro", // Cheapest for tests
			"storage_gb":     20,
			"subnet_ids":     getTestSubnetIDs(t), // project helper (not shown)
			"vpc_id":         getTestVPCID(t),     // project helper (not shown)
		},
	}

	// Clean up after the test regardless of pass/fail
	defer terraform.Destroy(t, terraformOptions)

	// Apply the module
	terraform.InitAndApply(t, terraformOptions)

	// Assert outputs
	endpoint := terraform.Output(t, terraformOptions, "endpoint")
	assert.NotEmpty(t, endpoint)

	// Assert against the actual AWS resource; describeDBInstance is a
	// small project helper (not shown) wrapping rds:DescribeDBInstances
	db := describeDBInstance(t, dbID, "us-east-1")
	assert.True(t, *db.StorageEncrypted, "Storage should be encrypted")
	assert.True(t, *db.DeletionProtection, "Deletion protection should be enabled")
	assert.Equal(t, int64(14), *db.BackupRetentionPeriod, "Backup retention should be 14 days")
}
Terratest tests are integration tests — they deploy real infrastructure and take 5-15 minutes. Run them on PR to main only, not every commit. Keep a dedicated test AWS account with limited quotas and automated cleanup of orphaned resources.
Drift Detection and Remediation
Infrastructure drift: the actual cloud state diverges from Terraform state. This happens when someone makes a manual change in the AWS console, or a resource is modified by another process.
# Detect drift — shows what changed outside Terraform
terraform plan -refresh-only
# Example output showing drift:
# ~ aws_security_group.rds
# ingress {
# + cidr_blocks = ["10.0.1.0/24"] # Someone added this manually
# }
Detect drift on a schedule in CI and alert on changes:
# .github/workflows/drift-detection.yml
name: Terraform Drift Detection
on:
schedule:
- cron: '0 */6 * * *' # Every 6 hours
jobs:
drift-check:
runs-on: ubuntu-latest
strategy:
matrix:
environment: [dev, staging, prod]
steps:
- uses: actions/checkout@v4
- uses: hashicorp/setup-terraform@v3
with:
terraform_version: "~1.7"
- name: Terraform Init
run: terraform init
working-directory: environments/${{ matrix.environment }}
- name: Check for Drift
id: plan
run: terraform plan -refresh-only -detailed-exitcode -out=drift.plan 2>&1
working-directory: environments/${{ matrix.environment }}
continue-on-error: true # Exit code 2 = drift detected
- name: Alert on Drift
if: steps.plan.outputs.exitcode == '2'
uses: slackapi/slack-github-action@v1
with:
payload: |
{
"text": "⚠️ Infrastructure drift detected in ${{ matrix.environment }}. Review: ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}"
}
env:
SLACK_WEBHOOK_URL: ${{ secrets.SLACK_WEBHOOK_URL }}
The for_each and count Patterns
Creating multiple similar resources without copy-paste:
# for_each: create one resource per map entry (use over count when possible)
variable "environments" {
default = {
dev = { instance_class = "db.t3.micro", storage_gb = 20 }
staging = { instance_class = "db.t3.medium", storage_gb = 50 }
prod = { instance_class = "db.r6g.xlarge", storage_gb = 100 }
}
}
resource "aws_db_instance" "this" {
for_each = var.environments
identifier = "myapp-${each.key}"
instance_class = each.value.instance_class
allocated_storage = each.value.storage_gb
# ... other config
}
# Reference specific instances
output "prod_endpoint" {
value = aws_db_instance.this["prod"].endpoint
}
# Why for_each over count: with count, removing middle item renumbers all later items
# With for_each, removing "staging" only destroys the staging instance
# count is fine for identical resources; for_each for distinct resources
# Dynamic blocks: create nested config blocks programmatically
resource "aws_security_group" "this" {
name = "app-sg"
vpc_id = var.vpc_id
dynamic "ingress" {
for_each = var.allowed_ports
content {
from_port = ingress.value
to_port = ingress.value
protocol = "tcp"
cidr_blocks = var.allowed_cidrs
}
}
}
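One wrinkle worth knowing: for_each accepts only maps and sets of strings, so a plain list must pass through toset() first. A small sketch with illustrative names:

```hcl
variable "bucket_names" {
  type    = list(string)
  default = ["logs", "artifacts", "backups"]
}

resource "aws_s3_bucket" "this" {
  for_each = toset(var.bucket_names)
  bucket   = "myapp-${each.key}" # for a set, each.key equals each.value
}
```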
Terraform Policy as Code with Sentinel / OPA
Before terraform apply touches production, validate that the plan meets organizational policies — no public S3 buckets, no unencrypted databases, required tags on all resources:
# Sentinel policy (HCP Terraform)
# Prevents any S3 bucket from being publicly accessible
import "tfplan/v2" as tfplan

# Filter to just the S3 buckets first; quantifying over all resource
# changes directly would fail as soon as any non-bucket resource appears
s3_buckets = filter tfplan.resource_changes as _, rc {
  rc.type is "aws_s3_bucket"
}

main = rule {
  all s3_buckets as _, bucket {
    bucket.change.after.acl in ["private", null]
  }
}
For open-source (no HCP Terraform), use conftest with OPA Rego:
# policies/required_tags.rego
package terraform

import future.keywords.in

required_tags := {"Environment", "Team", "CostCenter"}

deny[msg] {
  some resource in input.resource_changes
  some action in resource.change.actions
  action in {"create", "update"}
  resource_tags := {tag | resource.change.after.tags[tag]}
  missing := required_tags - resource_tags
  count(missing) > 0
  msg := sprintf("Resource %s is missing required tags: %v", [resource.address, missing])
}
# CI check (assumes an earlier step ran: terraform plan -out=tfplan.binary)
- name: Generate Terraform Plan JSON
  run: terraform show -json tfplan.binary > plan.json
- name: Policy Check
  run: conftest test plan.json --policy ./policies/
  # Fails the pipeline if any deny rules trigger
CI/CD Pipeline for Terraform
Infrastructure changes need the same review process as application code — but with extra care because mistakes can be irreversible. The standard CI/CD pipeline for Terraform:
# .github/workflows/terraform.yml
name: Terraform Plan / Apply
on:
pull_request:
paths: ['environments/**', 'modules/**']
push:
branches: [main]
paths: ['environments/**', 'modules/**']
jobs:
terraform-plan:
runs-on: ubuntu-latest
strategy:
matrix:
environment: [dev, staging, prod]
permissions:
id-token: write # For OIDC authentication to AWS (no static credentials)
pull-requests: write
steps:
- uses: actions/checkout@v4
- uses: aws-actions/configure-aws-credentials@v4
with:
role-to-assume: arn:aws:iam::123456789:role/terraform-ci-${{ matrix.environment }}
aws-region: us-east-1
- uses: hashicorp/setup-terraform@v3
with:
terraform_version: "~1.7"
- name: Terraform Init
run: terraform init
working-directory: environments/${{ matrix.environment }}
- name: Terraform Validate
run: terraform validate
working-directory: environments/${{ matrix.environment }}
- name: Terraform Plan
id: plan
run: terraform plan -out=tfplan -no-color 2>&1 | tee plan-output.txt
working-directory: environments/${{ matrix.environment }}
- name: Comment Plan on PR
uses: actions/github-script@v7
if: github.event_name == 'pull_request'
with:
script: |
const planOutput = require('fs').readFileSync('environments/${{ matrix.environment }}/plan-output.txt', 'utf8')
github.rest.issues.createComment({
issue_number: context.issue.number,
owner: context.repo.owner,
repo: context.repo.repo,
body: `## Terraform Plan: ${{ matrix.environment }}\n\`\`\`\n${planOutput.slice(0, 65000)}\n\`\`\``
})
terraform-apply:
needs: terraform-plan
if: github.ref == 'refs/heads/main' && github.event_name == 'push'
environment: prod # GitHub Environment with required reviewers
runs-on: ubuntu-latest
steps:
# ... same checkout, credentials, and init steps as the plan job.
# Note: each job runs on a fresh runner, so the saved tfplan must be
# passed between jobs as an artifact (or the plan re-run before apply).
- name: Terraform Apply
run: terraform apply -auto-approve tfplan
working-directory: environments/prod
Key practices:
- OIDC instead of static credentials: AWS IAM roles assumed via OIDC federation — no AWS keys stored in GitHub secrets
- Plan as PR comment: reviewers see exactly what will change before approving
- GitHub Environments with required reviewers: human approval before production apply
- Separate roles per environment: CI role for dev has fewer permissions than prod apply role
Production Considerations
Module Versioning
Once modules are shared across teams, pin versions to prevent unexpected changes:
# Pin to a specific tagged version in a private registry or Git tag
module "rds" {
source = "git::https://github.com/myorg/terraform-modules.git//rds?ref=v2.1.0"
# ... inputs
}
# Or from Terraform Registry
module "vpc" {
source = "terraform-aws-modules/vpc/aws"
version = "~> 5.0" # Accept 5.x but not 6.x
}
Module updates are PRs. Teams subscribe to module changelog. Breaking changes increment the major version.
Import Existing Resources
Migrating existing manually-provisioned infrastructure to Terraform requires importing the existing state without recreating resources. The import block (Terraform 1.5+) makes this declarative:
# Import an existing RDS instance into Terraform management
import {
to = module.payments_db.aws_db_instance.this
id = "payments-prod" # The RDS identifier
}
# Terraform generates the config to match the existing resource
# terraform plan -generate-config-out=generated.tf
# Review generated.tf, clean it up, then add to your config
Before import blocks, the workflow was running the terraform import CLI command and then hand-writing configuration to match the imported resource, which was error-prone. The declarative approach is safer: plan shows exactly what would change before anything is applied.
Organizing Large Configurations with moved Blocks
Renaming or moving resources without destroying and recreating them:
# When you rename a resource (e.g., refactoring module structure),
# use moved blocks to update state without destroying infrastructure
moved {
from = aws_db_instance.rds
to = module.payments_db.aws_db_instance.this
}
Without moved, Terraform destroys the old resource and creates a new one — catastrophic for databases. With moved, it updates the state reference only.
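A close relative of moved is the removed block (Terraform 1.7+), which drops a resource from state without destroying the real infrastructure, serving as a declarative replacement for terraform state rm:

```hcl
# Stop managing this instance with Terraform, but leave it running in AWS
removed {
  from = aws_db_instance.legacy

  lifecycle {
    destroy = false # forget the resource instead of destroying it
  }
}
```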
Conclusion
Terraform's advanced patterns solve the problems that flat configurations create as teams and infrastructure grow:
- Module architecture enables code reuse and consistent standards across teams
- Remote state with locking makes collaboration safe and auditable
- Drift detection in CI catches manual changes before they become incidents
- Policy as code prevents compliance violations from reaching production
- Module versioning gives consuming teams stability with an upgrade path
The investment in structure pays for itself the first time a new team member can provision a production database by calling a module with five lines of HCL — without understanding every networking and security detail behind it.
The pattern that prevents the most incidents: terraform plan in CI before every merge to the main branch, terraform apply only after the plan output is reviewed and approved. Infrastructure changes that skip review — "I'll just apply this small tweak manually" — are how configuration drift and outages start. The CI pipeline enforces the discipline when humans are in a hurry.
The Terraform ecosystem now includes a licensing split: OpenTofu, the Linux Foundation fork created after HashiCorp moved Terraform to the Business Source License (BUSL), and HashiCorp's Terraform itself. OpenTofu aims to be a drop-in compatible replacement and runs existing Terraform configurations with few or no changes. The choice between them comes down to licensing requirements and vendor support contracts rather than technical capability; for the patterns in this guide, they are essentially equivalent.
The universal advice: store state remotely from day one. Version-pin providers. Build modules before you have four copies of the same resource block. Drift detection in CI before drift becomes an incident. These habits are cheap to establish early and expensive to retrofit.
One pattern that accelerates module adoption: provide working examples alongside each module. An examples/basic/ directory with a minimal working instantiation of the module reduces the time to a first successful terraform apply from hours to minutes. Teams adopting a new module don't want to read variable documentation; they want to copy a working example and modify it. Modules with examples get adopted; modules without examples get copy-pasted around instead.
Sources
- HashiCorp Terraform documentation: modules, backends, remote state
- Gruntwork "Terraform: Up & Running" (Brikman)
- Terragrunt documentation: gruntwork-io.github.io/terragrunt
- Terratest documentation: github.com/gruntwork-io/terratest
- Open Policy Agent: conftest for policy-as-code
- Terraform Sentinel documentation (HCP Terraform)
- AWS Security blog: Terraform security best practices
- OpenTofu: opentofu.org