Cloud Cost Optimization in 2026: Right-Sizing, Spot Instances, FinOps, and the Patterns That Cut AWS Bills in Half

Introduction
Cloud spending crossed $780 billion globally in 2025 and is projected to exceed $1 trillion by 2028. Every engineering organization running workloads on AWS, GCP, or Azure contributed to that number — and according to Flexera's 2025 State of the Cloud report, 32% of that spend is pure waste. Not inefficiency, not over-provisioning for safety margin. Waste: idle resources, forgotten environments, snapshots nobody will ever restore, and elastic IPs pointing at nothing.
The 32% figure has held steady for four years now. Teams optimize, then sprawl happens again. New services spin up in dev and never get torn down. A data pipeline scales out during a traffic spike and the Auto Scaling Group never scales back in. A contractor leaves and their development environment keeps running at $400/month. This is the cloud cost problem in 2026: not a one-time audit issue, but a continuous operational discipline problem.
There are three distinct categories of cloud cost optimization, and they require different techniques. Waste elimination is the fastest return — you are paying for things you are not using at all. It requires a scanner, a reporting loop, and an owner. Right-sizing is the second layer — you are running the wrong instance type or capacity for your actual workload. It requires CloudWatch data, Compute Optimizer recommendations, and the willingness to actually act on them. Architectural optimization is the deepest layer — you have built something that costs too much by design, and fixing it requires changing infrastructure patterns: VPC endpoints, storage tiering, Lambda vs EC2 tradeoffs, data transfer topology.
Most teams skip directly to Savings Plans and Reserved Instances and miss the other three levers entirely. If your environment is 32% waste, a 3-year Reserved Instance commitment is just locking in a discount on resources you did not need. The right order is: eliminate waste first, then right-size, then commit to Savings Plans on the baseline that remains, then optimize architecture to reduce the unit cost further.
This post walks through all four stages, with production-ready code for each. The numbers throughout are from real accounts: actual savings percentages, actual instance comparison figures, actual data transfer costs. If you follow the pattern end to end, a $50,000/month AWS bill commonly drops to $25,000–$30,000 within 90 days — without sacrificing reliability.
1. Waste Elimination: The Low-Hanging Fruit
Waste is deceptively easy to accumulate and surprisingly hard to see without tooling. The AWS console is not designed to surface idle resources — it is designed to surface usage. You have to go looking for waste deliberately.
The most common waste categories, ranked by typical dollar impact:
Unattached EBS volumes are the single biggest waste item in most accounts. When an EC2 instance is terminated, its attached EBS volumes persist unless DeleteOnTermination was set to true at launch — which is not the default for volumes added after initial instance creation. A production account that has been running for two years will routinely have dozens to hundreds of unattached volumes, often including expensive io2 or gp3 provisioned-IOPS volumes. Monthly cost: $0.08/GB-month for gp3, $0.10 for gp2, $0.125 for io2 (plus any provisioned-IOPS charges). A 500GB io2 volume sitting unused costs $62.50/month in storage alone. Multiply by 50 forgotten volumes and you have $3,125/month in pure waste.
Stopped EC2 instances are paying for EBS even when the compute is stopped. A stopped m5.4xlarge with a 200GB root volume and 1TB data volume is costing approximately $100/month in storage while delivering zero value. The usual culprit: a developer stopped the instance "temporarily" six months ago and then left the company. The instance sits in a stopped state indefinitely.
Idle Elastic IPs cost $0.005/hour when not associated with a running instance — $3.60/month each. This sounds trivial but production accounts regularly accumulate 50-100 unassociated EIPs from stack teardowns that missed the cleanup step.
Unused Load Balancers are charged at $0.008/LCU-hour for ALBs plus a fixed hourly fee regardless of traffic. An ALB with no targets or zero traffic costs $16-22/month depending on region.
Old snapshots are charged at $0.05/GB/month and accumulate silently. A team running daily snapshots of a 2TB RDS instance for a year generates 365 snapshots; if only 30-day retention was needed, the 335 excess snapshots cost up to ~$100/month each — an upper bound of $33,500/month. Snapshots are billed incrementally on changed blocks, so the real figure is lower, but over-retention routinely adds thousands of dollars per month of unnecessary spend.
Forgotten development environments are the most human problem. A CloudFormation stack spun up for a proof of concept, a Kubernetes cluster created for a demo, an RDS instance for a feature that got cancelled — these run for months because nobody remembers to check and there is no automated cleanup.
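To make the snapshot category concrete, here is a worst-case cost estimate for over-retained snapshots — an upper bound, since snapshots are billed on changed blocks and actual spend is lower:

```python
# Rough upper-bound monthly cost of keeping daily EBS snapshots beyond the
# needed retention window. Billed at ~$0.05/GB-month; treat the result as a
# worst case because snapshots are incremental.
def excess_snapshot_cost(volume_gb, snapshots_kept, snapshots_needed,
                         price_per_gb_month=0.05):
    excess = max(snapshots_kept - snapshots_needed, 0)
    return excess * volume_gb * price_per_gb_month

# 365 daily snapshots of a ~2000GB volume when only 30 days were needed:
print(f"${excess_snapshot_cost(2000, 365, 30):,.0f}/month worst case")
# → $33,500/month worst case
```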
The tooling layer starts with AWS Trusted Advisor (available on Business/Enterprise support tiers) and AWS Cost Explorer, but both require manual review. AWS Compute Optimizer covers EC2, Auto Scaling groups, EBS, Lambda, and ECS on Fargate, and automatically surfaces right-sizing opportunities. For automated multi-region waste scanning, you need to write it yourself or use Cloud Custodian (open source) or Infracost in CI.
Here is a production boto3 scanner that identifies all major waste categories across every region in your account and produces a structured report:
#!/usr/bin/env python3
"""
idle-resource-scanner.py

Scans all AWS regions for idle/orphaned resources and produces a
cost-annotated JSON report. Run with: python3 idle-resource-scanner.py

Prerequisites:
    pip install boto3 botocore
    AWS credentials configured with ReadOnlyAccess minimum

Outputs:
    idle_resources_YYYY-MM-DD.json — structured report
    idle_resources_YYYY-MM-DD.txt — human-readable summary with estimated monthly cost
"""
import boto3
import json
import datetime
from botocore.exceptions import ClientError
from collections import defaultdict

# Approximate monthly cost estimates per resource type (us-east-1 pricing)
# Adjust for your region if needed
COST_ESTIMATES = {
    "ebs_unattached_gp3": 0.08,        # per GB/month
    "ebs_unattached_io2": 0.125,       # per GB/month
    "ebs_unattached_gp2": 0.10,        # per GB/month
    "elastic_ip_idle": 3.60,           # flat per month
    "alb_idle": 18.00,                 # flat per month (approximate)
    "nlb_idle": 16.00,                 # flat per month (approximate)
    "stopped_instance_ebs_gp3": 0.08,  # per GB/month of attached storage
}


def get_all_regions(session):
    """Return all enabled EC2 regions for this account."""
    ec2 = session.client("ec2", region_name="us-east-1")
    response = ec2.describe_regions(
        Filters=[{"Name": "opt-in-status",
                  "Values": ["opt-in-not-required", "opted-in"]}]
    )
    return [r["RegionName"] for r in response["Regions"]]


def scan_unattached_ebs(ec2_client, region):
    """Find EBS volumes in 'available' state (not attached to any instance)."""
    findings = []
    paginator = ec2_client.get_paginator("describe_volumes")
    for page in paginator.paginate(Filters=[{"Name": "status", "Values": ["available"]}]):
        for vol in page["Volumes"]:
            vol_type = vol["VolumeType"]
            size_gb = vol["Size"]
            # Calculate monthly cost estimate
            cost_key = f"ebs_unattached_{vol_type}"
            price_per_gb = COST_ESTIMATES.get(cost_key, 0.10)
            monthly_cost = size_gb * price_per_gb
            # Calculate age — older volumes are higher priority for cleanup
            created = vol["CreateTime"]
            age_days = (datetime.datetime.now(datetime.timezone.utc) - created).days
            findings.append({
                "resource_type": "ebs_volume",
                "resource_id": vol["VolumeId"],
                "region": region,
                "size_gb": size_gb,
                "volume_type": vol_type,
                "age_days": age_days,
                "monthly_cost_usd": round(monthly_cost, 2),
                "tags": {t["Key"]: t["Value"] for t in vol.get("Tags", [])},
                "recommendation": "DELETE if not needed — no instance attached",
            })
    return findings


def scan_idle_elastic_ips(ec2_client, region):
    """Find Elastic IPs not associated with any running instance or network interface."""
    findings = []
    response = ec2_client.describe_addresses()
    for addr in response["Addresses"]:
        # An EIP is idle if it has no AssociationId (not attached to anything)
        if "AssociationId" not in addr:
            findings.append({
                "resource_type": "elastic_ip",
                "resource_id": addr["AllocationId"],
                "public_ip": addr["PublicIp"],
                "region": region,
                "monthly_cost_usd": COST_ESTIMATES["elastic_ip_idle"],
                "tags": {t["Key"]: t["Value"] for t in addr.get("Tags", [])},
                "recommendation": "RELEASE if not needed — idle EIP accruing charge",
            })
    return findings


def scan_stopped_instances(ec2_client, region):
    """Find stopped EC2 instances still paying for attached EBS storage."""
    findings = []
    paginator = ec2_client.get_paginator("describe_instances")
    for page in paginator.paginate(Filters=[{"Name": "instance-state-name", "Values": ["stopped"]}]):
        for reservation in page["Reservations"]:
            for instance in reservation["Instances"]:
                instance_id = instance["InstanceId"]
                instance_type = instance["InstanceType"]
                # Sum storage cost across all attached volumes
                total_storage_cost = 0
                total_gb = 0
                for mapping in instance.get("BlockDeviceMappings", []):
                    # Look up the volume to get its size
                    try:
                        vol_response = ec2_client.describe_volumes(
                            VolumeIds=[mapping["Ebs"]["VolumeId"]]
                        )
                        for vol in vol_response["Volumes"]:
                            gb = vol["Size"]
                            vol_type = vol["VolumeType"]
                            price = COST_ESTIMATES.get(f"stopped_instance_ebs_{vol_type}", 0.10)
                            total_storage_cost += gb * price
                            total_gb += gb
                    except ClientError:
                        pass  # Volume may have been deleted; skip
                # Get stop time from state transition reason if available
                state_reason = instance.get("StateTransitionReason", "Unknown")
                name_tag = next(
                    (t["Value"] for t in instance.get("Tags", []) if t["Key"] == "Name"),
                    "unnamed"
                )
                findings.append({
                    "resource_type": "stopped_ec2",
                    "resource_id": instance_id,
                    "instance_type": instance_type,
                    "name": name_tag,
                    "region": region,
                    "total_storage_gb": total_gb,
                    "monthly_storage_cost_usd": round(total_storage_cost, 2),
                    "state_reason": state_reason,
                    "tags": {t["Key"]: t["Value"] for t in instance.get("Tags", [])},
                    "recommendation": "TERMINATE if not needed — paying for EBS while stopped",
                })
    return findings


def scan_idle_load_balancers(elbv2_client, region):
    """Find ALBs and NLBs with no registered targets."""
    findings = []
    paginator = elbv2_client.get_paginator("describe_load_balancers")
    for page in paginator.paginate():
        for lb in page["LoadBalancers"]:
            lb_arn = lb["LoadBalancerArn"]
            lb_type = lb["Type"]  # application | network | gateway
            # Get all target groups for this LB
            tg_paginator = elbv2_client.get_paginator("describe_target_groups")
            total_healthy_targets = 0
            for tg_page in tg_paginator.paginate(LoadBalancerArn=lb_arn):
                for tg in tg_page["TargetGroups"]:
                    try:
                        health = elbv2_client.describe_target_health(
                            TargetGroupArn=tg["TargetGroupArn"]
                        )
                        # Count only healthy targets — unhealthy still means something is registered
                        healthy = [t for t in health["TargetHealthDescriptions"]
                                   if t["TargetHealth"]["State"] == "healthy"]
                        total_healthy_targets += len(healthy)
                    except ClientError:
                        pass
            if total_healthy_targets == 0:
                cost_key = "alb_idle" if lb_type == "application" else "nlb_idle"
                findings.append({
                    "resource_type": f"{lb_type}_load_balancer",
                    "resource_id": lb_arn,
                    "dns_name": lb["DNSName"],
                    "region": region,
                    "healthy_target_count": 0,
                    "monthly_cost_usd": COST_ESTIMATES.get(cost_key, 16.00),
                    "recommendation": "DELETE if not needed — no healthy targets registered",
                })
    return findings


def scan_old_snapshots(ec2_client, region, max_age_days=90):
    """Find EBS snapshots older than max_age_days owned by this account."""
    findings = []
    paginator = ec2_client.get_paginator("describe_snapshots")
    sts = boto3.client("sts")
    account_id = sts.get_caller_identity()["Account"]
    for page in paginator.paginate(OwnerIds=[account_id]):
        for snap in page["Snapshots"]:
            created = snap["StartTime"]
            age_days = (datetime.datetime.now(datetime.timezone.utc) - created).days
            if age_days > max_age_days:
                # Snapshots are billed on changed-block basis; use volume size as upper bound
                size_gb = snap["VolumeSize"]
                monthly_cost = size_gb * 0.05  # $0.05/GB/month
                findings.append({
                    "resource_type": "ebs_snapshot",
                    "resource_id": snap["SnapshotId"],
                    "region": region,
                    "size_gb": size_gb,
                    "age_days": age_days,
                    "description": snap.get("Description", ""),
                    "monthly_cost_usd_upper_bound": round(monthly_cost, 2),
                    "recommendation": f"REVIEW — snapshot is {age_days} days old, consider retention policy",
                })
    return findings


def main():
    session = boto3.Session()
    regions = get_all_regions(session)
    all_findings = defaultdict(list)
    total_monthly_waste = 0.0
    print(f"Scanning {len(regions)} regions for idle resources...\n")
    for region in regions:
        print(f"  Scanning {region}...")
        ec2 = session.client("ec2", region_name=region)
        elbv2 = session.client("elbv2", region_name=region)
        region_findings = []
        region_findings.extend(scan_unattached_ebs(ec2, region))
        region_findings.extend(scan_idle_elastic_ips(ec2, region))
        region_findings.extend(scan_stopped_instances(ec2, region))
        region_findings.extend(scan_idle_load_balancers(elbv2, region))
        region_findings.extend(scan_old_snapshots(ec2, region, max_age_days=90))
        all_findings[region] = region_findings
        region_waste = sum(
            f.get("monthly_cost_usd", f.get("monthly_cost_usd_upper_bound", 0))
            for f in region_findings
        )
        total_monthly_waste += region_waste
        if region_findings:
            print(f"    Found {len(region_findings)} idle resources (~${region_waste:.2f}/month)")

    # Write JSON report
    today = datetime.date.today().isoformat()
    report = {
        "scan_date": today,
        "total_monthly_waste_usd": round(total_monthly_waste, 2),
        "total_annual_waste_usd": round(total_monthly_waste * 12, 2),
        "findings_by_region": dict(all_findings),
    }
    json_path = f"idle_resources_{today}.json"
    with open(json_path, "w") as f:
        json.dump(report, f, indent=2, default=str)

    # Write human-readable summary
    txt_path = f"idle_resources_{today}.txt"
    with open(txt_path, "w") as f:
        f.write(f"AWS Idle Resource Report — {today}\n")
        f.write("=" * 60 + "\n\n")
        f.write(f"Total estimated monthly waste: ${total_monthly_waste:,.2f}\n")
        f.write(f"Total estimated annual waste: ${total_monthly_waste * 12:,.2f}\n\n")
        for region, findings in all_findings.items():
            if not findings:
                continue
            f.write(f"\n--- {region} ---\n")
            for item in findings:
                cost = item.get("monthly_cost_usd", item.get("monthly_cost_usd_upper_bound", 0))
                f.write(f"  [{item['resource_type']}] {item['resource_id']} ${cost:.2f}/mo\n")
                f.write(f"    → {item['recommendation']}\n")

    print(f"\nTotal estimated monthly waste: ${total_monthly_waste:,.2f}")
    print(f"Total estimated annual waste: ${total_monthly_waste * 12:,.2f}")
    print(f"\nReports written to {json_path} and {txt_path}")


if __name__ == "__main__":
    main()
Running this script on a mature AWS account for the first time typically surfaces 10-15% of the monthly bill in remediable waste within the first pass. Set it up as a weekly Lambda execution and pipe the report into Slack or email to create a continuous cleanup loop. Cloud Custodian can go further: it supports auto-remediation policies — tag unattached EBS volumes with a 14-day TTL and delete automatically after the grace period if no owner claims them.
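If you are not ready to adopt Cloud Custodian, the same TTL grace-period pattern can be sketched in a few lines. The `cleanup-after` tag name and 14-day window below are illustrative choices, not an AWS or Cloud Custodian convention; the actual tagging and deletion would call `ec2.create_tags` and `ec2.delete_volume` around this decision logic:

```python
import datetime

GRACE_DAYS = 14
TTL_TAG = "cleanup-after"  # illustrative tag name — pick your own convention

def mark_or_reap(volume_tags, today):
    """Decide what to do with an unattached volume on this scan pass.
    Returns 'tag' (start the clock), 'wait', or 'delete'."""
    if TTL_TAG not in volume_tags:
        return "tag"  # first sighting: tag with today + GRACE_DAYS as the deadline
    deadline = datetime.date.fromisoformat(volume_tags[TTL_TAG])
    return "delete" if today >= deadline else "wait"

today = datetime.date(2026, 3, 1)
print(mark_or_reap({}, today))                               # → tag
print(mark_or_reap({"cleanup-after": "2026-03-15"}, today))  # → wait
print(mark_or_reap({"cleanup-after": "2026-02-20"}, today))  # → delete
```

An owner claiming the volume simply removes or extends the tag, which resets the decision to "wait" on the next pass.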
(Diagram: continuous cleanup loop. A weekly Lambda scan classifies resources — unattached EBS volumes and idle EIPs, stopped EC2 instances still paying for EBS storage, load balancers with no targets, snapshots older than 90 days — then cost-annotates each finding. Findings with an owner tag notify the owner via Slack/email; untagged findings go to the cloud platform team. Resources claimed within 14 days are tagged as active; anything with no response is auto-remediated via Cloud Custodian and rolled into a cost savings report.)
Typical savings from waste elimination alone: 10-15% of total monthly bill, realized within the first 30 days.

2. Right-Sizing EC2 and RDS
After eliminating waste, the next layer is running the right-sized resource for your actual workload. This is the area where most teams leave the most money on the table: they provision for peak throughput, measure average utilization, and conclude the resource is fine. The 80th-percentile rule fixes this — size your instances for your 80th-percentile load, not your average. P99 spikes are handled by Auto Scaling, not by over-provisioning the base.
AWS Compute Optimizer uses 14 days of CloudWatch metrics (extensible to 93 days on the enhanced tier) to generate right-sizing recommendations across EC2, ECS, Lambda, and EBS. It accounts for memory utilization on EC2 if you have the CloudWatch agent installed, which is critical — CPU-based recommendations alone miss memory-bound workloads entirely. Enable Compute Optimizer at the Organization level to get cross-account recommendations.
The instance family matching matters as much as the size:
- C-family (Compute Optimized): c6i, c6g, c7g — for CPU-bound workloads: web servers, API gateways, encoding, scientific computing
- R-family (Memory Optimized): r6i, r6g, r7g — for memory-bound workloads: in-memory caches, analytics, large JVM heaps, databases loaded into memory
- M-family (General Purpose): m6i, m6g, m7g — balanced; good default for mixed workloads
- I-family (I/O Optimized): i4i, i3en — for high-throughput sequential reads/writes: Elasticsearch, Kafka brokers, Cassandra
- G/P-family (GPU): g5, p4d — ML inference/training; replace with SageMaker Inference for smaller workloads
Graviton (ARM) by default. Use Graviton instances (c7g, m7g, r7g on Graviton3; t4g on Graviton2) unless you have a specific reason not to — a dependency on x86-only binaries, or a workload that has not been tested on ARM. The performance-per-dollar improvement is 20-40% versus equivalent x86 instances. Java, Go, Python, Node.js, and Ruby all run on ARM without code changes. Most Docker images support linux/arm64. The migration barrier is lower than most teams expect.
For RDS, right-sizing requires distinguishing the bottleneck:
- High CPUUtilization → underpowered CPU, consider scale-up or query optimization first
- Low FreeableMemory (the instance is consuming nearly all available memory) → memory-bound, move to r-family
- High ReadIOPS/WriteIOPS against provisioned IOPS limit → I/O bound, increase IOPS or move to Aurora
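That triage is simple enough to encode. Here is a sketch with illustrative thresholds (not AWS guidance — tune them against your own workloads), taking the values you would pull from the CloudWatch metrics above:

```python
def classify_rds_bottleneck(cpu_pct, freeable_mem_bytes, total_mem_bytes,
                            iops_used, iops_provisioned):
    """Rough RDS bottleneck triage. Thresholds (70% CPU, 10% free memory,
    80% of provisioned IOPS) are illustrative starting points."""
    if cpu_pct > 70:
        return "cpu-bound: scale up or optimize queries first"
    if freeable_mem_bytes / total_mem_bytes < 0.10:
        return "memory-bound: move to r-family"
    if iops_provisioned and iops_used / iops_provisioned > 0.80:
        return "io-bound: raise provisioned IOPS or consider Aurora"
    return "no clear bottleneck: candidate for downsizing"

# A db.m5.large (8 GiB RAM) running at 85% CPU with 3 GiB freeable:
print(classify_rds_bottleneck(85, 3 * 2**30, 8 * 2**30, 1000, 3000))
# → cpu-bound: scale up or optimize queries first
```

Feed it p80 values rather than averages, for the same reason as with EC2: sizing against the average hides sustained load.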
Here is a CloudWatch-based right-sizing reporter that generates a CSV of current vs. recommended instance sizes for all EC2 instances in a region:
#!/usr/bin/env python3
"""
rightsizing-reporter.py

Pulls 14-day CloudWatch metrics for all running EC2 instances in a region
and generates a right-sizing report with cost-saving recommendations.

Usage:
    python3 rightsizing-reporter.py --region us-east-1

Prerequisites:
    pip install boto3 botocore
    CloudWatch agent installed on instances for memory metrics (optional but recommended)
"""
import argparse
import boto3
import csv
import datetime

# Instance family to workload type mapping
# Used to suggest alternative families if the workload profile doesn't match
FAMILY_WORKLOAD_MAP = {
    "t": "burstable",
    "m": "general",
    "c": "compute",
    "r": "memory",
    "i": "storage",
    "g": "gpu",
    "p": "gpu",
}

# Approximate on-demand hourly costs for common instance types (us-east-1)
# Source: AWS pricing API — update periodically or call pricing API dynamically
INSTANCE_COSTS = {
    "t3.micro": 0.0104, "t3.small": 0.0208, "t3.medium": 0.0416,
    "t3.large": 0.0832, "t3.xlarge": 0.1664, "t3.2xlarge": 0.3328,
    "t4g.micro": 0.0084, "t4g.small": 0.0168, "t4g.medium": 0.0336,
    "t4g.large": 0.0672, "t4g.xlarge": 0.1344, "t4g.2xlarge": 0.2688,
    "m5.large": 0.096, "m5.xlarge": 0.192, "m5.2xlarge": 0.384,
    "m6g.large": 0.077, "m6g.xlarge": 0.154, "m6g.2xlarge": 0.308,
    "c5.large": 0.085, "c5.xlarge": 0.170, "c5.2xlarge": 0.340,
    "c6g.large": 0.068, "c6g.xlarge": 0.136, "c6g.2xlarge": 0.272,
    "r5.large": 0.126, "r5.xlarge": 0.252, "r5.2xlarge": 0.504,
    "r6g.large": 0.1008, "r6g.xlarge": 0.2016, "r6g.2xlarge": 0.4032,
}

# get_metric_statistics distinguishes standard statistics from extended
# (percentile) statistics — the two are mutually exclusive in one request
STANDARD_STATS = ("SampleCount", "Average", "Sum", "Minimum", "Maximum")


def get_cloudwatch_stat(cw_client, instance_id, metric_name, namespace,
                        stat, days=14, dimensions=None):
    """
    Fetch the mean of a CloudWatch statistic (standard, or extended such as
    "p80") over the last N days. Returns None if no data is available.
    """
    if dimensions is None:
        dimensions = [{"Name": "InstanceId", "Value": instance_id}]
    end_time = datetime.datetime.now(datetime.timezone.utc)
    start_time = end_time - datetime.timedelta(days=days)
    kwargs = {
        "Namespace": namespace,
        "MetricName": metric_name,
        "Dimensions": dimensions,
        "StartTime": start_time,
        "EndTime": end_time,
        "Period": 3600,  # hourly granularity
    }
    if stat in STANDARD_STATS:
        kwargs["Statistics"] = [stat]
    else:
        kwargs["ExtendedStatistics"] = [stat]
    response = cw_client.get_metric_statistics(**kwargs)
    datapoints = response.get("Datapoints", [])
    if not datapoints:
        return None
    if stat in STANDARD_STATS:
        values = [d[stat] for d in datapoints]
    else:
        # Extended statistic (percentile such as p80)
        values = [d["ExtendedStatistics"].get(stat, 0) for d in datapoints]
    return sum(values) / len(values) if values else None


def analyze_instance(ec2_client, cw_client, instance):
    """
    Analyze a single EC2 instance and return a right-sizing assessment.
    """
    instance_id = instance["InstanceId"]
    instance_type = instance["InstanceType"]
    name = next(
        (t["Value"] for t in instance.get("Tags", []) if t["Key"] == "Name"),
        "unnamed"
    )
    # Fetch CPU metrics — p80 is the key percentile for right-sizing
    cpu_p80 = get_cloudwatch_stat(
        cw_client, instance_id, "CPUUtilization", "AWS/EC2", "p80"
    )
    cpu_max = get_cloudwatch_stat(
        cw_client, instance_id, "CPUUtilization", "AWS/EC2", "Maximum"
    )
    # NetworkIn/NetworkOut can be fetched the same way if you want them in the report
    # Memory from CloudWatch agent (CWAgent namespace) — only available if agent installed
    mem_used_pct = get_cloudwatch_stat(
        cw_client, instance_id, "mem_used_percent", "CWAgent", "Average"
    )
    mem_str = f"{mem_used_pct:.1f}%" if mem_used_pct is not None else "unknown"

    # Determine workload profile from metrics
    recommendation = "no-change"
    reason = "insufficient data"
    suggested_type = instance_type
    if cpu_p80 is not None:
        current_family = instance_type.split(".")[0]  # e.g., "m5.xlarge" → "m5"
        if cpu_p80 < 10 and (mem_used_pct is None or mem_used_pct < 30):
            # Significantly under-utilized: recommend downsizing
            recommendation = "downsize"
            reason = f"CPU p80={cpu_p80:.1f}%, memory={mem_str} — significantly under-utilized"
            # Placeholder — pick one size down in the same family manually;
            # no savings estimate is produced for this placeholder type
            suggested_type = f"{current_family}-downsize-to-smaller"
        elif cpu_p80 > 70:
            recommendation = "review-scale-out"
            reason = f"CPU p80={cpu_p80:.1f}% — consistently high, consider scale-out or larger instance"
        # Graviton migration opportunity
        if not current_family.endswith("g") and not current_family.startswith(("g", "p")):
            # Not already on Graviton and not a GPU instance
            graviton_map = {
                "t3": "t4g", "m5": "m6g", "m6i": "m7g",
                "c5": "c6g", "c6i": "c7g", "r5": "r6g", "r6i": "r7g"
            }
            graviton_family = graviton_map.get(current_family)
            if graviton_family:
                size = instance_type.split(".")[1]
                suggested_type = f"{graviton_family}.{size}"
                recommendation = "migrate-to-graviton"
                reason = "Same size on Graviton is ~20-40% cheaper with equivalent performance"

    # Calculate savings estimate
    current_hourly = INSTANCE_COSTS.get(instance_type, 0)
    suggested_hourly = INSTANCE_COSTS.get(suggested_type, current_hourly)
    monthly_savings = (current_hourly - suggested_hourly) * 730  # 730 hours/month
    return {
        "instance_id": instance_id,
        "name": name,
        "current_type": instance_type,
        "suggested_type": suggested_type,
        "recommendation": recommendation,
        "reason": reason,
        "cpu_p80_pct": round(cpu_p80, 1) if cpu_p80 is not None else "N/A",
        "cpu_max_pct": round(cpu_max, 1) if cpu_max is not None else "N/A",
        "mem_avg_pct": round(mem_used_pct, 1) if mem_used_pct is not None else "N/A (agent not installed)",
        "current_hourly_usd": current_hourly,
        "suggested_hourly_usd": suggested_hourly,
        "estimated_monthly_savings_usd": round(monthly_savings, 2),
    }


def main():
    parser = argparse.ArgumentParser(description="EC2 right-sizing reporter")
    parser.add_argument("--region", required=True, help="AWS region to scan")
    args = parser.parse_args()
    session = boto3.Session()
    ec2 = session.client("ec2", region_name=args.region)
    cw = session.client("cloudwatch", region_name=args.region)

    # Get all running instances
    paginator = ec2.get_paginator("describe_instances")
    instances = []
    for page in paginator.paginate(
        Filters=[{"Name": "instance-state-name", "Values": ["running"]}]
    ):
        for reservation in page["Reservations"]:
            instances.extend(reservation["Instances"])
    print(f"Analyzing {len(instances)} running instances in {args.region}...")

    results = []
    for i, instance in enumerate(instances):
        print(f"  [{i+1}/{len(instances)}] {instance['InstanceId']} ({instance['InstanceType']})")
        result = analyze_instance(ec2, cw, instance)
        results.append(result)

    # Sort by potential savings descending
    results.sort(key=lambda x: x["estimated_monthly_savings_usd"], reverse=True)
    total_savings = sum(r["estimated_monthly_savings_usd"] for r in results)

    # Write CSV report
    today = datetime.date.today().isoformat()
    filename = f"rightsizing_report_{args.region}_{today}.csv"
    fieldnames = list(results[0].keys()) if results else []
    with open(filename, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(results)
    print(f"\nTotal estimated monthly savings if all recommendations applied: ${total_savings:,.2f}")
    print(f"Report written to {filename}")

    # Print top 10 opportunities
    print("\nTop 10 right-sizing opportunities:")
    for r in results[:10]:
        print(f"  {r['instance_id']} ({r['current_type']} → {r['suggested_type']}): "
              f"${r['estimated_monthly_savings_usd']:.2f}/mo — {r['reason']}")


if __name__ == "__main__":
    main()
Real-world case study: migrating a fleet of t3.xlarge general-purpose application servers to t4g.medium (Graviton2) after verifying that CPU utilization averaged 12% and memory stayed under 2GB. The t3.xlarge costs $0.1664/hour ($121/month). The t4g.medium costs $0.0336/hour ($24.50/month). Same application performance under load testing. Savings: 80% reduction on that fleet. Even the more conservative migration from m5.xlarge to m6g.xlarge (same size and family, Graviton2 generation) saves approximately 20% with zero code or configuration changes.
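The case-study arithmetic is easy to verify from the hourly rates (us-east-1 On-Demand figures from the table above; check current pricing before acting):

```python
HOURS_PER_MONTH = 730  # AWS's standard monthly-hours approximation

def monthly_cost(hourly):
    return hourly * HOURS_PER_MONTH

def savings_pct(old_hourly, new_hourly):
    return (1 - new_hourly / old_hourly) * 100

# t3.xlarge -> t4g.medium (aggressive downsize + Graviton)
print(f"{monthly_cost(0.1664):.0f} -> {monthly_cost(0.0336):.0f} USD/month, "
      f"{savings_pct(0.1664, 0.0336):.0f}% saved")
# → 121 -> 25 USD/month, 80% saved

# m5.xlarge -> m6g.xlarge (same size, Graviton only)
print(f"{savings_pct(0.192, 0.154):.0f}% saved")
# → 20% saved
```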
(Diagram: right-sizing decision tree for an instance needing review. If CPU p80 > 70%: when memory p80 is also > 70%, scale out with more instances of the same family; otherwise move to the compute-optimized C-family (c7g, c6g, c6i). If CPU p80 < 15%: when memory p80 > 60%, downsize CPU but keep memory by moving to the R-family (r7g, r6g, ARM); otherwise downsize both to a smaller instance in a general-purpose family (t4g, m7g, ARM). Anything in between is sized appropriately — if it is still on x86, migrate to Graviton (t4g/m7g/c7g/r7g, 20-40% savings); if already on ARM, it is optimized, so just monitor quarterly.)
3. Spot and Preemptible Instances
Spot instances are the single highest-leverage cost reduction lever available in AWS: up to 90% off On-Demand pricing, for the same underlying hardware. The tradeoff is interruption: AWS can reclaim a Spot instance with a two-minute warning when it needs the capacity elsewhere. Teams that have not built interruption-aware workloads treat Spot as too risky. Teams that have built for interruption — and the engineering investment is smaller than most expect — treat On-Demand as a waste of money for any non-production or stateless workload.
The interruption model is more predictable than it sounds. AWS publishes interruption frequency data per instance type and availability zone. Many instance types in us-east-1 have interruption rates below 5% in any given month. Diversifying across multiple instance types (e.g., requesting m5.xlarge, m5a.xlarge, m5n.xlarge, m6i.xlarge in the same launch template) and multiple AZs dramatically reduces the probability that all your capacity gets interrupted simultaneously. EC2 Spot Fleet and EC2 Auto Scaling Groups with mixed capacity pools handle this automatically.
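The diversification effect can be quantified under a simplifying assumption that capacity pools are interrupted independently (in practice pools are somewhat correlated, so treat this as a best case):

```python
def p_all_pools_interrupted(per_pool_monthly_rate, num_pools):
    """Probability that every capacity pool is interrupted in a month,
    assuming independent pools — optimistic, but directionally right."""
    return per_pool_monthly_rate ** num_pools

# One pool at a 5% monthly interruption rate vs 4 diversified pools:
print(p_all_pools_interrupted(0.05, 1))  # 0.05 — 1 in 20
print(p_all_pools_interrupted(0.05, 4))  # ≈ 6.25e-06 — roughly 1 in 160,000
```

Even heavily discounted for correlation, spreading across four instance types and three AZs turns "all capacity lost at once" from a monthly occurrence into a rare event.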
Workloads suited for Spot:
- Batch data processing pipelines (Spark, Flink, AWS Batch)
- CI/CD runners (GitHub Actions self-hosted, GitLab runners)
- Stateless web tier behind a load balancer (any instance is replaceable)
- Machine learning training jobs (most frameworks support checkpointing)
- Video transcoding and media processing
- Web scraping and parallel crawlers
- Load testing infrastructure
Workloads NOT suited for Spot without additional engineering:
- Databases with local state (without replication/failover)
- Single-instance applications with no redundancy
- Jobs that cannot be interrupted mid-execution without checkpointing
The recommended architecture is a mixed On-Demand/Spot Auto Scaling Group: base capacity of 20-40% On-Demand (sized for minimum viable traffic), with Spot filling burst capacity. During Spot interruption, the ASG automatically replaces from the next cheapest capacity pool. The On-Demand floor means the service stays up even during a regional Spot capacity crunch.
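The base/percentage split works out as follows — a sketch assuming the ASG rounds fractional instance counts up in favor of On-Demand (verify against your ASG's actual behavior):

```python
import math

def capacity_split(desired, od_base, od_pct_above_base):
    """On-Demand vs Spot instance counts for a mixed-instances ASG,
    assuming fractional counts round up toward On-Demand."""
    above_base = max(desired - od_base, 0)
    od_above = math.ceil(above_base * od_pct_above_base / 100)
    on_demand = od_base + od_above
    return on_demand, desired - on_demand

# desired=10, base=2 On-Demand, 30% On-Demand above base:
print(capacity_split(10, 2, 30))  # (5, 5) — ceil(8 * 0.3) = 3 above the base
```

At low desired capacity the On-Demand floor dominates; the Spot share only reaches its nominal percentage as the group scales out.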
The interruption handler is the critical piece. When AWS sends the two-minute Spot interruption notice (via EC2 instance metadata at http://169.254.169.254/latest/meta-data/spot/termination-time), the instance needs to drain connections gracefully — deregister from the load balancer target group, finish in-flight requests, and requeue any incomplete work to SQS.
Here is the complete Terraform configuration for a mixed On-Demand/Spot ASG with an interruption handler Lambda:
# mixed-spot-asg.tf
#
# Creates a mixed On-Demand/Spot Auto Scaling Group with:
# - 30% On-Demand base, 70% Spot burst
# - Diversified across 4 instance types and 3 AZs
# - EventBridge rule to trigger Lambda on Spot interruption notices
# - Lambda that deregisters the instance from its target group and
# sends an SNS alert before the instance is reclaimed
terraform {
required_providers {
aws = {
source = "hashicorp/aws"
version = "~> 5.0"
}
}
}
variable "app_name" {
description = "Application name for resource tagging"
type = string
default = "my-app"
}
variable "vpc_id" {
description = "VPC ID for the ASG"
type = string
}
variable "private_subnet_ids" {
description = "List of private subnet IDs across AZs"
type = list(string)
}
variable "target_group_arns" {
description = "ALB target group ARNs to register instances with"
type = list(string)
}
variable "ami_id" {
description = "AMI ID for the launch template (use Amazon Linux 2023 or your custom AMI)"
type = string
}
variable "key_pair_name" {
description = "EC2 key pair name for SSH access"
type = string
}
# ─── Launch Template ────────────────────────────────────────────────────────
resource "aws_launch_template" "app" {
name_prefix = "${var.app_name}-"
image_id = var.ami_id
instance_type = "m6g.large" # Default; overridden by ASG mixed instance policy
key_name = var.key_pair_name
# Use IMDSv2 — required for Spot termination notice polling
metadata_options {
http_endpoint = "enabled"
http_tokens = "required" # IMDSv2 only
http_put_response_hop_limit = 1
}
# User data: install CloudWatch agent + Spot termination poll script
user_data = base64encode(<<-EOF
#!/bin/bash
# Install and start the application (replace with your actual startup)
yum update -y
yum install -y amazon-cloudwatch-agent
# Spot termination poller — polls IMDS every 5s
# On termination notice: drain connections and deregister from target group
cat > /usr/local/bin/spot-termination-handler.sh << 'SCRIPT'
#!/bin/bash
INSTANCE_ID=$(curl -sf -H "X-aws-ec2-metadata-token: $(curl -sf \
-X PUT "http://169.254.169.254/latest/api/token" \
-H "X-aws-ec2-metadata-token-ttl-seconds: 21600")" \
http://169.254.169.254/latest/meta-data/instance-id)
while true; do
TOKEN=$(curl -sf -X PUT "http://169.254.169.254/latest/api/token" \
-H "X-aws-ec2-metadata-token-ttl-seconds: 21600")
TERMINATION=$(curl -sf -H "X-aws-ec2-metadata-token: $TOKEN" \
http://169.254.169.254/latest/meta-data/spot/termination-time)
if [ "$TERMINATION" != "" ]; then
echo "Spot termination notice received at $TERMINATION for $INSTANCE_ID"
# Allow existing requests to drain (connection draining is also set on the TG)
sleep 30
# Signal the systemd service to stop accepting new connections
systemctl stop my-app.service
break
fi
sleep 5
done
SCRIPT
chmod +x /usr/local/bin/spot-termination-handler.sh
# Run the termination handler as a background systemd service
cat > /etc/systemd/system/spot-termination-handler.service << 'SERVICE'
[Unit]
Description=Spot Instance Termination Handler
After=network.target
[Service]
Type=simple
ExecStart=/usr/local/bin/spot-termination-handler.sh
Restart=on-failure
[Install]
WantedBy=multi-user.target
SERVICE
systemctl enable spot-termination-handler
systemctl start spot-termination-handler
# Start the application
systemctl start my-app.service
EOF
)
tag_specifications {
resource_type = "instance"
tags = {
Name = var.app_name
ManagedBy = "terraform"
Environment = "production"
}
}
lifecycle {
create_before_destroy = true
}
}
# ─── Mixed On-Demand/Spot Auto Scaling Group ────────────────────────────────
resource "aws_autoscaling_group" "app" {
name = "${var.app_name}-asg"
vpc_zone_identifier = var.private_subnet_ids
target_group_arns = var.target_group_arns
min_size = 2
max_size = 20
desired_capacity = 4
# Warmup — new instances need 60s before their metrics count toward scaling
# decisions. (Connection draining is configured on the target group via
# deregistration_delay, not here.)
default_instance_warmup = 60
mixed_instances_policy {
instances_distribution {
# 1 On-Demand instance as an always-on floor; above that base,
# 30% On-Demand / 70% Spot for cost efficiency
on_demand_base_capacity = 1
on_demand_percentage_above_base_capacity = 30
spot_allocation_strategy = "price-capacity-optimized"
# price-capacity-optimized: picks pool with best combination of low price
# and available capacity — reduces interruption risk vs pure lowest-price
}
launch_template {
launch_template_specification {
launch_template_id = aws_launch_template.app.id
version = "$Latest"
}
# Override instance types — diversify across pools to reduce the probability
# of simultaneous interruption. All are Graviton (ARM64) instances:
# m6g/c6g are Graviton2, m7g is Graviton3 — same AMI architecture.
override {
instance_type = "m6g.large"
weighted_capacity = "1"
}
override {
instance_type = "m6g.xlarge"
weighted_capacity = "2"
}
override {
instance_type = "m7g.large"
weighted_capacity = "1"
}
override {
instance_type = "c6g.xlarge"
weighted_capacity = "2"
}
}
}
# Auto-replace unhealthy instances
health_check_type = "ELB"
health_check_grace_period = 120
tag {
key = "Name"
value = var.app_name
propagate_at_launch = true
}
tag {
key = "SpotOptimized"
value = "true"
propagate_at_launch = true
}
}
# ─── EventBridge Rule: Spot Interruption → Lambda ───────────────────────────
resource "aws_cloudwatch_event_rule" "spot_interruption" {
name = "${var.app_name}-spot-interruption"
description = "Trigger Lambda on any EC2 Spot Interruption Warning (the handler checks ASG membership)"
event_pattern = jsonencode({
source = ["aws.ec2"]
detail-type = ["EC2 Spot Instance Interruption Warning"]
})
}
resource "aws_cloudwatch_event_target" "spot_interruption_lambda" {
rule = aws_cloudwatch_event_rule.spot_interruption.name
target_id = "SpotInterruptionHandler"
arn = aws_lambda_function.spot_handler.arn
}
resource "aws_lambda_permission" "allow_eventbridge" {
statement_id = "AllowEventBridgeInvoke"
action = "lambda:InvokeFunction"
function_name = aws_lambda_function.spot_handler.function_name
principal = "events.amazonaws.com"
source_arn = aws_cloudwatch_event_rule.spot_interruption.arn
}
# Lambda function code — inline for simplicity; use S3 for production
data "archive_file" "spot_handler" {
type = "zip"
output_path = "/tmp/spot_handler.zip"
source {
content = <<-PYTHON
    import boto3

    def handler(event, context):
        """
        Triggered by EventBridge when a Spot instance receives a termination
        warning. Moves the instance to Standby so its ASG launches a
        replacement before the 2-minute notice expires.
        """
        instance_id = event["detail"]["instance-id"]
        region = event["region"]
        autoscaling = boto3.client("autoscaling", region_name=region)
        print(f"Spot interruption warning for {instance_id}")
        # Find the ASG this instance belongs to
        response = autoscaling.describe_auto_scaling_instances(
            InstanceIds=[instance_id]
        )
        instances = response.get("AutoScalingInstances", [])
        if not instances:
            print(f"Instance {instance_id} not found in any ASG")
            return {"status": "ignored", "instance_id": instance_id}
        asg_name = instances[0]["AutoScalingGroupName"]
        print(f"Instance belongs to ASG: {asg_name}")
        # Enter Standby WITHOUT decrementing desired capacity — the ASG
        # launches a replacement while the doomed instance drains
        try:
            autoscaling.enter_standby(
                InstanceIds=[instance_id],
                AutoScalingGroupName=asg_name,
                ShouldDecrementDesiredCapacity=False,
            )
            print(f"Instance {instance_id} entered Standby — replacement launching")
        except Exception as e:
            print(f"Could not enter Standby: {e}")
        return {"status": "handled", "instance_id": instance_id, "asg": asg_name}
PYTHON
filename = "handler.py"
}
}
resource "aws_iam_role" "spot_handler_lambda" {
name = "${var.app_name}-spot-handler-role"
assume_role_policy = jsonencode({
Version = "2012-10-17"
Statement = [{
Effect = "Allow"
Principal = { Service = "lambda.amazonaws.com" }
Action = "sts:AssumeRole"
}]
})
}
resource "aws_iam_role_policy" "spot_handler_lambda" {
name = "spot-handler-policy"
role = aws_iam_role.spot_handler_lambda.id
policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Effect = "Allow"
Action = [
"autoscaling:DescribeAutoScalingInstances",
"autoscaling:EnterStandby",
"elasticloadbalancing:DeregisterTargets",
"elasticloadbalancing:DescribeTargetGroups",
"elasticloadbalancing:DescribeTargetHealth",
"ec2:DescribeInstances",
"logs:CreateLogGroup",
"logs:CreateLogStream",
"logs:PutLogEvents",
]
Resource = "*"
}
]
})
}
resource "aws_lambda_function" "spot_handler" {
filename = data.archive_file.spot_handler.output_path
function_name = "${var.app_name}-spot-interruption-handler"
role = aws_iam_role.spot_handler_lambda.arn
handler = "handler.handler"
runtime = "python3.12"
source_code_hash = data.archive_file.spot_handler.output_base64sha256
timeout = 30
environment {
variables = {
APP_NAME = var.app_name
}
}
}

[Architecture diagram: the ALB fans out to a mixed fleet — an On-Demand floor (m7g.large, always-on base plus 30% of capacity above base) and three Spot pools diversified across instance types and AZs (m6g.large in us-east-1a, m6g.xlarge in us-east-1b, c6g.xlarge in us-east-1c). Each Spot pool emits a 2-minute interruption notice to EventBridge, which invokes the interruption-handler Lambda; the Lambda moves the instance to Standby so the ASG launches a replacement from the next-cheapest pool while connections drain (60s) before deregistration.]
Savings achieved with this pattern for a typical stateless web tier running 10 instances: switching from all On-Demand to 30/70 mixed reduces compute cost by 55-65%. For pure batch workloads running entirely on Spot, the reduction is 70-80% versus On-Demand.
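To make the arithmetic behind these percentages concrete, here is a small sketch that mirrors how `instances_distribution` splits capacity. The function name `mixed_fleet_cost` is ours, not an AWS API; the split parameters match the Terraform above.

```python
def mixed_fleet_cost(total_capacity, od_base, od_pct_above_base, spot_discount):
    """Relative hourly cost of a mixed ASG vs the same capacity all On-Demand.

    Mirrors instances_distribution: od_base units are always On-Demand;
    of the remaining capacity, od_pct_above_base percent is On-Demand and
    the rest is Spot at (1 - spot_discount) of the On-Demand price.
    """
    above_base = total_capacity - od_base
    od_units = od_base + above_base * od_pct_above_base / 100
    spot_units = total_capacity - od_units
    cost = od_units + spot_units * (1 - spot_discount)
    return cost / total_capacity  # fraction of the all-On-Demand cost

# 10 units, base=1, 30% OD above base, Spot at 70% off On-Demand:
# OD = 1 + 0.3*9 = 3.7 units, Spot = 6.3 units at 0.3x
# cost fraction = (3.7 + 6.3*0.3) / 10 = 0.559 → ~44% savings
```

At a 70% Spot discount this split yields roughly 44% savings; at the deeper 80-90% Spot discounts common for well-diversified pools, the blended savings reach the 55-65% range cited above.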
4. Savings Plans and Reserved Instances
Once waste is eliminated and instances are right-sized, you have a stable baseline of compute that runs continuously. That baseline is the target for commitment-based discounts. The mistake teams make is committing before right-sizing — they lock in a 3-year commitment on m5.4xlarge instances, then realize six months later that Graviton3 m7g.2xlarge would be 40% cheaper for the same workload. The commitment is stranded.
The options, ranked by flexibility (most to least) and discount (least to most):
| Purchase Type | Discount vs On-Demand | Flexibility | Commitment |
|---|---|---|---|
| On-Demand | 0% | Unlimited | None |
| Compute Savings Plan | 17-66% | EC2, Fargate, Lambda; any region, family, OS | 1 or 3 year |
| EC2 Instance Savings Plan | 20-72% | Specific family + region; any size/OS | 1 or 3 year |
| Convertible Reserved Instance | 20-66% | Exchangeable for other instance types of equal or greater value | 1 or 3 year |
| Standard Reserved Instance | 30-72% | Specific instance family/type (optionally AZ-scoped) | 1 or 3 year |
| Spot Instance | 60-90% | Interruptible | None (market) |
Use Compute Savings Plans as your default. They apply automatically to any EC2 instance, Fargate task, or Lambda invocation in any region, regardless of instance family or operating system. If you migrate from c5.2xlarge to c7g.xlarge six months after purchasing the Savings Plan, the discount automatically applies to the new instance. EC2 Instance Savings Plans give a slightly higher discount (typically 5-10% more) but require commitment to a specific instance family and region — the loss of flexibility is rarely worth it unless the workload is extremely stable.
The 1-year vs 3-year question. The discount gap between 1-year and 3-year is typically 20-25 percentage points. A 1-year Compute Savings Plan gives roughly 35-40% off On-Demand; a 3-year gives roughly 55-60% off. The 3-year only makes financial sense if you are confident the committed spend level will remain stable for 36 months. For most teams, 1-year is the right choice — enough time horizon to capture meaningful savings, short enough that you can adjust when your architecture evolves.
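One way to see why over-committing hurts: if the workload a 3-year plan covers evaporates early, you keep paying the commitment anyway. A hedged sketch, normalizing On-Demand cost to $1/month (`effective_discount` is our own helper, not an AWS formula):

```python
def effective_discount(discount, committed_months, used_months):
    """Effective discount vs On-Demand if the workload only lasts
    used_months of a committed_months commitment. You pay the full
    commitment either way; savings only accrue while the workload runs."""
    od_cost = used_months                          # On-Demand at $1/month
    committed_cost = committed_months * (1 - discount)
    return 1 - committed_cost / od_cost

# 3-year plan at 57% off, workload replaced after 18 months:
# paid 36 * 0.43 = $15.48 for $18 of useful On-Demand equivalent
# → effective discount is only 14%
```

A 3-year plan at 57% off that is useful for only 18 months nets an effective ~14% discount — worse than a 1-year plan renewed once.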
Target coverage ratio. The recommended approach: aim for 70% of your consistent monthly On-Demand compute spend covered by Savings Plans, with 30% remaining On-Demand for burst capacity and flexibility. The 70% is your predictable baseline — the minimum compute you run in any given week. The 30% On-Demand cushion means you can handle traffic spikes, run experiments, and accommodate growth without wasted Savings Plan commit.
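The 70% rule can be sketched as: take the floor of recent hourly On-Demand spend and commit to 70% of it, so the commitment is fully utilized even in the quietest hour. A simplified illustration — real purchases are made at the post-discount Savings Plan rate, and `sp_hourly_commitment` is our own name:

```python
def sp_hourly_commitment(hourly_od_spend, coverage=0.70):
    """Suggest a Savings Plan hourly commitment: cover `coverage` of the
    minimum recent hourly On-Demand compute spend. Working in On-Demand
    dollars for simplicity; convert to the discounted rate when buying."""
    baseline = min(hourly_od_spend)
    return round(coverage * baseline, 2)

# Hourly On-Demand spend samples ($/hour) over a lookback window:
# sp_hourly_commitment([48.0, 52.5, 40.0, 61.2, 45.8]) → 0.7 * 40.0 = 28.0
```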
AWS Cost Explorer's Savings Plans Recommendations feature calculates a purchase amount for you. Open Cost Explorer → Savings Plans → Recommendations, set the lookback period (30 or 60 days), and choose the plan type, term, and payment option; it recommends the hourly commitment that maximizes savings at your actual usage level. The recommendations refresh as your usage patterns change, so re-check before each purchase rather than reusing an old number.
Standard vs Convertible Reserved Instances have largely been superseded by Savings Plans for most workloads. The only scenario where RIs still win: RDS databases (Savings Plans do not cover RDS), and legacy workloads where you have an absolute guarantee of running a specific instance type in a specific AZ for three years. For anything else, Savings Plans are more flexible with comparable savings.
Structuring the full cost profile: after running the full optimization stack, a typical account looks like this — 15% waste elimination (resources that should not exist at all), 25% savings from right-sizing and Graviton migration, 30% covered by Savings Plans on the optimized baseline, and the remaining 30% On-Demand for dynamic capacity. The effective cost is 40-50% of what you started with.
5. Architectural Cost Patterns
After eliminating waste and optimizing compute, the next layer of savings comes from architectural decisions that determine unit costs independent of how well-right-sized your instances are. These are often the most expensive line items on AWS bills because they scale with usage and are invisible until you look at the billing breakdown carefully.
Data transfer costs are the hidden bill. AWS charges for data leaving EC2 to the internet ($0.09/GB in us-east-1), data crossing Availability Zone boundaries ($0.01/GB in each direction, so $0.02/GB effective), and data sent to other AWS regions (from $0.02/GB). Cross-AZ traffic is the subtlest cost: a microservices architecture where each service calls three others, deployed across three AZs, can generate enormous cross-AZ data transfer bills. The fix is co-locating services that communicate heavily in the same AZ, using zone-aware routing, or moving to a shared-nothing architecture per AZ. GCP bills inter-zone traffic within a region at a comparable rate (around $0.01/GB), so the co-location pattern applies on any cloud.
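A back-of-the-envelope model for the cross-AZ line item, using the rates quoted above (the function and its assumptions — even spread across 3 AZs — are illustrative):

```python
CROSS_AZ_PER_GB = 0.01  # charged in each direction → $0.02/GB effective

def cross_az_monthly_cost(gb_between_services, cross_az_fraction):
    """Monthly cost of inter-service traffic that crosses AZ boundaries.
    With services spread evenly across 3 AZs, a random call lands
    cross-AZ roughly 2/3 of the time."""
    cross_gb = gb_between_services * cross_az_fraction
    return cross_gb * CROSS_AZ_PER_GB * 2  # billed on both sides

# 100 TB/month of service-to-service chatter across 3 AZs:
# 100_000 GB * (2/3) * $0.02 ≈ $1,333/month — for traffic that never
# leaves the region
```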
NAT Gateway is frequently the largest surprise item. NAT Gateways charge $0.045/GB of data processed in addition to the hourly fee ($0.045/hour). A data pipeline moving 10TB/day through a NAT Gateway costs $450/day in data processing fees — $13,500/month — on top of the compute cost. The fix for S3 and DynamoDB traffic is VPC Gateway Endpoints, which route traffic through the AWS backbone without leaving the VPC and without NAT Gateway charges. For ECR image pulls, use Interface Endpoints. For everything else, evaluate whether the traffic actually needs NAT or whether the service can be redesigned to use internal VPC endpoints.
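The NAT arithmetic in this paragraph, packaged as a reusable sketch (us-east-1 rates as quoted above; function names are ours):

```python
HOURS_PER_MONTH = 730
NAT_HOURLY = 0.045   # $/hour per NAT Gateway (us-east-1)
NAT_PER_GB = 0.045   # $/GB of data processed

def nat_monthly_cost(gb_through_nat, gateways=1):
    """Monthly NAT Gateway bill: hourly fee plus data processing."""
    return gateways * NAT_HOURLY * HOURS_PER_MONTH + gb_through_nat * NAT_PER_GB

def endpoint_savings(gb_s3_dynamo):
    """Gateway Endpoints are free, so every GB of S3/DynamoDB traffic
    moved off the NAT Gateway saves the full $0.045/GB processing fee."""
    return gb_s3_dynamo * NAT_PER_GB

# 10 TB/month of S3 traffic rerouted through a Gateway Endpoint:
# 10_000 GB * $0.045 = $450/month saved, with zero endpoint cost
```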
S3 storage tiers save 50-90% on cold data. S3 Standard costs $0.023/GB/month. S3 Standard-IA (Infrequent Access) costs $0.0125/GB/month for data accessed less than once a month. S3 Glacier Instant Retrieval costs $0.004/GB/month for archives accessed quarterly. S3 Glacier Deep Archive costs $0.00099/GB/month — 23× cheaper than S3 Standard — for data that might never be accessed again (compliance archives, old backups). Use S3 Intelligent-Tiering for objects where access patterns are uncertain; it automatically moves data between tiers based on access frequency with no retrieval fees.
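Plugging the tier prices above into a quick calculator shows the scale of tiering savings. Storage cost only — retrieval and request fees are ignored here, and they matter for data you restore often:

```python
# $/GB-month, us-east-1 (rates as quoted in the text)
S3_TIERS = {
    "standard": 0.023,
    "standard_ia": 0.0125,
    "glacier_ir": 0.004,
    "deep_archive": 0.00099,
}

def s3_monthly_cost(gb_by_tier):
    """Monthly storage cost for a dict of {tier: GB}."""
    return sum(S3_TIERS[tier] * gb for tier, gb in gb_by_tier.items())

# 50 TB all in Standard vs tiered (10 TB hot, 20 TB IA, 20 TB archive):
# s3_monthly_cost({"standard": 50_000})                       → $1,150/month
# s3_monthly_cost({"standard": 10_000, "standard_ia": 20_000,
#                  "deep_archive": 20_000})                   → $499.80/month
```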
Lambda vs EC2 breakeven. Lambda costs $0.0000166667/GB-second plus $0.20/million requests. For a function using 512MB of memory, that is $0.0000083333 per second of execution. If your function runs for 100ms on average and receives 1 million requests/day (30M/month), the monthly compute cost is 30M × 0.1s × $0.0000083333 ≈ $25, plus $6 in request charges — about $31/month. But at that sustained volume the same load averages roughly 1.2 concurrent requests, which a single always-on t4g.small (about $12/month On-Demand) handles comfortably — EC2 is already cheaper. Drop to 10,000 requests/day and Lambda falls to roughly $0.31/month while the EC2 instance still costs $12 sitting mostly idle. Lambda wins for sporadic or spiky workloads; EC2 wins when utilization is consistently above 20-30%.
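The breakeven math can be packaged as a helper, using the published rates quoted above (`lambda_monthly_cost` is our own name, not an AWS API):

```python
LAMBDA_GB_SECOND = 0.0000166667   # $/GB-second of execution
LAMBDA_PER_MILLION = 0.20         # $/million requests

def lambda_monthly_cost(memory_mb, avg_seconds, monthly_requests):
    """Monthly Lambda bill: compute (GB-seconds) plus request charges."""
    gb_seconds = (memory_mb / 1024) * avg_seconds * monthly_requests
    compute = gb_seconds * LAMBDA_GB_SECOND
    requests = (monthly_requests / 1_000_000) * LAMBDA_PER_MILLION
    return compute + requests

# 512MB, 100ms average, 1M requests/day (30M/month):
# ≈ $25 compute + $6 requests ≈ $31/month
```

Run the same function with your real memory size and request volume against the monthly price of the smallest EC2 instance that handles the load; the crossover is where consistent utilization makes the always-on instance cheaper.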
Here is the Terraform configuration for VPC endpoints for S3 and DynamoDB — the single highest-ROI architectural change for accounts with significant data processing workloads:
# vpc-endpoints.tf
#
# Creates Gateway VPC Endpoints for S3 and DynamoDB.
# These route all traffic to S3 and DynamoDB through the AWS backbone,
# bypassing NAT Gateway entirely — eliminates NAT Gateway data processing charges
# for S3 and DynamoDB traffic, which is often the largest unexpected bill item.
#
# Prerequisites:
# - Existing VPC with route tables (one per AZ is typical)
# - No code changes required in applications — the endpoint is transparent
variable "vpc_id" {
description = "VPC ID to create endpoints in"
type = string
}
variable "route_table_ids" {
description = "List of route table IDs to associate with the endpoint (typically one per AZ)"
type = list(string)
}
variable "private_subnet_ids" {
description = "Private subnet IDs for the Interface endpoints (one per AZ)"
type = list(string)
}
variable "aws_region" {
description = "AWS region (e.g., us-east-1)"
type = string
default = "us-east-1"
}
# ─── S3 Gateway Endpoint ─────────────────────────────────────────────────────
# Gateway endpoints are FREE — no hourly charge, no data processing charge.
# Traffic to S3 is routed through the endpoint automatically for all buckets.
resource "aws_vpc_endpoint" "s3" {
vpc_id = var.vpc_id
service_name = "com.amazonaws.${var.aws_region}.s3"
vpc_endpoint_type = "Gateway"
# Associate with all route tables — this adds a prefix list route automatically
route_table_ids = var.route_table_ids
# Endpoint policy — restrict to only your account's S3 buckets for security
policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Effect = "Allow"
Principal = "*"
Action = ["s3:*"]
Resource = "*"
# To restrict to specific buckets, replace Resource with:
# ["arn:aws:s3:::your-bucket-name", "arn:aws:s3:::your-bucket-name/*"]
}
]
})
tags = {
Name = "s3-gateway-endpoint"
ManagedBy = "terraform"
# Cost savings: eliminates NAT Gateway charges for all S3 traffic
CostNote = "Eliminates NAT data processing at $0.045/GB for S3 traffic"
}
}
# ─── DynamoDB Gateway Endpoint ───────────────────────────────────────────────
# Same as S3 endpoint — free, transparent, no code changes required.
resource "aws_vpc_endpoint" "dynamodb" {
vpc_id = var.vpc_id
service_name = "com.amazonaws.${var.aws_region}.dynamodb"
vpc_endpoint_type = "Gateway"
route_table_ids = var.route_table_ids
policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Effect = "Allow"
Principal = "*"
Action = ["dynamodb:*"]
Resource = "*"
}
]
})
tags = {
Name = "dynamodb-gateway-endpoint"
ManagedBy = "terraform"
CostNote = "Eliminates NAT data processing at $0.045/GB for DynamoDB traffic"
}
}
# ─── ECR Interface Endpoints (for ECS/EKS pulling container images) ──────────
# These ARE charged ($0.01/hour/AZ) but typically save money if you pull
# large images frequently through a NAT Gateway.
# Only create if you have significant ECR image pull traffic.
resource "aws_vpc_endpoint" "ecr_api" {
vpc_id = var.vpc_id
service_name = "com.amazonaws.${var.aws_region}.ecr.api"
vpc_endpoint_type = "Interface"
subnet_ids = var.private_subnet_ids
security_group_ids = [aws_security_group.vpc_endpoints.id]
private_dns_enabled = true
tags = {
Name = "ecr-api-interface-endpoint"
}
}
resource "aws_vpc_endpoint" "ecr_dkr" {
vpc_id = var.vpc_id
service_name = "com.amazonaws.${var.aws_region}.ecr.dkr"
vpc_endpoint_type = "Interface"
subnet_ids = var.private_subnet_ids
security_group_ids = [aws_security_group.vpc_endpoints.id]
private_dns_enabled = true
tags = {
Name = "ecr-dkr-interface-endpoint"
}
}
# Security group for Interface endpoints — allow HTTPS from within the VPC
resource "aws_security_group" "vpc_endpoints" {
name = "vpc-endpoints-sg"
description = "Security group for VPC Interface Endpoints"
vpc_id = var.vpc_id
ingress {
from_port = 443
to_port = 443
protocol = "tcp"
cidr_blocks = [data.aws_vpc.selected.cidr_block]
description = "Allow HTTPS from within VPC"
}
egress {
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
}
}
data "aws_vpc" "selected" {
id = var.vpc_id
}
# ─── Outputs for verification ─────────────────────────────────────────────────
output "s3_endpoint_id" {
description = "S3 Gateway Endpoint ID"
value = aws_vpc_endpoint.s3.id
}
output "dynamodb_endpoint_id" {
description = "DynamoDB Gateway Endpoint ID"
value = aws_vpc_endpoint.dynamodb.id
}
output "cost_savings_note" {
description = "Estimated monthly savings from these endpoints"
value = "S3 + DynamoDB endpoints eliminate NAT Gateway data charges. At $0.045/GB, 10TB/month through NAT = $460/month saved."
}
Adding S3 and DynamoDB VPC Gateway Endpoints to an account processing 10TB/month through a NAT Gateway saves approximately $460/month — and the endpoints are free. This is the definition of a no-downside optimization.
CloudFront as a cost reducer. High cache hit rates on CloudFront mean that traffic never reaches your origin servers, reducing EC2 load and replacing the EC2-to-internet data transfer charge ($0.09/GB) with CloudFront's edge rate (first 10TB/month at $0.085/GB in North America, with per-GB rates dropping at higher volume). With a 90% cache hit rate, 9 out of 10 requests never touch your origin. For media-heavy applications, this compounds: origin-to-CloudFront transfer is free for AWS origins, and cache hits also avoid paying S3 request charges for serving the same object over and over.
6. FinOps Culture and Tooling
The technical optimizations above are one-time wins unless they are embedded in an ongoing operational culture. FinOps (Cloud Financial Operations) is the discipline that prevents cost sprawl from reasserting itself. Its core principle is simple: the engineering teams that create cloud costs should own them.
The shift this requires is organizational. In most companies, cloud costs are managed by a centralized platform or finance team. Engineers provision resources without visibility into what they cost. Waste accumulates because nobody has both the technical knowledge and the cost accountability. FinOps moves cost visibility and responsibility to the team level, while the platform team provides tooling, guardrails, and reporting.
Tagging as the foundation. You cannot allocate costs without tags. Define a mandatory tagging schema enforced by Service Control Policies at the AWS Organization level:
Team: payments-platform # Which team owns this resource
Service: checkout-api # Which service/application
Environment: production # production | staging | dev
CostCenter: eng-backend # For finance allocation
Owner: john.smith@company.com # Who to contact for cleanup
Any resource created without mandatory tags can be rejected at provisioning time via SCPs or AWS Config rules. Cloud Custodian can auto-tag resources created without tags by inferring context from the creator's IAM identity.
Cost allocation reports per team. AWS Cost and Usage Reports (CUR) exported to S3 can be queried with Athena to produce per-team cost breakdowns. Set this up as a scheduled query that posts a weekly Slack message: "This week: payments-platform $4,230 (+12%), auth-service $1,890 (-5%), data-pipeline $7,100 (+3%)." Teams that see their own spend stop treating cloud resources as free.
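As an illustration of the aggregation step — a real pipeline queries the CUR with Athena, and the column names below are simplified stand-ins for the actual CUR schema, which varies by export configuration:

```python
import csv
from collections import defaultdict
from io import StringIO

def cost_by_team(cur_csv, team_tag="resource_tags_user_team"):
    """Aggregate unblended cost per Team tag from a CUR-style CSV.
    Rows missing the tag roll up into 'untagged' — a number worth
    surfacing in the weekly report, since it measures tagging gaps."""
    totals = defaultdict(float)
    for row in csv.DictReader(StringIO(cur_csv)):
        team = row.get(team_tag) or "untagged"
        totals[team] += float(row["line_item_unblended_cost"])
    return dict(totals)

# sample = ("resource_tags_user_team,line_item_unblended_cost\n"
#           "payments-platform,120.50\npayments-platform,30.25\n,9.99\n")
# cost_by_team(sample) → {"payments-platform": 150.75, "untagged": 9.99}
```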
Budget alerts before the damage is done. Create AWS Budgets for each team with two thresholds: 80% (warning, investigate) and 100% (alert, urgent). Use SNS to notify both the team lead and the platform team. The budget alert at 80% is the signal to investigate before the month closes — not an escalation.
Infracost in CI. Infracost is an open source tool that generates a cost diff on every Terraform or Pulumi pull request. Engineers see: "This PR adds a NAT Gateway — estimated cost: +$32/month" before they merge. The immediate feedback loop changes behavior more effectively than any quarterly review. Add it to your CI pipeline in under an hour:
# .github/workflows/infracost.yml
name: Infracost Cost Diff
on:
  pull_request:
    paths:
      - "infrastructure/**"
      - "terraform/**"
jobs:
  infracost:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      pull-requests: write
    steps:
      - uses: actions/checkout@v4
      - name: Setup Infracost
        uses: infracost/actions/setup@v3
        with:
          api-key: ${{ secrets.INFRACOST_API_KEY }}
      - name: Generate cost estimate
        run: |
          infracost breakdown \
            --path=./infrastructure \
            --format=json \
            --out-file=/tmp/infracost-base.json
      - name: Post PR comment
        run: |
          infracost comment github \
            --path=/tmp/infracost-base.json \
            --repo=$GITHUB_REPOSITORY \
            --pull-request=${{ github.event.pull_request.number }} \
            --github-token=${{ github.token }} \
            --behavior=update
Kubecost for Kubernetes. If you run EKS, Kubecost (open source tier) allocates cluster costs to namespaces, deployments, and teams. The default Kubernetes experience is that cost is invisible: you see a single EKS cluster line item. Kubecost surfaces that "the recommendation-service namespace consumed $1,200 of cluster resources this month, of which 60% was idle due to over-requested CPU limits." It integrates with AWS billing to show node costs alongside pod costs.
Monthly cost review ritual. Block 30 minutes the first Tuesday of every month. Review: total bill vs previous month, top 5 cost drivers, any anomalies or spikes, Savings Plan coverage and utilization (unused commitments are wasted money too), and open action items from last month. The ritual keeps the discipline alive and prevents optimization debt from compounding. Rotate ownership of the meeting across engineering teams.
The distinction between show-back and charge-back matters for adoption. Show-back means teams see their costs but do not pay them from a budget. Charge-back means cloud costs are deducted from team or business unit budgets. Show-back is the right starting point — it builds visibility and ownership before adding financial consequence. Most companies find that show-back alone drives 80% of the behavioral change, without the organizational friction of charge-back.
Conclusion
Cloud cost optimization is not a one-time audit. It is a continuous operational discipline with four distinct layers, each building on the last.
Start with waste elimination. Run the boto3 scanner in this post against every region in your account. Remediate what you find. Set it up to run weekly. Expect to recover 10-15% of your current bill within 30 days.
Move to right-sizing. Pull 14 days of CloudWatch data. Run the right-sizing reporter. Prioritize Graviton migrations — for interpreted-language and container workloads they usually need nothing more than an ARM64 rebuild, and they deliver 20-40% compute cost reduction. Work through the list systematically over a quarter.
Buy commitment discounts on what remains. Use Compute Savings Plans for flexibility. Target 70% coverage of your consistent compute baseline. Let Cost Explorer calculate the optimal commitment — do not guess.
Optimize architecture for unit cost. Add VPC Gateway Endpoints for S3 and DynamoDB — this is a free, zero-risk change that can save hundreds of dollars per month for data-heavy workloads. Audit NAT Gateway usage. Apply S3 storage tier policies. Add CloudFront caching.
Finally, build the FinOps culture that keeps these gains from eroding. Enforce tagging at the SCP level. Put Infracost in CI. Run the monthly cost review. The technical work compounds only if the organizational habits are in place to sustain it.
Teams that execute all four phases consistently — not as a one-time initiative but as operational practice — routinely cut cloud bills by 40-60% without sacrificing reliability or performance. The $50,000/month AWS bill becomes $25,000-$30,000, with better observability and more predictable costs than before.
Sources
- Flexera 2025 State of the Cloud Report
- AWS Compute Optimizer documentation
- AWS Savings Plans pricing and recommendations
- EC2 Spot Instance Advisor (aws.amazon.com/ec2/spot/instance-advisor)
- AWS Pricing calculator (calculator.aws)
- Cloud Custodian documentation (cloudcustodian.io)
- Infracost documentation (infracost.io)
- Kubecost documentation (kubecost.com)