Ayoub Zakaria | DevOps · Cloud

The Wake-Up Call

Our AWS bill was growing 20% month-over-month. Something had to change. Here's exactly what we did to cut costs by 40% while maintaining (and improving) performance.

Strategy 1: Right-Sizing Instances

Most instances are over-provisioned. We used CloudWatch metrics to identify the underused ones, starting with average CPU utilization over a full month:

```bash
# Check average CPU utilization
aws cloudwatch get-metric-statistics \
  --namespace AWS/EC2 \
  --metric-name CPUUtilization \
  --dimensions Name=InstanceId,Value=i-1234567890abcdef0 \
  --start-time 2024-01-01T00:00:00Z \
  --end-time 2024-01-31T23:59:59Z \
  --period 3600 \
  --statistics Average
```

Finding: 60% of our instances averaged < 20% CPU utilization.
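
Here is a minimal Python sketch of that triage. It assumes you've saved one get-metric-statistics JSON output per instance. The CLI's default JSON output has a Datapoints list whose entries carry Timestamp, Average, and Unit. The helper names and the sample instance IDs are hypothetical. The comparison uses the mean of the hourly averages, which weights every hour equally.

```python
import json
from statistics import mean
from typing import Dict, List, Optional

# Threshold from our finding: flag instances averaging under 20% CPU.
CPU_THRESHOLD = 20.0


def average_cpu(metric_json: str) -> Optional[float]:
    """Mean of the hourly 'Average' datapoints in one get-metric-statistics output."""
    datapoints = json.loads(metric_json).get("Datapoints", [])
    if not datapoints:
        return None  # no data, e.g. the instance was stopped for the whole window
    return mean(dp["Average"] for dp in datapoints)


def underutilized(outputs: Dict[str, str], threshold: float = CPU_THRESHOLD) -> List[str]:
    """Sorted instance IDs (hypothetical keys) whose average CPU is below the threshold."""
    flagged = []
    for instance_id, metric_json in outputs.items():
        avg = average_cpu(metric_json)
        if avg is not None and avg < threshold:
            flagged.append(instance_id)
    return sorted(flagged)


if __name__ == "__main__":
    # Hypothetical sample; a real month at --period 3600 has about 744 datapoints.
    sample = {
        "i-1234567890abcdef0": json.dumps({
            "Label": "CPUUtilization",
            "Datapoints": [
                {"Timestamp": "2024-01-01T00:00:00Z", "Average": 12.5, "Unit": "Percent"},
                {"Timestamp": "2024-01-01T01:00:00Z", "Average": 17.5, "Unit": "Percent"},
            ],
        }),
        "i-0fedcba0987654321": json.dumps({
            "Label": "CPUUtilization",
            "Datapoints": [
                {"Timestamp": "2024-01-01T00:00:00Z", "Average": 55.0, "Unit": "Percent"},
            ],
        }),
    }
    print(underutilized(sample))  # ['i-1234567890abcdef0']
```

Average CPU is only a first filter. Peaks and memory still deserve a look before an instance is resized.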

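Action: Downsized from m5.xlarge (4 vCPU, 16 GiB) → m5.large (2 vCPU, 8 GiB) where appropriate. The move halves memory as well as vCPUs, so low CPU alone doesn't make an instance a candidate.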

Savings: ~$15,000/year
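
The savings from a downsize are the hourly price difference multiplied by the hours the instance runs. The quick sketch below assumes us-east-1 Linux on-demand rates of $0.192/hour for m5.xlarge and $0.096/hour for m5.large; the m5.large figure matches the one quoted in the Spot section. It also assumes 8,760 always-on hours per year. Check current pricing for your region before relying on the numbers.

```python
# Hours in a non-leap year for an instance running 24/7.
HOURS_PER_YEAR = 24 * 365

# Assumed us-east-1 Linux on-demand prices (USD/hour); verify against current pricing.
ON_DEMAND_RATE = {
    "m5.large": 0.096,
    "m5.xlarge": 0.192,
}


def annual_rightsizing_savings(old_type: str, new_type: str, count: int = 1,
                               hours: int = HOURS_PER_YEAR) -> float:
    """Yearly on-demand savings from moving `count` always-on instances between types."""
    hourly_delta = ON_DEMAND_RATE[old_type] - ON_DEMAND_RATE[new_type]
    return round(hourly_delta * hours * count, 2)


print(annual_rightsizing_savings("m5.xlarge", "m5.large"))  # 840.96
```

At these assumed rates, each m5.xlarge → m5.large downsize that runs around the clock is worth roughly $841/year before any RI or Spot discount.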

Strategy 2: Reserved Instances for Baseline

For workloads that run 24/7, Reserved Instances are a no-brainer.

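Tip: Start with Convertible RIs for flexibility. They can be exchanged for other instance families as your needs change, in return for a smaller discount than Standard RIs.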
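
Why 24/7? An RI is billed for every hour of its term whether or not the instance runs. It therefore only beats on-demand once utilization exceeds the ratio of the RI's effective hourly rate to the on-demand rate. Below is a minimal sketch of that breakeven. The RI pricing in the example is a hypothetical placeholder, not a quoted AWS price.

```python
HOURS_PER_YEAR = 24 * 365


def effective_hourly_rate(upfront: float, hourly: float, term_years: int = 1) -> float:
    """Spread any upfront payment across every hour of the RI term."""
    return upfront / (term_years * HOURS_PER_YEAR) + hourly


def breakeven_utilization(ri_effective_rate: float, on_demand_rate: float) -> float:
    """Fraction of hours an instance must run for the RI to cost less than on-demand."""
    return ri_effective_rate / on_demand_rate


# Hypothetical 1-year All Upfront RI ($420.48 upfront, $0/hour) vs $0.096/hour on-demand.
ri_rate = effective_hourly_rate(upfront=420.48, hourly=0.0)
print(f"{ri_rate:.3f} $/h, breakeven at {breakeven_utilization(ri_rate, 0.096):.0%} utilization")
# 0.048 $/h, breakeven at 50% utilization
```

A workload that runs around the clock sits at 100% utilization, well past any realistic breakeven. That is why the baseline is the safe place to start.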

Strategy 3: Spot Instances for Batch Jobs

For non-critical workloads (CI/CD runners, batch processing):

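```yaml
# Example: GitLab Runner on Spot (CloudFormation)
# Assumes an IAM role resource named SpotFleetRole is defined elsewhere in the template.
Parameters:
  RunnerAmiId:
    Type: AWS::EC2::Image::Id  # hypothetical parameter: AMI with the runner installed
Resources:
  SpotFleet:
    Type: AWS::EC2::SpotFleet
    Properties:
      SpotFleetRequestConfigData:
        IamFleetRole: !GetAtt SpotFleetRole.Arn
        TargetCapacity: 5
        AllocationStrategy: lowestPrice
        LaunchSpecifications:
          - InstanceType: m5.large
            ImageId: !Ref RunnerAmiId  # required in every launch specification
            SpotPrice: "0.04"  # max price per hour, vs $0.096 on-demand
```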

Savings: 60-70% compared to on-demand.
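
SpotPrice is a maximum, not the price you actually pay, so the template above also puts a hard ceiling on the fleet's cost. The rough sketch below compares that worst case with running the same five m5.large instances on-demand. It assumes about 730 hours in a month.

```python
HOURS_PER_MONTH = 730  # average hours per month (8,760 / 12)


def monthly_cost(rate_per_hour: float, instances: int, hours: float = HOURS_PER_MONTH) -> float:
    """Cost of running `instances` instances for a month at a flat hourly rate."""
    return rate_per_hour * instances * hours


on_demand = monthly_cost(0.096, instances=5)     # m5.large on-demand
spot_ceiling = monthly_cost(0.04, instances=5)   # SpotPrice cap from the template
min_saving = 1 - spot_ceiling / on_demand        # saving if Spot hit the cap every hour

print(f"on-demand ${on_demand:.2f}, spot at most ${spot_ceiling:.2f}, saving >= {min_saving:.0%}")
# on-demand $350.40, spot at most $146.00, saving >= 58%
```

Actual Spot prices usually sit below the cap, which is consistent with the 60-70% figure. A lower cap can also mean more frequent interruptions.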

Strategy 4: S3 Lifecycle Policies

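Data accumulates. Set up automatic tiering with a lifecycle configuration, applied to the bucket with aws s3api put-bucket-lifecycle-configuration: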

```json
{
  "Rules": [
    {
      "ID": "MoveToIA",
      "Filter": { "Prefix": "" },
      "Status": "Enabled",
      "Transitions": [
        {
          "Days": 30,
          "StorageClass": "STANDARD_IA"
        },
        {
          "Days": 90,
          "StorageClass": "GLACIER"
        }
      ],
      "Expiration": {
        "Days": 365
      }
    }
  ]
}
```

Savings: 70% on storage costs for old data.
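
To sanity-check what the rule does to a given object, here is a small model of its timeline. An object stays in STANDARD for the first 30 days, moves to STANDARD_IA from day 30 and to GLACIER from day 90, and is deleted at day 365. This only illustrates the rule's logic; it isn't how S3 evaluates lifecycle rules. S3 applies them asynchronously, so an actual transition can happen some time after the object becomes eligible.

```python
from typing import Optional

# Transition thresholds copied from the "MoveToIA" rule (days since object creation).
TRANSITIONS = [(30, "STANDARD_IA"), (90, "GLACIER")]
EXPIRATION_DAYS = 365


def storage_class_at(age_days: int) -> Optional[str]:
    """Storage class an object would be in at a given age, or None once expired."""
    if age_days >= EXPIRATION_DAYS:
        return None  # deleted by the Expiration action
    current = "STANDARD"
    for threshold, storage_class in TRANSITIONS:
        if age_days >= threshold:
            current = storage_class
    return current


for age in (0, 29, 30, 89, 90, 364, 365):
    print(age, storage_class_at(age))
```

Also keep minimum storage durations in mind. STANDARD_IA bills each object for at least 30 days and GLACIER for at least 90. This rule leaves objects 60 days in IA and 275 days in Glacier, so neither step triggers an early-deletion charge.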

Strategy 5: NAT Gateway Optimization

NAT Gateways are expensive: in us-east-1 they cost $0.045 per hour plus $0.045 per GB of data processed.

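Solution: Route traffic to AWS services through VPC endpoints instead of the NAT Gateway. Gateway endpoints (available for S3 and DynamoDB) are free; interface endpoints for other services carry their own hourly and per-GB charges.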

```bash
# S3 Gateway Endpoint (free!)
aws ec2 create-vpc-endpoint \
  --vpc-id vpc-1234567890abcdef0 \
  --vpc-endpoint-type Gateway \
  --service-name com.amazonaws.us-east-1.s3 \
  --route-table-ids rtb-1234567890abcdef0
```
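
The back-of-the-envelope model below uses the us-east-1 NAT rates quoted above to show where the savings come from. A gateway endpoint takes S3 traffic off the per-GB processing charge. The hourly charge remains as long as the NAT Gateway is still needed for other internet-bound traffic. The 10,000 GB/month volume and 80% S3 share are hypothetical inputs, not our numbers.

```python
HOURS_PER_MONTH = 730
NAT_HOURLY = 0.045   # USD per NAT Gateway hour (us-east-1)
NAT_PER_GB = 0.045   # USD per GB processed by the NAT Gateway (us-east-1)


def nat_monthly_cost(gb_processed: float, gateways: int = 1) -> float:
    """Monthly NAT Gateway cost: hourly charge per gateway plus data processing."""
    return gateways * NAT_HOURLY * HOURS_PER_MONTH + gb_processed * NAT_PER_GB


def endpoint_savings(total_gb: float, s3_share: float, gateways: int = 1) -> float:
    """Monthly saving when the S3 share of traffic moves to a free gateway endpoint."""
    before = nat_monthly_cost(total_gb, gateways)
    after = nat_monthly_cost(total_gb * (1 - s3_share), gateways)
    return before - after


# Hypothetical inputs: 10,000 GB/month through one NAT Gateway, 80% of it bound for S3.
print(f"${endpoint_savings(10_000, 0.8):,.2f} saved per month")  # $360.00 saved per month
```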

The Results

Key Takeaways

Measure first - You can't optimize what you don't measure
Right-size everything - Most resources are over-provisioned
Use RIs for baseline - Predictable workloads = predictable savings
Spot for burst - Accept interruption for massive savings
Automate cleanup - Old data and unused resources add up

AWS Cost Optimization: How We Cut Our Bill by 40%