Rightsizing is the practice of matching your cloud resources to your actual workload requirements. It sounds simple, but it's one of the most impactful cost optimization strategies available—and one of the most commonly neglected.
Most organizations over-provision their EC2 instances. It makes sense: when launching a new application, it's safer to start big than to risk performance issues. The problem is that "temporary" oversizing often becomes permanent.
This guide walks you through a systematic approach to EC2 rightsizing, from identifying candidates to safely implementing changes.
Why Rightsizing Matters
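Before diving into the how, let's quantify the opportunity. The figures below are approximate On-Demand costs; exact prices vary by Region and operating system: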
| Resize | On-Demand Monthly Cost (before → after) | Monthly Savings |
|---|---|---|
| m5.4xlarge → m5.2xlarge | $560 → $280 | $280/month |
| r5.2xlarge → r5.xlarge | $456 → $228 | $228/month |
| c5.4xlarge → c5.2xlarge | $544 → $272 | $272/month |
Now multiply by the number of oversized instances in your environment. We typically find 30-50% of instances are candidates for rightsizing.
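To put a number on that multiplication, here is a minimal sketch. The function name `monthly_savings` is hypothetical, the hourly rates are illustrative assumptions chosen to match the first table row, and 730 hours per month is the usual averaging convention rather than an exact figure.

```shell
#!/usr/bin/env bash
# Estimate monthly savings from downsizing a group of identical instances.
# Usage: monthly_savings <current_hourly_usd> <target_hourly_usd> <instance_count>
# Assumes 730 hours per month (a common monthly average).
monthly_savings() {
  awk -v cur="$1" -v tgt="$2" -v n="$3" \
    'BEGIN { printf "%.2f\n", (cur - tgt) * 730 * n }'
}

# Example: ten m5.4xlarge instances resized to m5.2xlarge.
# The hourly rates are illustrative assumptions; check current pricing for your Region.
monthly_savings 0.768 0.384 10   # prints 2803.20
```

Ten instances of that single resize come to roughly $2,800 per month, which is why fleet-wide rightsizing adds up quickly.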
Important: Rightsizing should happen before purchasing Reserved Instances or Savings Plans. Otherwise, you're locking in waste.
Identifying Rightsizing Candidates
The Metrics That Matter
To rightsize effectively, you need data. The key metrics to analyze:
CPU Utilization
- Peak utilization over 2-4 weeks
- Average utilization
- Utilization patterns (spiky vs. steady)
Memory Utilization
- Not available in default CloudWatch metrics
- Requires CloudWatch Agent installation
- Critical for memory-optimized instances
Network Throughput
- Compare to instance network performance limits
- Important for data-intensive workloads
Disk I/O
- EBS throughput and IOPS
- Compare to instance and volume limits
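As a starting point for the CPU numbers, the following sketch pulls peak and average utilization for one instance from CloudWatch. The function name `cpu_stats` is hypothetical, and the sketch assumes GNU `date` (Linux) and credentials that allow `cloudwatch:GetMetricStatistics`.

```shell
#!/usr/bin/env bash
# Print peak and average CPU utilization for one instance as JSON.
# Usage: cpu_stats <instance-id> [lookback_days]
# Assumes GNU date; on macOS the date arithmetic needs a different syntax.
cpu_stats() {
  local instance_id="$1" days="${2:-14}"
  local start end
  start="$(date -u -d "${days} days ago" +%Y-%m-%dT%H:%M:%SZ)"
  end="$(date -u +%Y-%m-%dT%H:%M:%SZ)"

  # Hourly datapoints keep a 4-week window well under the per-call datapoint limit.
  aws cloudwatch get-metric-statistics \
    --namespace AWS/EC2 \
    --metric-name CPUUtilization \
    --dimensions "Name=InstanceId,Value=${instance_id}" \
    --start-time "$start" --end-time "$end" \
    --period 3600 \
    --statistics Maximum Average \
    --query '{peak: max(Datapoints[].Maximum), average: avg(Datapoints[].Average)}' \
    --output json
}

# Example usage (the instance ID is a placeholder):
#   cpu_stats i-0123456789abcdef0 28
```

The same call with a different `--metric-name` (for example `NetworkIn` or `NetworkOut`) covers the network metrics; memory requires the agent described in the next section.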
Setting Up Memory Monitoring
CloudWatch doesn't collect memory metrics by default. Install the CloudWatch Agent to capture this critical data:
# Install the agent (package name as on Amazon Linux; other distributions use their own package or installer)
sudo yum install -y amazon-cloudwatch-agent
# Configure memory metric collection (sudo tee is needed because the target directory is root-owned)
sudo tee /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json > /dev/null << 'EOF'
{
  "metrics": {
    "metrics_collected": {
      "mem": {
        "measurement": ["mem_used_percent"],
        "metrics_collection_interval": 60
      }
    }
  }
}
EOF
# Load the configuration and start the agent
sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl \
  -a fetch-config \
  -m ec2 \
  -c file:/opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json \
  -s
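Once the agent is running, it is worth confirming that the metric actually reaches CloudWatch. The check below is a sketch: `mem_metric_visible` is a hypothetical helper, and it assumes the agent's default `CWAgent` namespace, which a custom configuration can override.

```shell
#!/usr/bin/env bash
# Succeeds once at least one mem_used_percent metric is visible in CloudWatch.
# "CWAgent" is the agent's default namespace; adjust if your config sets another.
mem_metric_visible() {
  local count
  count="$(aws cloudwatch list-metrics \
    --namespace CWAgent \
    --metric-name mem_used_percent \
    --query 'length(Metrics)' \
    --output text)"
  # An empty result (for example, on a CLI error) is treated as zero metrics.
  [ "${count:-0}" -gt 0 ]
}

# Example usage (new metrics can take a few minutes to appear):
#   mem_metric_visible && echo "memory metrics are flowing"
```

With a minimal configuration like the one above, the metric is typically keyed by host name rather than instance ID, so fleet-wide analysis may be easier if the agent configuration also appends an instance ID dimension.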
Analysis Framework
Use this framework to categorize instances:
| Category | CPU Max | Memory Max | Recommendation |
|---|---|---|---|
| Idle | < 5% | < 20% | Terminate or stop |
| Oversized | < 40% | < 50% | Rightsize down |
| Right-sized | 40-70% | 50-70% | Monitor |
| Constrained | > 80% | > 80% | Rightsize up or optimize |
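Note: These thresholds are guidelines, and the bands are not exhaustive. An instance that falls between them (for example, 70-80% peak CPU) is usually best left as-is and monitored. Adjust the numbers based on your workload characteristics and risk tolerance.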
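The framework can be sketched as a small function that takes peak CPU and peak memory percentages. Two choices here are assumptions rather than part of the table: an instance counts as Constrained if either metric exceeds 80%, and anything that fits no row is labeled with a hypothetical "Review" category.

```shell
#!/usr/bin/env bash
# Classify an instance from its peak CPU and peak memory utilization (percent).
# Usage: classify_instance <cpu_max> <mem_max>
classify_instance() {
  awk -v cpu="$1" -v mem="$2" 'BEGIN {
    if (cpu > 80 || mem > 80)            print "Constrained"   # assumption: either metric
    else if (cpu < 5 && mem < 20)        print "Idle"
    else if (cpu < 40 && mem < 50)       print "Oversized"
    else if (cpu >= 40 && cpu <= 70 && mem >= 50 && mem <= 70)
                                         print "Right-sized"
    else                                 print "Review"        # falls between the bands
  }'
}

classify_instance 3 10    # prints Idle
classify_instance 25 30   # prints Oversized
classify_instance 55 60   # prints Right-sized
classify_instance 90 40   # prints Constrained
```

Checking the Idle row before the Oversized row matters, because an idle instance also satisfies the Oversized thresholds.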
The Rightsizing Process
Step 1: Gather Data
Collect at least 2 weeks of utilization data—ideally 4 weeks to capture any monthly patterns.
Use AWS Cost Explorer Rightsizing Recommendations as a starting point:
aws ce get-rightsizing-recommendation \
  --service AmazonEC2 \
  --configuration '{
    "RecommendationTarget": "SAME_INSTANCE_FAMILY",
    "BenefitsConsidered": true
  }'
Step 2: Categorize Workloads
Not all instances can be treated the same:
Stateless application servers: Easiest to rightsize. Can typically resize during deployment.
Databases: Most sensitive. Require careful analysis of query performance, not just resource utilization.
Batch processing: May have bursty patterns. Look at peak requirements, not averages.
Development/Test: Often the lowest-hanging fruit. Typically oversized and running when not needed.
Step 3: Plan the Change
For each candidate, document:
- Current instance type and cost
- Proposed instance type and cost
- Expected savings
- Risk assessment
- Rollback plan
- Testing requirements
Step 4: Test in Non-Production
Before touching production:
- Rightsize equivalent non-production instances
- Run load tests simulating production traffic
- Monitor for performance degradation
- Validate application behavior
Step 5: Implement with Safety Nets
When rightsizing production instances:
For replaceable instances (in Auto Scaling groups):
- Update launch template with new instance type
- Perform rolling replacement
- Monitor during rollout
- Roll back if issues arise
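One way to carry out these steps with the AWS CLI is sketched below. The function name `resize_asg` and the 90% healthy threshold are illustrative assumptions, and the sketch assumes the Auto Scaling group points at the launch template's `$Latest` version so that the refresh picks up the new version.

```shell
#!/usr/bin/env bash
# Resize an Auto Scaling group by publishing a new launch template version
# and replacing instances gradually.
# Usage: resize_asg <launch-template-name> <asg-name> <new-instance-type>
# Assumes the group references the launch template's $Latest version.
resize_asg() {
  local template="$1" asg="$2" new_type="$3"
  local latest

  # Look up the current latest version so the new one inherits its settings.
  latest="$(aws ec2 describe-launch-templates \
    --launch-template-names "$template" \
    --query 'LaunchTemplates[0].LatestVersionNumber' \
    --output text)" || return 1

  # Create a new version that differs only in instance type.
  aws ec2 create-launch-template-version \
    --launch-template-name "$template" \
    --source-version "$latest" \
    --launch-template-data "{\"InstanceType\":\"${new_type}\"}" || return 1

  # Roll instances over while keeping most of the group healthy.
  aws autoscaling start-instance-refresh \
    --auto-scaling-group-name "$asg" \
    --preferences '{"MinHealthyPercentage": 90}'
}

# Example usage (names are placeholders):
#   resize_asg web-template my-asg m5.2xlarge
```

If problems show up during the rollout, `aws autoscaling cancel-instance-refresh` stops the refresh in progress, and running the same function with the previous instance type rolls the group back.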
For standalone instances:
- Create AMI backup
- Stop instance during maintenance window
- Change instance type
- Start and validate
- Keep AMI for quick rollback
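The standalone procedure can be sketched as follows. `resize_instance` is a hypothetical helper, the AMI naming scheme is an assumption, and the sketch assumes an EBS-backed instance (data on instance store volumes does not survive a stop).

```shell
#!/usr/bin/env bash
# Resize a standalone EBS-backed instance, keeping an AMI as the rollback point.
# Usage: resize_instance <instance-id> <new-instance-type>
resize_instance() {
  local instance_id="$1" new_type="$2"
  local image_id

  # 1. Back up. --no-reboot avoids an extra restart but skips filesystem quiescing.
  image_id="$(aws ec2 create-image \
    --instance-id "$instance_id" \
    --name "pre-resize-${instance_id}-$(date +%Y%m%d%H%M%S)" \
    --no-reboot \
    --query ImageId --output text)" || return 1
  aws ec2 wait image-available --image-ids "$image_id" || return 1

  # 2. Stop, 3. change the type, 4. start again.
  aws ec2 stop-instances --instance-ids "$instance_id" > /dev/null || return 1
  aws ec2 wait instance-stopped --instance-ids "$instance_id" || return 1
  aws ec2 modify-instance-attribute \
    --instance-id "$instance_id" \
    --instance-type "Value=${new_type}" || return 1
  aws ec2 start-instances --instance-ids "$instance_id" > /dev/null || return 1
  aws ec2 wait instance-running --instance-ids "$instance_id" || return 1

  # 5. Report the AMI to keep for rollback.
  echo "$image_id"
}

# Example usage (the instance ID is a placeholder):
#   resize_instance i-0123456789abcdef0 m5.2xlarge
```

The `wait` subcommands block until each state is reached, so validation can begin as soon as the function returns. A public IP address that is not an Elastic IP will change across the stop and start.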
Instance Family Considerations
Compute-Optimized (C-family)
Best for CPU-intensive workloads. If memory utilization is high but CPU is low, consider switching to general-purpose (M-family).
Memory-Optimized (R-family)
Best for memory-intensive applications like databases and caching. If you're using R instances but memory utilization is below 50%, M-family may be more cost-effective.
General Purpose (M-family)
Balanced compute, memory, and networking. Good default choice when workloads don't have extreme requirements.
Graviton (ARM-based)
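AWS cites up to 40% better price/performance than comparable x86 instances for compatible workloads. Consider migrating to Graviton (m6g, c6g, r6g) for additional savings, keeping in mind that the AMI and all application binaries must support the arm64 architecture.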
Advanced Rightsizing Strategies
Burstable Instances (T-family)
T instances are significantly cheaper but have CPU credit limitations. Good candidates:
- Development environments
- Low-traffic web applications
- CI/CD workers with variable load
Monitor CPU credit balance to ensure you're not being throttled.
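A quick way to check is to look at the lowest credit balance over a recent window. This is a sketch: `min_cpu_credits` is a hypothetical helper and it assumes GNU `date`.

```shell
#!/usr/bin/env bash
# Print the lowest CPUCreditBalance seen for an instance over the lookback window.
# Usage: min_cpu_credits <instance-id> [lookback_days]
# Assumes GNU date (Linux).
min_cpu_credits() {
  local instance_id="$1" days="${2:-7}"
  aws cloudwatch get-metric-statistics \
    --namespace AWS/EC2 \
    --metric-name CPUCreditBalance \
    --dimensions "Name=InstanceId,Value=${instance_id}" \
    --start-time "$(date -u -d "${days} days ago" +%Y-%m-%dT%H:%M:%SZ)" \
    --end-time "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
    --period 3600 \
    --statistics Minimum \
    --query 'min(Datapoints[].Minimum)' \
    --output text
}

# Example usage (the instance ID is a placeholder):
#   min_cpu_credits i-0123456789abcdef0 14
```

A minimum at or near zero suggests the instance exhausted its credits. In standard mode that means throttling to baseline; in unlimited mode it means surplus credit charges instead, so a larger T size or a move to the M-family may be cheaper.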
Auto Scaling Integration
For Auto Scaling groups, consider:
- Mixed instance policies: Use multiple instance types for flexibility
- Predictive scaling: Let AWS anticipate demand
- Scale-in protection: Prevent termination of instances doing important work
Scheduled Scaling
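For predictable patterns (business hours, batch windows), use scheduled scaling instead of running larger instances around the clock. The example below scales the group down at 20:00 every day; the recurrence is evaluated in UTC unless a time zone is specified, and it needs a matching scale-up action to restore capacity:
aws autoscaling put-scheduled-action \
  --auto-scaling-group-name my-asg \
  --scheduled-action-name scale-down-night \
  --recurrence "0 20 * * *" \
  --min-size 1 \
  --max-size 2 \
  --desired-capacity 1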
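A scale-down action on its own leaves the group small the next morning. The counterpart below is a sketch: the action name `scale-up-morning`, the 06:00 schedule, and the sizes are illustrative assumptions to replace with your daytime values.

```shell
#!/usr/bin/env bash
# Restore daytime capacity; pairs with the scale-down-night action.
# Usage: schedule_morning_scale_up [asg-name]
# The schedule and sizes are illustrative; recurrence is evaluated in UTC
# unless --time-zone is supplied.
schedule_morning_scale_up() {
  local asg="${1:-my-asg}"
  aws autoscaling put-scheduled-action \
    --auto-scaling-group-name "$asg" \
    --scheduled-action-name scale-up-morning \
    --recurrence "0 6 * * *" \
    --min-size 2 \
    --max-size 6 \
    --desired-capacity 4
}

# Example usage:
#   schedule_morning_scale_up my-asg
```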
Common Mistakes to Avoid
1. Rightsizing Without Memory Data
CPU alone doesn't tell the whole story. Install the CloudWatch Agent and collect memory data before making decisions.
2. Using Averages Instead of Peaks
A server averaging 20% CPU might spike to 90% during peak hours. Always analyze peak utilization.
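The gap between the two numbers is easy to see with a toy data set. In this sketch the sample values are made up for illustration, and `summarize_cpu` is a hypothetical helper that reads one utilization sample per line.

```shell
#!/usr/bin/env bash
# Summarize utilization samples read from stdin (one numeric value per line).
summarize_cpu() {
  awk '
    NR == 1 || $1 > peak { peak = $1 }   # track the largest sample
    { sum += $1 }                        # accumulate for the average
    END { if (NR) printf "avg=%.1f peak=%.1f\n", sum / NR, peak }
  '
}

# Example: 22 quiet hours at 14% plus a two-hour spike (illustrative data).
{ for i in $(seq 22); do echo 14; done; echo 90; echo 82; } | summarize_cpu
# prints: avg=20.0 peak=90.0
```

Sizing this server for its 20% average would leave it unable to absorb the spike.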
3. Ignoring Application Performance
Resource utilization isn't the only metric. Monitor application response times, error rates, and user experience.
4. Rightsizing Before Understanding Workload Patterns
Capture enough data to understand daily, weekly, and monthly patterns before making changes.
5. Making Too Many Changes at Once
Rightsize incrementally. If you resize 50 instances at once and something breaks, it is hard to tell which change caused the problem.
Measuring Success
Track these metrics to validate your rightsizing efforts:
Cost metrics:
- EC2 spend before and after
- Cost per transaction/request
- Reserved Instance/Savings Plan coverage ratio
Performance metrics:
- Application response time
- Error rates
- User-reported issues
Operational metrics:
- Number of instances resized
- Savings achieved vs. predicted
- Rollbacks required
Key Takeaways
- Rightsize before committing to Reserved Instances or Savings Plans
- Install the CloudWatch Agent to capture memory metrics
- Use 2-4 weeks of data to understand workload patterns
- Test in non-production before touching production
- Implement incrementally with clear rollback plans
- Consider Graviton for up to 40% better price/performance on compatible workloads
Automation Options
Manual rightsizing is tedious and doesn't scale. Consider:
- AWS Compute Optimizer: Free recommendations based on utilization data
- Third-party FinOps tools: More advanced analysis and automation
- Custom Lambda functions: Automated rightsizing based on your rules
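As a first automation step, Compute Optimizer findings can be pulled from the command line. This is a sketch: `list_overprovisioned` is a hypothetical wrapper, and it assumes the account has already opted in to Compute Optimizer.

```shell
#!/usr/bin/env bash
# List instances that Compute Optimizer flags as over-provisioned,
# along with the first recommended instance type for each.
# Requires the account to be opted in to Compute Optimizer.
list_overprovisioned() {
  aws compute-optimizer get-ec2-instance-recommendations \
    --filters name=Finding,values=Overprovisioned \
    --query 'instanceRecommendations[].[instanceArn, currentInstanceType, recommendationOptions[0].instanceType]' \
    --output text
}

# Example usage: one tab-separated line per instance
#   list_overprovisioned
```

The output makes a reasonable candidate list to feed into the planning and testing steps above, rather than something to apply blindly.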