How I Cut AWS Costs by 60% Without Sacrificing Performance

When I joined MealPe as the sole backend engineer, our AWS bill was ballooning. The platform was growing fast, but the infrastructure hadn't been optimised since day one. Here's the exact playbook I used to cut costs by roughly 60% without downtime, without degrading user experience, and without adding complexity.

The starting point

Our setup was typical of an early-stage startup: everything was over-provisioned "just in case," instances ran 24/7 regardless of load, and there was no cost visibility. The monthly bill was growing in step with user count, which was clearly unsustainable.

1. Right-sizing EC2 instances

This was the first and most impactful change. I pulled 30 days of CloudWatch metrics for CPU, memory, and network across all instances (memory requires the CloudWatch agent, since EC2 does not publish it by default). The findings were stark:

Our API server was running on a t3.xlarge with average CPU utilisation of 12%.
The database server had 16GB RAM allocated but never exceeded 4GB in use.
A staging environment was running the same instance types as production.

I downsized the API server to t3.medium, moved the staging environment to t3.micro, and right-sized the database instance. This alone cut ~30% of the bill.
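
A minimal sketch of that kind of review in Python, assuming datapoints shaped like the ones CloudWatch's GetMetricStatistics returns for CPUUtilization (dicts with Average and Maximum keys). The function names and the 20% / 60% thresholds are illustrative assumptions, not the values used at MealPe, and the fetch helper needs boto3 and AWS credentials.

```python
from datetime import datetime, timedelta, timezone
from statistics import mean


def fetch_cpu_datapoints(instance_id, days=30):
    """Fetch hourly CPU datapoints for one instance (requires boto3)."""
    import boto3  # imported lazily so the analysis below runs without AWS access

    cloudwatch = boto3.client("cloudwatch")
    end = datetime.now(timezone.utc)
    response = cloudwatch.get_metric_statistics(
        Namespace="AWS/EC2",
        MetricName="CPUUtilization",
        Dimensions=[{"Name": "InstanceId", "Value": instance_id}],
        StartTime=end - timedelta(days=days),
        EndTime=end,
        Period=3600,  # one datapoint per hour
        Statistics=["Average", "Maximum"],
    )
    return response["Datapoints"]


def is_downsize_candidate(datapoints, avg_threshold=20.0, peak_threshold=60.0):
    """Return True if both average and peak CPU stay under the thresholds.

    The thresholds are assumptions for illustration; tune them per workload.
    """
    if not datapoints:
        return False  # no data, so make no recommendation
    avg_cpu = mean(dp["Average"] for dp in datapoints)
    peak_cpu = max(dp["Maximum"] for dp in datapoints)
    return avg_cpu < avg_threshold and peak_cpu < peak_threshold
```

Checking the peak as well as the average matters: an instance that idles at 12% but regularly spikes to 95% is a poor candidate for a smaller size.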

2. Deployment architecture cleanup

We had orphaned EBS volumes from old deployments, unused Elastic IPs, and a NAT Gateway carrying traffic to AWS services that could go through VPC endpoints instead. Cleaning this up was tedious but saved another ~15%.
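
The hunt for orphaned resources can be sketched as a small filter. This is an illustrative example, assuming input lists shaped like the Volumes and Addresses lists returned by EC2's DescribeVolumes and DescribeAddresses calls; the function name and the sample IDs are hypothetical.

```python
def find_idle_resources(volumes, addresses):
    """Return (orphaned_volume_ids, unused_public_ips).

    `volumes` and `addresses` are lists of dicts in the shape of the
    "Volumes" and "Addresses" entries in EC2 describe responses.
    """
    # A volume in the "available" state is not attached to any instance.
    orphaned = [v["VolumeId"] for v in volumes if v.get("State") == "available"]
    # An Elastic IP with no AssociationId is allocated but not in use.
    unused_ips = [a["PublicIp"] for a in addresses if "AssociationId" not in a]
    return orphaned, unused_ips


# Hypothetical sample data for illustration only.
sample_volumes = [
    {"VolumeId": "vol-aaa", "State": "in-use", "Size": 50},
    {"VolumeId": "vol-bbb", "State": "available", "Size": 100},
]
sample_addresses = [
    {"PublicIp": "203.0.113.10", "AssociationId": "eipassoc-111"},
    {"PublicIp": "203.0.113.11"},
]

orphaned_ids, idle_ips = find_idle_resources(sample_volumes, sample_addresses)
```

An unattached volume is not always garbage, so it is worth snapshotting or confirming ownership before deleting anything the filter flags.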

3. Application-level optimisations

Some costs came from inefficient application code rather than from the infrastructure itself:

Reduced S3 API calls by implementing proper caching headers and a local cache layer.
Consolidated multiple small Lambda functions into fewer, more efficient ones.
Optimised database queries that were causing unnecessary read replicas to spin up.
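
As an illustration of the local cache layer idea, here is a minimal time-to-live cache in Python. It is a sketch under assumptions, not the implementation used at MealPe: the TTLCache class, the 300-second default, and the injected fetch callable (standing in for an S3 GET) are all hypothetical.

```python
import time


class TTLCache:
    """Tiny in-process cache that serves repeat reads without a new fetch."""

    def __init__(self, fetch, ttl_seconds=300.0, clock=time.monotonic):
        self._fetch = fetch      # callable: key -> value, e.g. an S3 GET wrapper
        self._ttl = ttl_seconds  # how long a cached value stays fresh
        self._clock = clock      # injectable so expiry can be tested
        self._entries = {}       # key -> (expires_at, value)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]      # still fresh: no call to the backing store
        value = self._fetch(key)
        self._entries[key] = (now + self._ttl, value)
        return value
```

Wrapping the S3 read in a callable keeps the cache independent of any SDK: every hit inside the TTL window is one request that never reaches S3. The trade-off is staleness, so the TTL should match how often the underlying objects change.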

4. Reserved instances and Savings Plans

Once the infrastructure was stable and right-sized, I committed to 1-year Reserved Instances for the production workload. That commitment delivered the remaining ~15% of savings.
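
Whether a commitment pays off depends on how many hours the instance actually runs. The sketch below works through that arithmetic in Python. The function names are illustrative and the hourly rates in the usage note are hypothetical placeholders, not real AWS prices.

```python
HOURS_PER_YEAR = 8760


def reserved_break_even(on_demand_hourly, reserved_effective_hourly):
    """Fraction of the year an instance must run before a 1-year
    commitment costs less than paying on-demand for those hours."""
    if on_demand_hourly <= 0:
        raise ValueError("on_demand_hourly must be positive")
    return reserved_effective_hourly / on_demand_hourly


def annual_saving(on_demand_hourly, reserved_effective_hourly, utilisation=1.0):
    """Yearly saving from the commitment; negative means it costs more.

    `utilisation` is the fraction of hours the instance would have run
    on-demand. The reserved cost is paid for every hour regardless.
    """
    on_demand_cost = on_demand_hourly * HOURS_PER_YEAR * utilisation
    reserved_cost = reserved_effective_hourly * HOURS_PER_YEAR
    return on_demand_cost - reserved_cost
```

With placeholder rates of 0.10 per hour on-demand and 0.065 per hour reserved, the break-even point is 65% of the year: an always-on production server clears it easily, while a staging box that runs only during office hours does not. That is why committing only after right-sizing matters, since a reservation on an oversized instance locks in the waste.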

The result

Total reduction: approximately 60%. The platform now serves 20,000+ users and processes 45,000+ meals per month at roughly 40% of the original infrastructure cost. More importantly, performance actually improved, because the optimisation process forced us to address inefficiencies in the application layer too.

Key takeaways

Measure before you optimise. CloudWatch metrics told the whole story.
Right-sizing is the highest-leverage move for early-stage startups.
Don't ignore infrastructure debt: it compounds just like code debt.
Application-level changes matter too. Fixing inefficient code is what improved performance, not just the bill.