The starting point
Our setup was typical of an early-stage startup: everything was over-provisioned "just in case," instances were running 24/7 regardless of load, and there was no cost visibility. The monthly bill was growing linearly with user count — clearly unsustainable.
1. Right-sizing EC2 instances
This was the first and most impactful change. I pulled 30 days of CloudWatch metrics for CPU, memory, and network across all instances. The findings were stark:
- Our API server was running on a t3.xlarge with average CPU utilisation of 12%.
- The database server had 16GB RAM allocated but never exceeded 4GB in use.
- A staging environment was running the same instance types as production.
I downsized the API server to t3.medium, moved the staging environment to t3.micro, and right-sized the database instance. This alone cut ~30% of the bill.
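To make the right-sizing check concrete, here is a minimal sketch of how the CPU side of that analysis might be automated. It is not the exact script used here: the 20% threshold, the function names, and the sample data are all assumptions for illustration. EC2 does not publish memory metrics to CloudWatch by default (that needs the CloudWatch agent), so the sketch looks at CPU only.

```python
from statistics import mean

# Hypothetical threshold: flag instances whose average CPU stays under 20%.
CPU_THRESHOLD_PERCENT = 20.0


def fetch_avg_cpu(instance_id, days=30):
    """Return hourly average CPU samples (percent) for one EC2 instance.

    Needs boto3 and AWS credentials; both are imported lazily so the
    rest of this sketch runs without them.
    """
    from datetime import datetime, timedelta, timezone
    import boto3

    cloudwatch = boto3.client("cloudwatch")
    end = datetime.now(timezone.utc)
    response = cloudwatch.get_metric_statistics(
        Namespace="AWS/EC2",
        MetricName="CPUUtilization",
        Dimensions=[{"Name": "InstanceId", "Value": instance_id}],
        StartTime=end - timedelta(days=days),
        EndTime=end,
        Period=3600,  # one datapoint per hour
        Statistics=["Average"],
    )
    return [point["Average"] for point in response["Datapoints"]]


def downsize_candidates(cpu_samples_by_instance, threshold=CPU_THRESHOLD_PERCENT):
    """Return {instance: mean CPU} for instances averaging below the threshold."""
    candidates = {}
    for name, samples in cpu_samples_by_instance.items():
        if not samples:
            continue  # no data: do not guess
        average = mean(samples)
        if average < threshold:
            candidates[name] = round(average, 1)
    return candidates


# Made-up samples for illustration only; real input would come from fetch_avg_cpu.
samples = {
    "api-server": [10.0, 12.0, 14.0],
    "worker": [55.0, 60.0, 65.0],
}
print(downsize_candidates(samples))  # {'api-server': 12.0}
```

An average alone can hide short spikes, so it is worth checking the maximum as well before downsizing anything that serves production traffic.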
2. Deployment architecture cleanup
We had orphaned EBS volumes from old deployments, unused Elastic IPs, and a NAT Gateway carrying traffic that could have gone through VPC endpoints instead. Cleaning this up was tedious but saved another ~15%.
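Finding the orphans is the tedious part, and it can be scripted. The sketch below is one possible approach, not the procedure actually used here. It filters the dictionaries that boto3's `describe_volumes` and `describe_addresses` return; the helper names and the sample records are hypothetical.

```python
def unattached_volume_ids(volumes):
    """IDs of EBS volumes in the 'available' state, i.e. attached to nothing."""
    return [v["VolumeId"] for v in volumes if v.get("State") == "available"]


def unassociated_allocation_ids(addresses):
    """Allocation IDs of Elastic IPs that carry no AssociationId."""
    return [a["AllocationId"] for a in addresses if "AssociationId" not in a]


# With boto3 and credentials available, the inputs would come from:
#   ec2 = boto3.client("ec2")
#   volumes = ec2.describe_volumes()["Volumes"]      # paginated for large accounts
#   addresses = ec2.describe_addresses()["Addresses"]
# The records below are invented for illustration.
volumes = [
    {"VolumeId": "vol-aaa", "State": "in-use"},
    {"VolumeId": "vol-bbb", "State": "available"},
]
addresses = [
    {"AllocationId": "eipalloc-111", "AssociationId": "eipassoc-999"},
    {"AllocationId": "eipalloc-222"},
]

print(unattached_volume_ids(volumes))          # ['vol-bbb']
print(unassociated_allocation_ids(addresses))  # ['eipalloc-222']
```

The functions only report candidates. Deleting a volume is irreversible, so reviewing the list (and snapshotting anything doubtful) before cleanup is the safer workflow.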
3. Application-level optimisations
Some costs were being driven by inefficient application code rather than infrastructure:
- Reduced S3 API calls by implementing proper caching headers and a local cache layer.
- Consolidated multiple small Lambda functions into fewer, more efficient ones.
- Optimised database queries that were causing unnecessary read replicas to spin up.
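As an illustration of the local cache layer idea, here is a minimal time-to-live cache that sits in front of any fetch function. It is a sketch under assumptions rather than the code used in this project: the class name, the 60-second TTL, and the object key are hypothetical, and the fake fetch function stands in for a real S3 read.

```python
import time


class TTLCache:
    """Cache fetched values for a fixed number of seconds."""

    def __init__(self, fetch, ttl_seconds=300.0, clock=time.monotonic):
        self._fetch = fetch          # called on a miss, e.g. an S3 GET wrapper
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so expiry can be tested
        self._entries = {}           # key -> (expires_at, value)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]          # fresh hit: no remote call
        value = self._fetch(key)     # miss or expired: go to the source
        self._entries[key] = (now + self._ttl, value)
        return value


# Stand-in for a real S3 read; records each call so the saving is visible.
calls = []


def fake_fetch(key):
    calls.append(key)
    return f"contents of {key}"


cache = TTLCache(fake_fetch, ttl_seconds=60.0)
cache.get("menus/today.json")
cache.get("menus/today.json")
print(len(calls))  # 1: the second read was served from memory
```

This version never evicts entries, so a long-running process with many distinct keys would also need a size limit.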
4. Reserved instances and Savings Plans
Once the infrastructure was stable and right-sized, I committed to 1-year reserved instances for the production workload. This accounted for the remaining ~15% of the savings.
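The reason to right-size first is that a commitment is paid for whether or not the instance runs. A rough way to reason about it is sketched below. The hourly rates are hypothetical placeholders, not actual AWS prices or figures from this project, and the function names are my own.

```python
HOURS_PER_YEAR = 24 * 365


def break_even_utilisation(on_demand_hourly, committed_hourly):
    """Fraction of the year an instance must run for the commitment to pay off."""
    if on_demand_hourly <= 0:
        raise ValueError("on-demand rate must be positive")
    return committed_hourly / on_demand_hourly


def annual_saving(on_demand_hourly, committed_hourly, utilisation=1.0):
    """Yearly saving from committing, versus paying on demand for the hours used.

    A negative result means the commitment costs more than on-demand would.
    """
    on_demand_cost = on_demand_hourly * HOURS_PER_YEAR * utilisation
    committed_cost = committed_hourly * HOURS_PER_YEAR  # paid regardless of use
    return on_demand_cost - committed_cost


# Hypothetical rates in USD per hour, for illustration only.
on_demand, committed = 0.10, 0.07

print(f"break-even utilisation: {break_even_utilisation(on_demand, committed):.0%}")
print(f"saving at 100% use: {annual_saving(on_demand, committed):.2f}")
print(f"saving at 50% use: {annual_saving(on_demand, committed, 0.5):.2f}")
```

With these placeholder rates, an always-on production server comes out ahead, while an instance that runs half the time would lose money on the commitment.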
The result
Total reduction: approximately 60%. The platform now serves 20,000+ users and processes 45,000+ meals per month at a fraction of the original infrastructure cost. More importantly, performance actually improved because the optimisation process forced us to address inefficiencies in the application layer too.
Key takeaways
- Measure before you optimise. CloudWatch metrics told the whole story.
- Right-sizing is the highest-leverage move for early-stage startups.
- Don't ignore infrastructure debt — it compounds just like code debt.
- Application-level changes often matter more than infrastructure changes.