The starting point

Our setup was typical of an early-stage startup: everything was over-provisioned "just in case," instances were running 24/7 regardless of load, and there was no cost visibility. The monthly bill was growing linearly with user count — clearly unsustainable.

1. Right-sizing EC2 instances

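This was the first and most impactful change. I pulled 30 days of CloudWatch metrics for CPU, memory, and network across all instances. (EC2 does not publish memory metrics to CloudWatch by default; they have to come from the CloudWatch agent.) The findings were stark: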

  • Our API server was running on a t3.xlarge with average CPU utilisation of 12%.
  • The database server had 16GB RAM allocated but never exceeded 4GB in use.
  • A staging environment was running the same instance types as production.

I downsized the API server to t3.medium, moved the staging environment to t3.micro, and right-sized the database instance to match its actual memory use. This alone cut the bill by roughly 30%.
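
A minimal sketch of the kind of check behind those findings, assuming the CPU datapoints have already been pulled from CloudWatch as a list of dicts with "Average" and "Maximum" keys. The thresholds, function name, and sample numbers are hypothetical, not the values I actually used:

```python
from statistics import mean

# Hypothetical thresholds; tune them to your own workload.
AVG_CPU_THRESHOLD = 20.0   # percent
PEAK_CPU_THRESHOLD = 60.0  # percent


def is_downsize_candidate(datapoints, avg_limit=AVG_CPU_THRESHOLD,
                          peak_limit=PEAK_CPU_THRESHOLD):
    """Return True when both average and peak CPU stay under the limits.

    `datapoints` is a list of dicts with "Average" and "Maximum" keys,
    one dict per sampling period.
    """
    if not datapoints:
        return False  # no data: do not guess
    avg = mean(dp["Average"] for dp in datapoints)
    peak = max(dp["Maximum"] for dp in datapoints)
    return avg < avg_limit and peak < peak_limit


# Made-up sample data for illustration only.
api_server = [{"Average": 11.0, "Maximum": 35.0},
              {"Average": 13.0, "Maximum": 41.0}]
busy_worker = [{"Average": 15.0, "Maximum": 92.0}]

print(is_downsize_candidate(api_server))
print(is_downsize_candidate(busy_worker))
```

Checking the peak as well as the average matters: an instance with a low average but regular spikes is not a safe candidate for a smaller size.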

2. Deployment architecture cleanup

We had orphaned EBS volumes from old deployments, unused Elastic IPs, and a NAT Gateway carrying traffic that could go through VPC endpoints instead. Cleaning this up was tedious but saved another ~15%.
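
Finding the leftovers can be scripted. The sketch below assumes the volume and address records have already been fetched (for example with the EC2 DescribeVolumes and DescribeAddresses calls) and are plain dicts in that response shape. The helper names and the sample IDs are made up for illustration:

```python
def find_orphaned_volumes(volumes):
    """Volumes in the 'available' state are not attached to any instance."""
    return [v["VolumeId"] for v in volumes if v.get("State") == "available"]


def find_unassociated_addresses(addresses):
    """Elastic IPs without an AssociationId are allocated but not in use."""
    return [a["AllocationId"] for a in addresses if "AssociationId" not in a]


# Made-up records for illustration only.
volumes = [
    {"VolumeId": "vol-0aaa", "State": "in-use", "Size": 100},
    {"VolumeId": "vol-0bbb", "State": "available", "Size": 50},
]
addresses = [
    {"AllocationId": "eipalloc-0111", "PublicIp": "203.0.113.10",
     "AssociationId": "eipassoc-0999"},
    {"AllocationId": "eipalloc-0222", "PublicIp": "203.0.113.11"},
]

print(find_orphaned_volumes(volumes))
print(find_unassociated_addresses(addresses))
```

A list like this is only a starting point for review. An unattached volume may still hold data someone needs, so snapshot or confirm before deleting.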

3. Application-level optimisations

Some costs were being driven by inefficient application code rather than infrastructure:

  • Reduced S3 API calls by implementing proper caching headers and a local cache layer.
  • Consolidated multiple small Lambda functions into fewer, more efficient ones.
  • Optimised database queries that were causing unnecessary read replicas to spin up.
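
To illustrate the local cache layer idea from the first bullet, here is a minimal time-to-live cache sketch. It is not our production code: the class name, the `fetch` callable standing in for an S3 GET, the 60-second TTL, and the object key are all assumptions chosen for the example.

```python
import time


class TTLCache:
    """Tiny in-process cache: entries expire `ttl_seconds` after being stored."""

    def __init__(self, fetch, ttl_seconds=300.0, clock=time.monotonic):
        self._fetch = fetch      # callable(key) -> value, e.g. an S3 GET
        self._ttl = ttl_seconds
        self._clock = clock      # injectable so expiry can be tested
        self._entries = {}       # key -> (expires_at, value)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]      # fresh hit: no remote call
        value = self._fetch(key)  # miss or expired: one remote call
        self._entries[key] = (now + self._ttl, value)
        return value


calls = []


def fake_s3_get(key):
    """Stand-in for a real S3 request; records each call it receives."""
    calls.append(key)
    return f"contents of {key}"


cache = TTLCache(fake_s3_get, ttl_seconds=60.0)
cache.get("menus/today.json")   # hypothetical object key
cache.get("menus/today.json")   # served from memory
print(len(calls))  # prints 1
```

Every hit inside the TTL window is a request that never reaches S3, which is where the saving comes from. The trade-off is staleness, so the TTL should match how often the underlying object really changes.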

4. Reserved instances and Savings Plans

Once the infrastructure was stable and right-sized, I committed to 1-year Reserved Instances for the production workload. This locked in the remaining ~15% of savings.
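
The reason to right-size first is that a commitment is paid for whether or not the instance runs. A rough sketch of the arithmetic, assuming a no-upfront commitment expressed as an effective hourly rate. The rates below are placeholders, not real AWS prices, and the function names are hypothetical:

```python
HOURS_PER_YEAR = 8760


def yearly_costs(on_demand_hourly, reserved_hourly, utilisation):
    """Compare a year of on-demand usage with a 1-year commitment.

    `utilisation` is the fraction of the year the instance actually runs.
    The commitment is billed for every hour, used or not.
    """
    on_demand = on_demand_hourly * HOURS_PER_YEAR * utilisation
    reserved = reserved_hourly * HOURS_PER_YEAR
    return on_demand, reserved


def break_even_utilisation(on_demand_hourly, reserved_hourly):
    """Fraction of the year above which the commitment is cheaper."""
    return reserved_hourly / on_demand_hourly


# Placeholder rates for illustration only.
on_demand, reserved = yearly_costs(0.10, 0.065, utilisation=1.0)
print(round(on_demand, 2), round(reserved, 2))
print(break_even_utilisation(0.10, 0.065))
```

With these placeholder rates the commitment only pays off if the instance runs for more than about 65% of the year, which is why it suits an always-on production workload and not a staging box that could be switched off overnight.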

The result

Total reduction: approximately 60%. The platform now serves 20,000+ users and processes 45,000+ meals per month at a fraction of the original infrastructure cost. More importantly, performance actually improved because the optimisation process forced us to address inefficiencies in the application layer too.

Key takeaways

  • Measure before you optimise. CloudWatch metrics told the whole story.
  • Right-sizing is the highest-leverage move for early-stage startups.
  • Don't ignore infrastructure debt — it compounds just like code debt.
  • Application-level changes matter as well as infrastructure changes: fixing inefficient code cut costs and improved performance.