How I Cut a Cloud Bill by 90% Without Touching Latency
Cost is a feature. Here is the playbook I used to cut a workload’s cloud spend by ~90% while improving p99 — and why most teams leave this money on the table.
Most engineers treat the cloud bill as someone else's problem. It isn't. A bill is a design document — it tells you, in dollars, exactly where your architecture disagrees with reality.
Early in my career I inherited a workload that was quietly burning money. It was provisioned for peak traffic that arrived for a few hours a day, and sat mostly-idle the rest of the time. The instinct in the room was to negotiate a discount. The right move was to change the shape of the system.
Start with the truth, not the bill
Before changing anything, I instrumented real usage: requests over time, the actual distribution of load, where time was spent, and what each request truly cost. You cannot optimise what you cannot see — and almost every "expensive system" is really an observability problem wearing a cost problem's clothes.
The data told a clear story: we were paying for a fleet sized to peak, running 24/7, to serve a load that was bursty and largely asynchronous.
Re-shape the workload
The fix was architectural, not financial:
- Move batch work off the hot path. Anything that didn't need to happen in the request became event-driven and asynchronous.
- Right-size and autoscale. Compute scaled with demand instead of being pinned to peak.
- Adopt serverless primitives where cold-start cost was acceptable, so we paid for work done, not capacity reserved.
The counter-intuitive result
Spend dropped by roughly 90% for that workload — and p99 latency got better, not worse. Smaller, demand-shaped components under less contention behave more predictably than one over-provisioned monolith.
The lesson
Cost is a feature. Engineers who ignore the bill ship liabilities.
This single project reframed how I build. Years later, when I needed to scale a fleet of real browsers for Snappit, the same discipline — measure, shape demand, pay for work not capacity — was the difference between a viable product and an expensive science project.
If your bill is growing faster than your usage, that's not a finance problem. That's an architecture review waiting to happen.
Related reading
What an Accounting Degree Taught Me About Building Software Companies
I trained as an accountant and became an engineer and founder. The detour wasn’t wasted — it’s my edge. On outcomes, distribution, and finding what’s true.
In Automation, Reliability Is the Product
Browser automation looks like a demo problem and is actually a reliability problem. What building Snappit taught me about why self-healing beats clever.