
How I Cut a Cloud Bill by 90% Without Touching Latency

Cost is a feature. Here is the playbook I used to cut a workload's cloud spend by ~90% while improving p99 latency, and why most teams leave this money on the table.


Most engineers treat the cloud bill as someone else's problem. It isn't. A bill is a design document — it tells you, in dollars, exactly where your architecture disagrees with reality.

Early in my career I inherited a workload that was quietly burning money. It was provisioned for peak traffic that arrived for only a few hours a day, and it sat mostly idle the rest of the time. The instinct in the room was to negotiate a discount. The right move was to change the shape of the system.

Start with the truth, not the bill

Before changing anything, I instrumented real usage: requests over time, the actual distribution of load, where time was spent, and what each request truly cost. You cannot optimise what you cannot see — and almost every "expensive system" is really an observability problem wearing a cost problem's clothes.
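As a rough illustration of what "instrumenting real usage" can boil down to, here is a minimal Python sketch that turns hourly samples into three numbers: fleet utilisation, cost per request, and a latency percentile. The `HourSample` fields, the function names, and the flat per-instance-hour price are assumptions made for this example, not details of the original workload.

```python
import math
from dataclasses import dataclass


@dataclass
class HourSample:
    """One hour of measurements (hypothetical schema for this sketch)."""
    requests: int                        # requests served during the hour
    busy_instance_hours: float           # instance-hours spent doing real work
    provisioned_instance_hours: float    # instance-hours that were paid for


def utilization(samples):
    """Fraction of paid capacity that actually did work."""
    busy = sum(s.busy_instance_hours for s in samples)
    paid = sum(s.provisioned_instance_hours for s in samples)
    return busy / paid


def cost_per_request(samples, price_per_instance_hour):
    """Total spend divided by total requests, assuming a flat hourly price."""
    paid = sum(s.provisioned_instance_hours for s in samples)
    total_requests = sum(s.requests for s in samples)
    return paid * price_per_instance_hour / total_requests


def percentile(values, pct):
    """Nearest-rank percentile, e.g. pct=99 for p99 latency."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]
```

Low utilisation next to a high cost per request is the signature of a fleet that is sized for a load it rarely sees.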

The data told a clear story: we were paying for a fleet sized to peak, running 24/7, to serve a load that was bursty and largely asynchronous.
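To make the gap concrete, the sketch below compares two ways of paying for the same hourly load: a fleet pinned to peak around the clock, and capacity that follows demand hour by hour. The load profile, per-instance capacity, and price are made-up numbers chosen for illustration; they are not measurements from the workload described here, and the function names are hypothetical.

```python
import math


def peak_provisioned_cost(hourly_load, capacity_per_instance, price_per_instance_hour):
    """Cost of running enough instances for the busiest hour, every hour."""
    instances = math.ceil(max(hourly_load) / capacity_per_instance)
    return instances * len(hourly_load) * price_per_instance_hour


def demand_shaped_cost(hourly_load, capacity_per_instance,
                       price_per_instance_hour, min_instances=1):
    """Cost when each hour runs only the instances that hour's load needs."""
    instance_hours = 0
    for load in hourly_load:
        needed = math.ceil(load / capacity_per_instance)
        instance_hours += max(min_instances, needed)
    return instance_hours * price_per_instance_hour


# Illustrative day: 20 quiet hours at 50 req/s, 4 peak hours at 1000 req/s.
hourly_load = [50] * 20 + [1000] * 4
pinned = peak_provisioned_cost(hourly_load, capacity_per_instance=100,
                               price_per_instance_hour=1.0)
shaped = demand_shaped_cost(hourly_load, capacity_per_instance=100,
                            price_per_instance_hour=1.0)
print(pinned, shaped)  # 240.0 60.0
```

Even this toy profile shows a 4x difference from scaling alone, before any work is moved off the request path. Real autoscalers react with some lag and need headroom, so actual savings depend on how quickly load changes.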

Re-shape the workload

The fix was architectural, not financial:

  • Move batch work off the hot path. Anything that didn't need to happen in the request became event-driven and asynchronous.
  • Right-size and autoscale. Compute scaled with demand instead of being pinned to peak.
  • Adopt serverless primitives where cold-start cost was acceptable, so we paid for work done, not capacity reserved.

The counter-intuitive result

Spend dropped by roughly 90% for that workload — and p99 latency got better, not worse. Smaller, demand-shaped components under less contention behave more predictably than one over-provisioned monolith.

The lesson

Cost is a feature. Engineers who ignore the bill ship liabilities.

This single project reframed how I build. Years later, when I needed to scale a fleet of real browsers for Snappit, the same discipline — measure, shape demand, pay for work not capacity — was the difference between a viable product and an expensive science project.

If your bill is growing faster than your usage, that's not a finance problem. That's an architecture review waiting to happen.

Tags: Cloud, Cost, Architecture, Google Cloud, Performance