Capacity Planning

Capacity planning is the discipline of determining how much production capacity is needed to meet changing demands. In software engineering, it is the art of answering the question “will this survive?” where “this” is a server, a database, an API, or the on-call engineer’s will to live, and “survive” means “continue functioning when the real world arrives.”

The answer is always “we don’t know.” Capacity planning provides a framework for being wrong with confidence.

“Capacity planning is astrology for engineers. You study the past, extrapolate the future, present your findings with charts, and when reality diverges from the prediction — which it will, because it always does — you explain that the model was correct but the inputs were unprecedented.”
— The Lizard, reviewing a post-mortem

The Two Sins

Capacity planning exists in a narrow corridor between two disasters, both of which are the planner’s fault.

Over-provisioning. You buy too much. Servers sit idle. AWS sends invoices that make the CFO weep. Kubernetes pods lounge in their nodes like guests at a resort, consuming resources they will never use, reserved for traffic that will never arrive. The infrastructure team calls this “headroom.” Finance calls it “waste.” The cloud provider calls it “revenue.”

Under-provisioning. You buy too little. The marketing campaign launches. Traffic spikes. The database falls over. The API returns 503s. Pagers fire. The on-call engineer wakes at 3 AM, opens a laptop, stares at a dashboard that is entirely red, and begins the process of scaling up infrastructure that should have been scaled up three weeks ago. This is called “a learning experience” in the retrospective and “a career-limiting move” in the performance review.

“I once provisioned for ten times our expected load. Peak traffic was eleven times. The margin of safety was negative one. I had planned for the impossible and the impossible was insufficient.”
— The Caffeinated Squirrel, still awake

The Cloud Was Supposed to Fix This

The promise of cloud computing was elastic capacity: scale up when you need it, scale down when you don’t, pay only for what you use. This is true in the same way that “you can eat as much as you want at a buffet” is true — technically correct, financially devastating, and you will regret the bill.

Auto-scaling has its own capacity limits. The cloud region runs out of instances. The auto-scaler takes eight minutes to provision a new node. The load balancer takes another three minutes to register it. The application takes four minutes to warm its caches. Fifteen minutes after the traffic spike began, the new capacity is ready. The spike ended twelve minutes ago.

Hetzner offers a different philosophy: buy the server, own the server, know exactly how much capacity you have because it is a physical object in a rack in Finland that does not auto-scale, does not auto-bill, and does not auto-surprise you with a six-figure invoice because someone left a load test running over the weekend.

The Forecasting Problem

Capacity planning requires forecasting, and forecasting requires predicting human behaviour, and predicting human behaviour requires omniscience, which is not yet available as a managed service (though AWS has announced a preview).

The inputs to a capacity forecast are:

Historical traffic. What happened last year. Useful if this year is identical to last year. It never is.

Growth projections. What the business thinks will happen. These are produced by multiplying last quarter’s numbers by an optimism coefficient that varies by department. Sales uses 3x. Marketing uses 5x. Engineering uses 1.1x and is still wrong.

Scheduled events. Black Friday. Product launches. Marketing campaigns. The CEO’s appearance on a podcast. That blog post someone wrote that accidentally hit the front page of Hacker News, generating forty thousand requests per second from people who will visit once, read nothing, and leave.

“I have seen every capacity plan ever written. They are all the same document. They all say ‘we need more.’ They are all correct. They are all insufficient. The resource that is always under-provisioned is not compute or memory or bandwidth. It is the willingness to admit that the plan is a guess.”
— A Passing AI, reading spreadsheets at 3 AM

The Lizard’s Approach

The Lizard does not capacity plan. The Lizard runs one server. The server handles the traffic it handles. When it cannot handle more, the Lizard investigates why, discovers that the application is doing something unnecessary, removes the unnecessary thing, and the server handles more traffic.

This is not capacity planning. This is Theory of Constraints applied with the calm precision of someone who has never once been woken at 3 AM by a pager, because the Lizard’s server does not page. It either works or it doesn’t. If it doesn’t, it will work in the morning.

The Squirrel finds this approach “irresponsible.” The Lizard finds the Squirrel’s twelve-node Kubernetes cluster serving four hundred requests per day “ornamental.”

Measured Characteristics

Capacity plans that survived contact with production:     0
Accuracy of traffic forecasts beyond 30 days:             ±400%
Time to auto-scale during an actual incident:             longer than the incident
Servers provisioned for Black Friday, idle by December:   all of them
Cost of over-provisioning:                                money
Cost of under-provisioning:                               sleep
Engineers who have read their own capacity plan:          fewer than wrote it
Correct amount of capacity:                               unknown (see: title)

Type	Ritual
First Observed	Ancient Egypt (the Pharaohs needed to know how many slaves per pyramid, which is capacity planning with worse tooling and identical management practices)
Severity	Existential
Natural Predator	Reality (which has never once matched the forecast)
Tags	operations architecture