429 Too Many Requests

You Have Reached the Limit of Our Willingness
Artifact · First observed 2012 (RFC 6585, though the concept of "too many" predates the specification by the entire history of hospitality) · Severity: Variable (ranges from "wait a second" to "wait three hours and reconsider your career")

429 Too Many Requests is an HTTP status code indicating that the user has sent too many requests in a given amount of time. In practice, “too many” means “any,” “a given amount of time” means “ever,” and the response is the API’s way of saying “I could help you, but I choose not to, and the reason is administrative.”

The 429 is unique among HTTP error codes in that it is simultaneously the client’s fault (you sent too many requests) and entirely not the client’s fault (you sent one request, which was too many, because the limit is zero, because you are on the free tier, because you have not spent money, because you cannot spend money until you are on the paid tier, because the paid tier requires spending money).

“I have a credit card. I am trying to give you money.”
— Every developer who has received a 429 from a service they are trying to pay for

The Taxonomy

Not all 429s are created equal. The species can be classified by intent:

The Legitimate 429

The rarest variety. Occurs when a client genuinely sends too many requests — thousands per second, a runaway loop, a misconfigured batch job. The server is protecting itself. The 429 is appropriate. The client should implement backoff. Nobody objects to this 429. This 429 is just and necessary.

This article is not about this 429.
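
For completeness, the backoff that the legitimate 429 calls for might look something like the sketch below: exponential delays with full jitter, in bash. The function names (`backoff_ceiling`, `jittered_delay`, `retry_with_backoff`), the one-second base, and the sixty-second cap are all assumptions made for illustration; nothing here comes from a particular API's documentation.

```shell
#!/usr/bin/env bash
# Sketch only: names, base delay, and cap are illustrative assumptions.

# backoff_ceiling ATTEMPT [BASE] [CAP]
# Prints the longest wait, in seconds, allowed before retry number ATTEMPT
# (0-based): BASE * 2^ATTEMPT, clamped to CAP.
backoff_ceiling() {
  local attempt base cap ceiling
  attempt=$1; base=${2:-1}; cap=${3:-60}
  # For any sane cap, attempt 30 and beyond is already clamped;
  # skipping the shift avoids integer overflow.
  if [ "$attempt" -ge 30 ]; then
    echo "$cap"
    return
  fi
  ceiling=$(( base * (1 << attempt) ))
  if [ "$ceiling" -gt "$cap" ]; then
    ceiling=$cap
  fi
  echo "$ceiling"
}

# jittered_delay ATTEMPT [BASE] [CAP]
# "Full jitter": a roughly uniform whole number of seconds in [0, ceiling].
# Relies on bash's RANDOM variable.
jittered_delay() {
  local ceiling
  ceiling=$(backoff_ceiling "$@")
  echo $(( RANDOM % (ceiling + 1) ))
}

# retry_with_backoff MAX_ATTEMPTS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or has been tried MAX_ATTEMPTS times.
retry_with_backoff() {
  local max attempt
  max=$1; attempt=0
  shift
  until "$@"; do
    attempt=$(( attempt + 1 ))
    if [ "$attempt" -ge "$max" ]; then
      return 1
    fi
    sleep "$(jittered_delay "$(( attempt - 1 ))")"
  done
}
```

A caller would wrap whatever command talks to the API, for example `retry_with_backoff 5 curl --fail --silent https://api.example.com/v1/things` (the URL is a placeholder). Note that this sketch retries on any non-zero exit status, not only on a 429, and that no amount of jitter helps when the limit is zero.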

The Commercial 429

The most common variety. Occurs when the client has sent a perfectly reasonable number of requests — often one — but has not paid enough to be permitted to send any. The 429 is not protecting the server. The server is fine. The server has capacity for ten thousand requests per second. The 429 is protecting the pricing tier.

The commercial 429 says: “Your request is valid. Your authentication is valid. Your payload is well-formed. You are not welcome.”

The Circular 429

The Google Cloud Console special. Occurs when the client cannot pay because paying requires API access and API access requires payment. The 429 exists in a closed causal loop where the error and its solution are the same action, performed in an order that is not possible.

See: Google Cloud Console, three hours.

The Punitive 429

Occurs after the client has been rate-limited, waited the prescribed Retry-After interval, retried, and been rate-limited again at a lower quota. The punitive 429 is the API’s way of saying “you should have known not to retry, even though I told you to retry, even though retrying is the documented behavior.”

The Retry-After Header

The 429 response may include a Retry-After header indicating how long the client should wait before retrying. The header is a polite fiction.

In theory, Retry-After: 60 means “wait sixty seconds and try again.” In practice, “try again” leads to another 429, because the rate limit has not reset, because the rate limit is not temporal but commercial. You are not being rate-limited by a clock. You are being rate-limited by a pricing spreadsheet.
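
Where the limit really is temporal, honoring the header takes only a few lines. The sketch below converts a Retry-After value into a number of seconds to wait, handling both forms the HTTP specification allows: a delay in seconds and an HTTP-date. The function name `retry_after_seconds` is hypothetical, and the date branch assumes GNU `date` (its `-d` option); BSD and macOS `date` would need a different invocation. It cannot convert a pricing tier into seconds.

```shell
#!/usr/bin/env bash
# Sketch only. retry_after_seconds is a hypothetical helper, not part of any tool.

# retry_after_seconds VALUE [NOW_EPOCH]
# Prints how many seconds to wait for a given Retry-After header value.
# NOW_EPOCH defaults to the current time; passing it explicitly makes the
# HTTP-date branch deterministic.
retry_after_seconds() {
  local value now target delta
  value=$1
  now=${2:-$(date +%s)}
  case $value in
    '')
      return 1                      # no header value: nothing to honor
      ;;
    *[!0-9]*)
      # Not purely digits, so treat it as an HTTP-date.
      # Assumption: GNU date, which parses this format via -d.
      target=$(date -u -d "$value" +%s 2>/dev/null) || return 1
      delta=$(( target - now ))
      if [ "$delta" -lt 0 ]; then
        delta=0                     # a date in the past means "retry now"
      fi
      echo "$delta"
      ;;
    *)
      echo "$value"                 # delay-seconds form, e.g. "60"
      ;;
  esac
}
```

With the header value in hand, the wait becomes `sleep "$(retry_after_seconds "$value")"`. Whether the request succeeds afterwards is, as noted, a separate question.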

The Retry-After value for the Google Cloud Console’s Gemini API, for a developer on the free tier, is not measured in seconds. It is measured in “complete the Tier 1 verification process,” which is measured in hours, which is measured in browser tabs, which is measured in existential doubt.

The Squirrel’s Response

The Caffeinated Squirrel loves the 429 because the 429 is an excuse to build retry infrastructure. Upon receiving a single 429, the Squirrel has proposed exponential backoff, jitter, and a circuit breaker.

The Squirrel has not proposed “use a different API provider that does not return 429 for the first request.” This solution is too boring. It does not involve jitter.

The Lizard’s Response

The Lizard’s response to the 429 is not documented, because the Lizard does not receive 429s. The Lizard uses APIs that work on the first request. When an API does not work on the first request, the Lizard uses a different API.

This approach does not scale. It does not need to. The Lizard is generating one image. The image costs four cents. The image is generated in fifteen seconds. The Lizard does not have a retry strategy because the Lizard does not need a retry strategy because the Lizard chose a provider that does not require one.

THE BEST REQUEST
IS THE ONE THAT WORKS
THE BEST RETRY
IS NONE 🦎

The Economics of the 429

The 429 has a measurable economic impact that is precisely the opposite of what the API provider intends:

| What the provider thinks | What actually happens |
| --- | --- |
| “Free tier will convert to paid” | Developer uses competitor |
| “Rate limiting protects our infrastructure” | Infrastructure is at 3% utilization |
| “Tier system encourages commitment” | Developer commits to OpenRouter |
| “429 will prompt upgrade” | 429 prompts blog post |

The developer who receives a 429 from Google Cloud does not think “I should upgrade my tier.” The developer thinks “I should use OpenRouter.” The 429 is not a conversion funnel. The 429 is an ejection seat.

The For Loop

The ultimate rebuttal to the 429 is the for loop. After escaping the Google Cloud Console and connecting through OpenRouter, the developer generated eleven cover images in sequence:

for title in "npm" "Oracle" "PostgreSQL" "Python"; do
  lg cover "$title" --oneshot
done

Zero 429s. Zero retries. Zero exponential backoff. Zero jitter. Each request returned an image. Each image cost four cents. The total infrastructure required to avoid the 429 was: choosing a provider that does not return one.

The Squirrel’s circuit breaker implementation remains undeployed. The Squirrel is processing this.
