
Fix OpenAI 429 Errors

GPT, embeddings, and image generation APIs

Why OpenAI returns 429

OpenAI enforces rate limits measured in requests and tokens per minute (and, for some limits, per day). When your integration exceeds its permitted throughput, the API responds with HTTP 429 and rate-limit headers that indicate how long to wait before retrying.

  • Burst traffic from batch inference jobs exhausts the tokens-per-minute limit partway through a batch.
  • Parallel agent loops hammer the chat/completions endpoint without backoff.
  • Clients that ignore retry-after retry immediately, turning a single 429 into a cascading 429 storm.
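A minimal client-side sketch of the missing backoff, written as plain sh around curl. It assumes retry-after, when present, carries a whole number of seconds; the attempt count, base delay, and cap are hypothetical values rather than OpenAI recommendations, and backoff_delay and post_with_backoff are illustrative names.

```shell
#!/bin/sh
# Sketch: retry a POST on HTTP 429, honoring retry-after when present.
# Assumptions (illustrative, not OpenAI guidance): retry-after carries a whole
# number of seconds; MAX_ATTEMPTS, BASE_DELAY and MAX_DELAY are hypothetical.

MAX_ATTEMPTS=5   # total attempts, including the first request
BASE_DELAY=1     # seconds to wait after the first 429 that lacks retry-after
MAX_DELAY=32     # cap on a computed (not server-supplied) wait, in seconds

# backoff_delay ATTEMPT RETRY_AFTER
# Prints the seconds to sleep after failed attempt ATTEMPT (counted from 0).
backoff_delay() {
  local attempt="$1" retry_after="$2" delay i
  # Prefer the server's hint when it is a non-empty run of digits.
  case $retry_after in
    '' | *[!0-9]*) ;;
    *) echo "$retry_after"; return 0 ;;
  esac
  # Otherwise double the wait per attempt (1, 2, 4, 8, ...) up to MAX_DELAY.
  delay=$BASE_DELAY
  i=0
  while [ "$i" -lt "$attempt" ] && [ "$delay" -lt "$MAX_DELAY" ]; do
    delay=$((delay * 2))
    i=$((i + 1))
  done
  if [ "$delay" -gt "$MAX_DELAY" ]; then delay=$MAX_DELAY; fi
  echo "$delay"
}

# post_with_backoff URL BODY_FILE OUT_FILE
# POSTs the JSON in BODY_FILE, writes the response body to OUT_FILE, and
# retries only on 429. Reads the key from OPENAI_API_KEY; returns 0 on 2xx.
post_with_backoff() {
  local url="$1" body_file="$2" out_file="$3" headers attempt status retry_after
  headers=$(mktemp) || return 1
  attempt=0
  while [ "$attempt" -lt "$MAX_ATTEMPTS" ]; do
    status=$(curl -sS -D "$headers" -o "$out_file" -w '%{http_code}' \
      -H "Authorization: Bearer $OPENAI_API_KEY" \
      -H "Content-Type: application/json" \
      --data-binary @"$body_file" "$url")
    if [ "$status" != 429 ]; then
      rm -f "$headers"
      case $status in 2??) return 0 ;; *) return 1 ;; esac
    fi
    # Header names are case-insensitive; strip the CR ending HTTP/1.1 lines.
    retry_after=$(awk -F': *' 'tolower($1) == "retry-after" { sub(/\r$/, "", $2); print $2 }' "$headers")
    # Sleep only if another attempt remains.
    if [ $((attempt + 1)) -lt "$MAX_ATTEMPTS" ]; then
      sleep "$(backoff_delay "$attempt" "$retry_after")"
    fi
    attempt=$((attempt + 1))
  done
  rm -f "$headers"
  return 1
}
```

For example, with OPENAI_API_KEY exported: post_with_backoff https://api.openai.com/v1/chat/completions request.json response.json. When many agent loops run in parallel, adding random jitter to each computed delay keeps them from retrying in lockstep.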

OpenAI rate-limit headers

Monitor these response headers to detect quota exhaustion before users see failures. Official reference: OpenAI rate limit documentation.

Header                            Meaning
x-ratelimit-limit-requests        Maximum number of requests permitted in the current rate-limit window.
x-ratelimit-remaining-requests    Number of requests remaining before the limit is enforced.
x-ratelimit-reset-requests        Time until the request quota resets, as a duration string (for example, 1s).
x-ratelimit-limit-tokens          Maximum number of tokens permitted in the current rate-limit window.
x-ratelimit-remaining-tokens      Remaining token budget before throttling occurs.
x-ratelimit-reset-tokens          Time until the token budget resets, as a duration string (for example, 6m0s).
retry-after                       When present, the number of seconds to wait before retrying after a 429 response.
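To act on these headers before a request fails, the sketch below reads them from a header dump saved with curl -D. It assumes the reset headers use compact duration strings such as 6m0s or 20ms; header_value, duration_to_ms, and pause_ms are hypothetical helpers, and only the request headers are checked here, though the token headers could be handled the same way.

```shell
#!/bin/sh
# Sketch: decide whether to pause, using rate-limit headers saved by
# `curl -D headers.txt ...`. Assumption: reset headers hold compact
# duration strings such as "1s", "6m0s" or "20ms".

# header_value FILE NAME: prints the first value of header NAME (case-insensitive).
header_value() {
  awk -v name="$2" '
    {
      sep = index($0, ":")
      if (sep > 0 && tolower(substr($0, 1, sep - 1)) == tolower(name)) {
        value = substr($0, sep + 1)
        sub(/^[ \t]+/, "", value)   # drop whitespace after the colon
        sub(/\r$/, "", value)       # drop the CR ending HTTP/1.1 lines
        print value
        exit
      }
    }' "$1"
}

# duration_to_ms DURATION: converts e.g. "6m0s" to 360000 and "20ms" to 20.
duration_to_ms() {
  awk -v d="$1" 'BEGIN {
    total = 0
    # Consume one <number><unit> pair per iteration.
    while (match(d, /^[0-9.]+/)) {
      n = substr(d, 1, RLENGTH) + 0
      d = substr(d, RLENGTH + 1)
      if (substr(d, 1, 2) == "ms")     { total += n;           d = substr(d, 3) }
      else if (substr(d, 1, 1) == "h") { total += n * 3600000; d = substr(d, 2) }
      else if (substr(d, 1, 1) == "m") { total += n * 60000;   d = substr(d, 2) }
      else if (substr(d, 1, 1) == "s") { total += n * 1000;    d = substr(d, 2) }
      else break                       # unknown unit: stop parsing
    }
    printf "%.0f\n", total
  }'
}

# pause_ms FILE: prints milliseconds to wait before the next request: the
# request-window reset time when no requests remain, otherwise 0.
pause_ms() {
  local remaining
  remaining=$(header_value "$1" x-ratelimit-remaining-requests)
  if [ "$remaining" = 0 ]; then
    duration_to_ms "$(header_value "$1" x-ratelimit-reset-requests)"
  else
    echo 0
  fi
}
```

When pause_ms prints a non-zero value, waiting that long before the next call avoids a 429 the headers already predicted.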

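Default OpenAI limits

Actual limits vary by model and usage tier; check your organization's limits in the OpenAI dashboard before relying on these figures.

Requests per minute: 500
Requests per day: 10,000
Throttle status code: 429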
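The two request limits interact. At the full 500 requests per minute, the 10,000-request daily cap is used up after 10,000 / 500 = 20 minutes, so for traffic sustained across a whole day the daily cap, not the per-minute cap, sets the pace. The sketch below computes the smallest steady spacing between requests that respects both; min_interval_ms is a hypothetical helper, and the inputs are the figures listed above, which may not match your account.

```shell
#!/bin/sh
# min_interval_ms RPM RPD
# Prints the minimum gap in milliseconds between requests so that a steady
# request rate exceeds neither the per-minute nor the per-day limit.
min_interval_ms() {
  local rpm="$1" rpd="$2" per_minute per_day
  per_minute=$(( (60000 + rpm - 1) / rpm ))    # ceil(60000 / RPM)
  per_day=$(( (86400000 + rpd - 1) / rpd ))    # ceil(86400000 / RPD)
  # The stricter (larger) spacing wins.
  if [ "$per_minute" -gt "$per_day" ]; then
    echo "$per_minute"
  else
    echo "$per_day"
  fi
}

# Example with the limits listed on this page; prints: 8640 ms between requests
echo "$(min_interval_ms 500 10000) ms between requests"
```

A client can still burst at 120 ms spacing for short periods, but averaging one request every 8.64 seconds is what keeps it under the daily cap.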

Drop-in fix with ThrottleProxy

Route OpenAI traffic through proxy.throttleproxy.com to enforce workspace-level RPM ceilings, queue bursty workloads, and absorb retries before they hit your upstream quota.

curl "https://proxy.throttleproxy.com/proxy?target=https://api.example.com/v1/resource" \
  -H "Authorization: Bearer YOUR_API_KEY"

ThrottleProxy also surfaces privacy-safe request activity, so teams can see rate-limit pressure before increasing traffic to OpenAI.

