429 Too Many Requests — Rate limit reached for requests
Appears when request volume, token volume, or concurrent calls exceed the current tier's RPM or TPM limit for the selected model.
Three separate limits (RPM, TPM, RPD) apply simultaneously. Retries alone cannot solve sustained overage — you need client-side rate limiting, queueing, or a tier bump.
Check the x-ratelimit-remaining response headers to know which limit you hit. Pair the openai SDK's maxRetries with an Upstash rate limiter keyed on userId, lower max_tokens to realistic values, and queue non-interactive work in a background job. Separate dev and prod API keys so one environment cannot exhaust the other.
Quick fix for OpenAI 429 rate limit
```typescript
// lib/openai.ts — retry, rate limit, and cost guard
import OpenAI from "openai";
import { Ratelimit } from "@upstash/ratelimit";
import { Redis } from "@upstash/redis";

const client = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY!,
  maxRetries: 3, // built-in exponential backoff, respects Retry-After
  timeout: 30_000,
});

const limiter = new Ratelimit({
  redis: Redis.fromEnv(),
  // 20 requests per user per minute — stays well under Tier 1 limits
  limiter: Ratelimit.slidingWindow(20, "60 s"),
});

export async function chatForUser(
  userId: string,
  messages: OpenAI.ChatCompletionMessageParam[]
) {
  const { success, reset } = await limiter.limit(userId);
  if (!success) {
    const waitMs = reset - Date.now();
    throw new Error(`Rate limited. Retry in ${Math.ceil(waitMs / 1000)}s`);
  }

  return client.chat.completions.create({
    model: "gpt-4o-mini",
    messages,
    max_tokens: 500, // realistic cap — do not reserve 4000 if you return 500
  });
}
```
Deeper fixes when the quick fix fails
01 · Queue with Inngest for burst traffic
```typescript
// inngest/functions.ts
import { Inngest } from "inngest";
import OpenAI from "openai";

const inngest = new Inngest({ id: "my-app" });
const client = new OpenAI();

export const summarize = inngest.createFunction(
  {
    id: "summarize-document",
    concurrency: { limit: 10 }, // max 10 concurrent calls to OpenAI
    throttle: { limit: 100, period: "1m" }, // cap at 100 req/min
    retries: 3,
  },
  { event: "document/summarize" },
  async ({ event }) => {
    const res = await client.chat.completions.create({
      model: "gpt-4o-mini",
      messages: [{ role: "user", content: `Summarize: ${event.data.content}` }],
      max_tokens: 300,
    });
    return { summary: res.choices[0].message.content };
  }
);
```
02 · Cost cap per user per day
```typescript
// lib/cost-cap.ts — block users who exceed daily cost budget
import { Redis } from "@upstash/redis";

const redis = Redis.fromEnv();
const DAILY_COST_CENTS = 50; // $0.50/user/day for MVP free tier

export async function assertWithinBudget(userId: string, estimatedCents: number) {
  const key = `cost:${userId}:${new Date().toISOString().slice(0, 10)}`;

  // incrby is atomic — a separate get-then-incrby would race under
  // concurrent requests and let users slip past the cap
  const spent = await redis.incrby(key, estimatedCents);
  await redis.expire(key, 60 * 60 * 24 * 2);

  if (spent > DAILY_COST_CENTS) {
    throw new Error("Daily AI usage limit reached. Upgrade to continue.");
  }
}
```
03 · Load test that validates rate limits
```typescript
// tests/openai-rate-limit.test.ts
import { describe, it, expect } from "vitest";

const BASE_URL = process.env.TEST_BASE_URL ?? "http://localhost:3000";
const TEST_SESSION = process.env.TEST_SESSION ?? "";

describe("openai rate limiter", () => {
  it("rejects the 21st request within 60 seconds", async () => {
    const responses = await Promise.all(
      Array.from({ length: 25 }, () =>
        fetch(`${BASE_URL}/api/ai/chat`, {
          method: "POST",
          headers: {
            "content-type": "application/json",
            cookie: `session=${TEST_SESSION}`,
          },
          body: JSON.stringify({ message: "hi" }),
        }).then((r) => r.status)
      )
    );

    const ok = responses.filter((s) => s === 200).length;
    const rateLimited = responses.filter((s) => s === 429).length;

    expect(ok).toBeLessThanOrEqual(20);
    expect(rateLimited).toBeGreaterThanOrEqual(5);
  });
});
```
Why AI-built apps hit OpenAI 429 rate limit
OpenAI applies three rate limits simultaneously to every API key: requests per minute (RPM), tokens per minute (TPM), and requests per day (RPD). The TPM limit counts both prompt tokens and the max_tokens reservation for the response — even if the model returns 100 tokens, a max_tokens: 4000 request costs 4000 TPM at send time. You hit whichever limit you cross first, and the API returns 429 with a Retry-After header suggesting how long to wait. The response headers x-ratelimit-remaining-requests and x-ratelimit-remaining-tokens show how close you are to each.
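The headers above can be compared against their matching x-ratelimit-limit-* counterparts to tell which budget you are actually burning. A minimal sketch; `readRateLimitHeaders` is a hypothetical helper (the header names are real, the headroom comparison is an illustrative heuristic):

```typescript
// Sketch: decide which limit you are closest to from the response headers.
interface RateLimitStatus {
  nearestLimit: "requests" | "tokens";
  remainingRequests: number;
  remainingTokens: number;
}

export function readRateLimitHeaders(headers: Map<string, string>): RateLimitStatus {
  const remainingRequests = Number(headers.get("x-ratelimit-remaining-requests") ?? 0);
  const limitRequests = Number(headers.get("x-ratelimit-limit-requests") ?? 1);
  const remainingTokens = Number(headers.get("x-ratelimit-remaining-tokens") ?? 0);
  const limitTokens = Number(headers.get("x-ratelimit-limit-tokens") ?? 1);

  // Compare remaining capacity as a fraction of each limit:
  // the smaller fraction is the budget you will exhaust first
  const requestHeadroom = remainingRequests / limitRequests;
  const tokenHeadroom = remainingTokens / limitTokens;

  return {
    nearestLimit: requestHeadroom < tokenHeadroom ? "requests" : "tokens",
    remainingRequests,
    remainingTokens,
  };
}
```

Logging this status on every response makes it obvious whether you need to cut max_tokens (TPM-bound) or throttle call volume (RPM-bound).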
AI-generated apps hit 429s fast because the default scaffolds do three things wrong. First, they share one API key across dev, staging, and production — so a runaway dev script eats the production quota. Second, they set max_tokens to the model maximum (8192 or 16384) out of caution, which reserves massive TPM even for short responses. Third, they make concurrent requests from a page that renders multiple AI-powered widgets, each firing a request on mount. A single user visiting the page can send 5-10 simultaneous requests and blow the 500 RPM tier 1 budget for every other user.
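The multi-widget burst in particular is cheap to fix: if several widgets need the same completion on mount, share one in-flight promise instead of firing N identical requests. A minimal sketch; `coalesce` is a hypothetical helper, and `fetchFn` stands in for whatever function actually calls OpenAI:

```typescript
// Sketch: coalesce simultaneous identical requests into one upstream call.
export function coalesce<T>(fetchFn: () => Promise<T>): () => Promise<T> {
  let inflight: Promise<T> | null = null;
  return () => {
    // Reuse the in-flight promise so N widgets mounting together cost 1 request
    if (!inflight) {
      inflight = fetchFn().finally(() => {
        inflight = null; // allow a fresh request once this one settles
      });
    }
    return inflight;
  };
}
```

Five widgets mounting in the same tick now consume one RPM unit instead of five; once the promise settles, the next call goes upstream again.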
Retries are the first reflex and the weakest solution. The openai Node SDK retries up to 2 times by default with exponential backoff, respecting the Retry-After header. That handles transient spikes gracefully. But if your average request rate exceeds the tier limit, retries only delay the problem — every request ends up retrying, and your latency grows without bound. The real fix is client-side rate limiting: cap requests before they hit OpenAI, enqueue burst traffic, and only dispatch when the window has capacity.
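If you do roll your own retry loop (for example around raw fetch calls), prefer the server's Retry-After hint over blind backoff. A minimal sketch; `retryDelayMs` and its parameter defaults are illustrative, not an SDK API:

```typescript
// Sketch: next retry delay — server hint wins, else exponential backoff with jitter.
export function retryDelayMs(
  attempt: number,
  retryAfterSeconds?: number,
  baseMs = 500,
  maxMs = 30_000
): number {
  // Retry-After knows when the rate-limit window actually resets
  if (retryAfterSeconds !== undefined) {
    return Math.min(retryAfterSeconds * 1000, maxMs);
  }
  // Otherwise: base * 2^attempt, capped, with full jitter so retries
  // from many clients do not land in the same instant
  const exp = Math.min(baseMs * 2 ** attempt, maxMs);
  return Math.floor(Math.random() * exp);
}
```

Full jitter matters here: without it, every client that got 429'd at the same moment retries at the same moment, recreating the burst that caused the 429.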
Tier upgrades are the last resort but often correct. Tier 1 on gpt-4o (500 RPM, 30k TPM) is fine for an MVP but breaks the moment you have 100 concurrent users. Tier 2 doubles everything and unlocks after $50 in payments plus 7 days. Most production AI apps belong at tier 3+ ($100 lifetime spend). Check your current tier at platform.openai.com/settings/organization/limits. If you are doing legitimate volume and still 429ing, fill out the usage-increase form — OpenAI often grants custom limits within 2-3 business days.
OpenAI 429 rate limit by AI builder
How often each AI builder ships this error and the pattern that produces it.
| Builder | Frequency | Pattern |
|---|---|---|
| Lovable | Every AI feature scaffold | One OPENAI_API_KEY for dev+prod; no rate limit |
| Bolt.new | Common | max_tokens: 4096 hardcoded even for single-sentence responses |
| Cursor | Common | Fires OpenAI call on every keystroke without debounce |
| Base44 | Sometimes | Parallel.map over user requests — bursts above tier limit |
| Replit Agent | Rare | Polls OpenAI status endpoint every second, wastes RPM |
Stop OpenAI 429 rate limit recurring in AI-built apps
- Use separate OpenAI API keys for dev, staging, and production — never share the quota.
- Set max_tokens to the realistic response length, never the model maximum.
- Add an Upstash rate limiter keyed on userId before every OpenAI call — reject at the edge.
- Queue non-interactive AI work in Inngest or BullMQ with concurrency and throttle limits.
- Set a per-user daily cost cap in Redis — prevents buggy clients from running up your bill.
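The key-separation point is a one-function habit. A minimal sketch, assuming a naming convention of one environment variable per environment (the OPENAI_API_KEY_* names and `keyForEnv` helper are an assumed convention, not an OpenAI or SDK feature):

```typescript
// Sketch: pick a per-environment key so dev traffic cannot drain prod quota.
export function keyForEnv(
  env: string,
  vars: Record<string, string | undefined>
): string {
  const name =
    env === "production" ? "OPENAI_API_KEY_PROD"
    : env === "staging" ? "OPENAI_API_KEY_STAGING"
    : "OPENAI_API_KEY_DEV"; // everything else falls back to the dev key
  const key = vars[name];
  // Fail loudly at startup rather than silently sharing one key everywhere
  if (!key) throw new Error(`Missing ${name} for env "${env}"`);
  return key;
}
```

Call it once at client construction, e.g. `new OpenAI({ apiKey: keyForEnv(process.env.NODE_ENV ?? "development", process.env) })`, and a runaway dev script can only exhaust the dev key's quota.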
OpenAI 429 rate limit questions
What are OpenAI rate limits actually measured in?
How does the OpenAI tier system work?
Does exponential backoff actually work for OpenAI 429s?
Why do I hit 429 when the dashboard says I am under the limit?
How much does an Afterbuild Labs OpenAI integration audit cost?
Ship the fix. Keep the fix.
Emergency Triage restores service in 48 hours. Break the Fix Loop rebuilds CI so this error cannot ship again.
Hyder Shah leads Afterbuild Labs, shipping production rescues for apps built in Lovable, Bolt.new, Cursor, Replit, v0, and Base44. Read about our rescue methodology.