ERR-958
429 Too Many Requests — Rate limit reached for requests

Appears when: request volume, token volume, or concurrent calls exceed the current tier's RPM or TPM limit for the selected model


Three separate limits (RPM, TPM, RPD) apply simultaneously. Retries alone cannot solve sustained overage — you need client-side rate limiting, queueing, or a tier bump.

Last updated 17 April 2026 · 7 min read · By Hyder Shah
Direct answer
429 means you hit a rate limit on RPM, TPM, or RPD — check the x-ratelimit-remaining response headers to know which. Pair the openai SDK's maxRetries with an Upstash rate limiter keyed on userId, lower max_tokens to realistic values, and queue non-interactive work in a background job. Separate dev and prod API keys so one environment cannot exhaust the other.

Quick fix for OpenAI 429 rate limit

lib/openai.ts
typescript
// lib/openai.ts — retry, rate limit, and cost guard
import OpenAI from "openai";
import { Ratelimit } from "@upstash/ratelimit";
import { Redis } from "@upstash/redis";

const client = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY!,
  maxRetries: 3, // built-in exponential backoff, respects Retry-After
  timeout: 30_000,
});

const limiter = new Ratelimit({
  redis: Redis.fromEnv(),
  // 20 requests per user per minute — stays well under Tier 1 limits
  limiter: Ratelimit.slidingWindow(20, "60 s"),
});

export async function chatForUser(
  userId: string,
  messages: OpenAI.ChatCompletionMessageParam[]
) {
  const { success, reset } = await limiter.limit(userId);
  if (!success) {
    const waitMs = reset - Date.now();
    throw new Error(`Rate limited. Retry in ${Math.ceil(waitMs / 1000)}s`);
  }

  return client.chat.completions.create({
    model: "gpt-4o-mini",
    messages,
    max_tokens: 500, // realistic cap — do not reserve 4000 if you return 500
  });
}
Per-user rate limit + SDK retry + realistic max_tokens — handles 95% of 429 cases without a tier bump

Deeper fixes when the quick fix fails

01 · Queue with Inngest for burst traffic

inngest/functions.ts
typescript
// inngest/functions.ts
import { Inngest } from "inngest";
import OpenAI from "openai";

const inngest = new Inngest({ id: "my-app" });
const client = new OpenAI();

export const summarize = inngest.createFunction(
  {
    id: "summarize-document",
    concurrency: { limit: 10 }, // max 10 concurrent calls to OpenAI
    throttle: { limit: 100, period: "1m" }, // cap at 100 req/min
    retries: 3,
  },
  { event: "document/summarize" },
  async ({ event }) => {
    const res = await client.chat.completions.create({
      model: "gpt-4o-mini",
      messages: [{ role: "user", content: `Summarize: ${event.data.content}` }],
      max_tokens: 300,
    });
    return { summary: res.choices[0].message.content };
  }
);
Inngest handles concurrency and throttle natively — never exceeds your OpenAI budget

02 · Cost cap per user per day

lib/cost-cap.ts
typescript
// lib/cost-cap.ts — block users who exceed daily cost budget
import { Redis } from "@upstash/redis";

const redis = Redis.fromEnv();
const DAILY_COST_CENTS = 50; // $0.50/user/day for MVP free tier

export async function assertWithinBudget(userId: string, estimatedCents: number) {
  const key = `cost:${userId}:${new Date().toISOString().slice(0, 10)}`;

  // Increment first, then check — the atomic INCRBY means concurrent
  // requests cannot slip past the cap between a read and a write.
  const spent = await redis.incrby(key, estimatedCents);
  await redis.expire(key, 60 * 60 * 24 * 2);

  if (spent > DAILY_COST_CENTS) {
    throw new Error("Daily AI usage limit reached. Upgrade to continue.");
  }
}
Stops a malicious or buggy client from running up your OpenAI bill

03 · Load test that validates rate limits

tests/openai-rate-limit.test.ts
typescript
// tests/openai-rate-limit.test.ts
import { describe, it, expect } from "vitest";

// Assumes the app under test is running locally and a valid session
// cookie is supplied via the environment.
const BASE_URL = process.env.TEST_BASE_URL ?? "http://localhost:3000";
const TEST_SESSION = process.env.TEST_SESSION!;

describe("openai rate limiter", () => {
  it("rejects the 21st request within 60 seconds", async () => {
    const responses = await Promise.all(
      Array.from({ length: 25 }, () =>
        fetch(`${BASE_URL}/api/ai/chat`, {
          method: "POST",
          headers: {
            "content-type": "application/json",
            cookie: `session=${TEST_SESSION}`,
          },
          body: JSON.stringify({ message: "hi" }),
        }).then((r) => r.status)
      )
    );

    const ok = responses.filter((s) => s === 200).length;
    const rateLimited = responses.filter((s) => s === 429).length;

    expect(ok).toBeLessThanOrEqual(20);
    expect(rateLimited).toBeGreaterThanOrEqual(5);
  });
});
CI test — if someone removes the limiter, this catches it before prod

Why AI-built apps hit OpenAI 429 rate limit

OpenAI applies three rate limits simultaneously to every API key: requests per minute (RPM), tokens per minute (TPM), and requests per day (RPD). The TPM limit counts both prompt tokens and the max_tokens reservation for the response — even if the model returns 100 tokens, a max_tokens: 4000 request costs 4000 TPM at send time. You hit whichever limit you cross first, and the API returns 429 with a Retry-After header suggesting how long to wait. The response headers x-ratelimit-remaining-requests and x-ratelimit-remaining-tokens show how close you are to each.
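Those headers can double as a quick diagnostic. A minimal sketch — parseRateLimitHeaders is a hypothetical helper, not part of the SDK — that reports which limit you are closest to exhausting:

```typescript
// Hypothetical helper: given a header accessor (e.g. (n) => response.headers.get(n)),
// report remaining capacity and which limit is nearest to running out.
type RateLimitSnapshot = {
  remainingRequests: number;
  remainingTokens: number;
  nearestLimit: "requests" | "tokens";
};

export function parseRateLimitHeaders(
  get: (name: string) => string | null
): RateLimitSnapshot {
  const remainingRequests = Number(get("x-ratelimit-remaining-requests") ?? Infinity);
  const limitRequests = Number(get("x-ratelimit-limit-requests") ?? Infinity);
  const remainingTokens = Number(get("x-ratelimit-remaining-tokens") ?? Infinity);
  const limitTokens = Number(get("x-ratelimit-limit-tokens") ?? Infinity);

  // Compare remaining capacity as a fraction of each limit — the smaller
  // fraction is the limit you will hit first at the current pace.
  const reqFrac = remainingRequests / limitRequests;
  const tokFrac = remainingTokens / limitTokens;
  return {
    remainingRequests,
    remainingTokens,
    nearestLimit: reqFrac <= tokFrac ? "requests" : "tokens",
  };
}
```

Log the snapshot on every response in dev and the culprit limit usually becomes obvious within minutes.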

AI-generated apps hit 429s fast because the default scaffolds do three things wrong. First, they share one API key across dev, staging, and production — so a runaway dev script eats the production quota. Second, they set max_tokens to the model maximum (8192 or 16384) out of caution, which reserves massive TPM even for short responses. Third, they make concurrent requests from a page that renders multiple AI-powered widgets, each firing a request on mount. A single user visiting the page can send 5-10 simultaneous requests and blow the 500 RPM tier 1 budget for every other user.
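The third failure mode — several widgets firing the same request on mount — can be blunted with request coalescing. A hypothetical dedupe helper, sketched under the assumption that identical widgets can agree on a cache key:

```typescript
// Widgets that mount together and request the same completion share one
// in-flight promise instead of firing N parallel calls.
const inflight = new Map<string, Promise<unknown>>();

export function dedupe<T>(key: string, fn: () => Promise<T>): Promise<T> {
  const existing = inflight.get(key);
  if (existing) return existing as Promise<T>;

  // Remove the entry once settled so later requests fetch fresh data.
  const p = fn().finally(() => inflight.delete(key));
  inflight.set(key, p);
  return p;
}
```

Five widgets calling `dedupe("summary:doc-42", fetchSummary)` on the same render cost one request, not five.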

Retries are the first reflex and the weakest solution. The openai Node SDK retries up to 2 times by default with exponential backoff, respecting the Retry-After header. That handles transient spikes gracefully. But if your average request rate exceeds the tier limit, retries only delay the problem — every request ends up retrying, and your latency grows without bound. The real fix is client-side rate limiting: cap requests before they hit OpenAI, enqueue burst traffic, and only dispatch when the window has capacity.
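"Dispatch only when the window has capacity" can be sketched as an in-process sliding-window gate. This is a simplification — it protects a single Node process only, unlike the Redis-backed limiter in the quick fix — but it shows the mechanic:

```typescript
// Before each OpenAI call, acquire() resolves immediately if fewer than
// `limit` calls started in the last `windowMs`; otherwise it sleeps until
// the oldest call leaves the window, then re-checks.
export class SlidingWindowGate {
  private starts: number[] = [];
  constructor(private limit: number, private windowMs: number) {}

  async acquire(): Promise<void> {
    for (;;) {
      const now = Date.now();
      this.starts = this.starts.filter((t) => now - t < this.windowMs);
      if (this.starts.length < this.limit) {
        this.starts.push(now);
        return;
      }
      const waitMs = this.windowMs - (now - this.starts[0]);
      await new Promise((r) => setTimeout(r, waitMs + 1));
    }
  }
}
```

Usage: `const gate = new SlidingWindowGate(450, 60_000); await gate.acquire();` before each call keeps you under a 500 RPM tier with headroom.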

Tier upgrades are the last resort but often correct. Tier 1 on gpt-4o (500 RPM, 30k TPM) is fine for an MVP but breaks the moment you have 100 concurrent users. Tier 2 doubles everything and unlocks after $50 in payments plus 7 days. Most production AI apps belong at tier 3+ ($100 lifetime spend). Check your current tier at platform.openai.com/settings/organization/limits. If you are doing legitimate volume and still 429ing, fill out the usage-increase form — OpenAI often grants custom limits within 2-3 business days.

OpenAI 429 rate limit by AI builder

How often each AI builder ships this error and the pattern that produces it.

AI builder × OpenAI 429 rate limit
| Builder | Frequency | Pattern |
| --- | --- | --- |
| Lovable | Every AI feature scaffold | One OPENAI_API_KEY for dev+prod; no rate limit |
| Bolt.new | Common | max_tokens: 4096 hardcoded even for single-sentence responses |
| Cursor | Common | Fires OpenAI call on every keystroke without debounce |
| Base44 | Sometimes | Parallel .map over user requests — bursts above tier limit |
| Replit Agent | Rare | Polls OpenAI status endpoint every second, wastes RPM |



Still stuck with OpenAI 429 rate limit?

Emergency triage · $299 · 48h turnaround
We restore service and write the root-cause report.
Start the triage →

OpenAI 429 rate limit questions

What are OpenAI rate limits actually measured in?
Three separate limits apply at the same time: requests per minute (RPM), tokens per minute (TPM), and for some models requests per day (RPD). You hit whichever one you cross first. Tier 1 on gpt-4o has 500 RPM and 30,000 TPM — a single 6,000-token request can block you for 12 seconds even though you made only 1 request. Check the x-ratelimit-remaining-tokens and x-ratelimit-remaining-requests headers on every response to see which limit is close.
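The arithmetic in that answer generalizes. A one-line helper (hypothetical, for back-of-envelope estimates) for how long a token reservation occupies a TPM budget:

```typescript
// How many seconds a single request's token reservation blocks a TPM budget.
// e.g. 30,000 TPM = 500 tokens/second, so 6,000 tokens block ~12 seconds.
export function tokenBlockSeconds(tokens: number, tpmLimit: number): number {
  const tokensPerSecond = tpmLimit / 60;
  return tokens / tokensPerSecond;
}
```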
How does the OpenAI tier system work?
Tiers are based on total payment history on the account. Tier 1 kicks in after $5 paid. Tier 2 after $50 and 7+ days since first payment. Tier 3 after $100, Tier 4 after $250, Tier 5 after $1000. Each tier roughly doubles RPM and TPM on most models. To move up, pay in advance via the billing portal — charges sit as credits until consumed. New accounts often hit 429s because their tier is too low for even moderate traffic.
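Those thresholds can be encoded as a quick lookup — a sketch based only on the payment amounts quoted above; the time-in-tier requirements (e.g. 7+ days since first payment) are deliberately ignored here:

```typescript
// Approximate tier from lifetime spend in USD, per the thresholds above.
// Time-in-tier requirements are not modeled.
export function tierForLifetimeSpend(usd: number): number {
  if (usd >= 1000) return 5;
  if (usd >= 250) return 4;
  if (usd >= 100) return 3;
  if (usd >= 50) return 2;
  if (usd >= 5) return 1;
  return 0; // free tier
}
```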
Does exponential backoff actually work for OpenAI 429s?
Yes, but only for transient spikes. The official openai SDK has built-in retry with exponential backoff — respect the Retry-After header when present. For sustained overage (you genuinely send 1000 req/min on a 500 RPM tier), retries just defer the problem. The real fixes are client-side rate limiting (Upstash), queueing (BullMQ, Inngest), and requesting a tier bump. Retries without rate limiting will exhaust your quota on every request.
Why do I hit 429 when the dashboard says I am under the limit?
Three common reasons. First, the dashboard averages over longer windows; actual limits are rolling 60-second windows. Second, dev and prod often share one API key, so the quota is shared across environments. Third, token estimates on the request (including max_tokens of the response) count against TPM before the request is made — max_tokens: 4000 costs 4000 TPM even if the model returns 100. Split dev/prod keys and set max_tokens realistically.
How much does an Afterbuild Labs OpenAI integration audit cost?
For a typical AI app (single OpenAI-powered feature, 5-20k requests per day), the audit including rate limiting, queueing, retry logic, token accounting, and a cost cap takes 2-4 hours. Our Integration Fix service is fixed-fee and includes: Upstash rate limiter setup, server-side queue for burst traffic, per-user cost caps, and a regression test suite that fires 1000 requests per minute to validate the limits hold.
Next step

Ship the fix. Keep the fix.

Emergency Triage restores service in 48 hours. Break the Fix Loop rebuilds CI so this error cannot ship again.

About the author

Hyder Shah leads Afterbuild Labs, shipping production rescues for apps built in Lovable, Bolt.new, Cursor, Replit, v0, and Base44. Read about our rescue methodology.
