ERR-CTX
AI loses context in large codebase

Appears when: once the project passes a few hundred lines, the AI starts forgetting earlier decisions, re-hallucinating types, and introducing duplicate utilities.

AI loses context — fix large-codebase memory loss in Cursor, Claude Code, and v0

Context loss is not a bug in Cursor, Claude Code, or v0. It is the inherent limit of token-window LLMs meeting a growing codebase. The fix is not a better prompt — it is a workflow built on external memory files, tight per-prompt scope, and session checkpoints.

Last updated 18 April 2026 · 11 min read · By Hyder Shah
Direct answer

Your AI loses context because the model’s attention window cannot hold your whole codebase. Three levers solve it: (a) external memory files (CLAUDE.md, .cursorrules, AGENTS.md) that the tool re-reads every session; (b) tight file-level scope on every prompt — never @codebase for large repos; (c) summarization checkpoints between sessions. If you are mid-session and losing context, stop; write a session summary; start a fresh chat with scoped files.

Quick fix for AI loses context

CLAUDE.md

```markdown
# CLAUDE.md / AGENTS.md / .cursorrules — project memory

## Architecture
- Next.js 16 App Router (breaking changes from 14 — read node_modules/next/dist/docs)
- Supabase for auth + database (RLS enabled on every table)
- Stripe for payments (webhooks verify raw body; never .json() before verify)
- Vercel deploy (pooled DATABASE_URL on port 6543)

## Key types
- User lives in src/types/user.ts — do not re-declare
- ApiResponse<T> is the wrapper for every route handler
- All dates are ISO strings in the API, Date objects in the client

## Naming conventions
- Files: kebab-case.tsx
- React components: PascalCase
- Hooks: useXxx, prefixed, in src/hooks/
- Server actions: verbNoun, async, in src/actions/

## Do not modify
- src/lib/schema.ts — JSON-LD contract for SEO
- src/components/ProblemPage.tsx — shared problem-page shell
- prisma/migrations/* — migrations are immutable once applied

## Current focus
- Building /fix/* pages — each follows the ProblemPage pattern
- See src/app/fix/stripe-webhook-not-firing/page.tsx as the reference
```
Drop this at repo root. Claude Code reads it automatically; rename to .cursorrules or AGENTS.md for Cursor or OpenAI Codex respectively.

Deeper fixes when the quick fix fails

01 · Per-tool solutions: Cursor

Cursor’s attention is precise when you scope it and smeared when you do not. Never use @codebase for a repository larger than a few thousand lines — it pulls too many files, the working memory overflows, and the answer drifts. Instead use @-references to name the exact files: @src/lib/auth.ts @src/hooks/use-user.ts. Five files is the practical ceiling for a single task.
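A scoped prompt might look like this — the task, file paths, and constraint line are illustrative, not from a real repo:

```
Fix the 401 on token refresh.
@src/lib/auth.ts @src/hooks/use-user.ts @src/middleware.ts

The refresh call in use-user.ts returns 401 after the access token
expires. Do not touch the login flow.
```

One task, three named files, one explicit boundary — everything the model needs and nothing that dilutes its attention.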

Split the work correctly between Cursor’s two modes. Composer is for implementation — it edits files directly and keeps attention on the diff. Ask is for design — it reasons without writing, which is what you want when the task is still ambiguous. Mixing them (“design and implement this in Composer”) blows context on the design phase and leaves nothing for implementation.

Maintain a .cursorrules file at repo root. Architecture, key types, naming conventions, files that are load-bearing. Cursor injects it into every prompt; your chat transcript no longer has to carry the same context by hand.

02 · Per-tool solutions: Claude Code

Claude Code reads CLAUDE.md at repo root automatically, plus any CLAUDE.md inside submodules along the current path. That cascading pattern is the highest-leverage feature in the tool — use it. Feature-level memory files are cheaper than bloating the root file, and they scope architectural notes to the part of the tree that needs them.

Use /compact at natural breakpoints. After finishing a feature, after a long debug session, after a cross-file refactor. /compact summarizes the transcript into a short synopsis and discards the rest, resetting effective working memory without losing the decisions. Without compacting, a long Claude Code session ends up in the attention-decay zone quickly.

Prefer small focused PRs over long sessions. The model that handles one feature cleanly is the same model that degrades on the tenth. If the work truly is large, scope it across sessions, not within one. Use --add-dir to let Claude Code reach files outside the current working directory without forcing the whole monorepo into context.

03 · Per-tool solutions: v0

v0 is a front-end-only tool. Context loss in v0 almost always means you have hit its project-size ceiling — it was built to generate single components and short flows, not to carry a multi-page app in working memory. When the chat starts contradicting earlier components, rewriting a design system per turn, or losing track of shared types, the signal is: time to eject.

The migration path is v0 → Next.js. Export the generated code, move it into a real repository, and handle the app-level concerns (routing, auth, database) in your IDE with Cursor or Claude Code. The v0 migrate-out guide walks through the full handoff — folder structure, component boundaries, and the Next.js App Router idioms v0 does not generate by default.

04 · Per-tool solutions: Lovable

Lovable retains chat context per project, which helps until it doesn’t. The failure mode is global prompts that touch the whole app (“refactor our auth”) — Lovable tries to hold the entire project in working memory and drifts. Scope every prompt to a single component with “Edit this component.” The model attends to the current file and ships smaller, safer edits.

Lovable also lets you edit the project description, which functions as a memory file. Make it rich. Stack, architecture, load-bearing components, naming conventions. Lovable re-reads the description when you start a new chat, which is your cheapest insurance against a fresh session losing track of design decisions.

05 · The memory-file pattern (universal)

The pattern is the same across every AI coding tool: maintain a plain-text file the AI re-reads every session. Different tools pick different filenames — CLAUDE.md, .cursorrules, AGENTS.md, README.md for tools that respect it — but the content is identical. Architecture decisions, key types, files that are load-bearing, naming conventions, explicit do-not-modify lists.

Keep the file short. Under 200 lines is the right order of magnitude. If it grows past that, split into feature-level memory files (src/auth/CLAUDE.md, src/payments/CLAUDE.md) so the AI loads only the section it needs. A 2,000-line memory file bloats every prompt and defeats the point. A 150-line memory file is the floor the AI reasons from on every turn.
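The split might look like this — the feature names and line counts are illustrative:

```
repo/
├── CLAUDE.md              # ~50 lines: stack, global conventions, do-not-modify list
└── src/
    ├── auth/
    │   └── CLAUDE.md      # auth-specific types and session rules
    └── payments/
        └── CLAUDE.md      # Stripe webhook invariants, idempotency notes
```

The root file stays lean; the feature files carry the detail, and each one loads only when the AI is working in that part of the tree.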

Treat the memory file like a test: it earns its place by preventing a specific class of regression. Every time you catch the AI re-hallucinating, update the file. Every time the AI proposes a pattern you do not want, add a do-not rule. The file evolves into the shortest possible briefing that keeps the AI on the path.

06 · Summarization checkpoints

When a session runs long, working memory degrades. The fix is a checkpoint: write a 200-word summary of decisions plus open issues, paste it into a fresh chat, and continue from there. Treat chats as disposable; state lives in files, not in transcripts.

A good checkpoint has three sections: (1) decisions — what you chose and why, in bullets; (2) in-progress — the file you were editing, the function, the failing test; (3) next step — one sentence describing the immediate next action. Paste this into a new chat along with the 3-5 files the task touches, and working memory resets with full attention on the current problem.
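A minimal checkpoint following those three sections — the project details here are invented for illustration:

```markdown
## Checkpoint — auth session refactor

### Decisions
- Sessions live in httpOnly cookies, not localStorage (XSS surface)
- Token refresh happens in middleware, not per-route

### In progress
- src/lib/session.ts — refreshSession() fails the expiry test in session.test.ts

### Next step
- Make refreshSession() re-issue the cookie before the 401 is returned
```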

Claude Code’s /compact automates most of this inside one session. Cursor and Lovable do not have an equivalent yet; you write the checkpoint by hand. Either way, the discipline is the same: short chats, long files.

07 · When memory loss means you need human help

  • The AI rewrites the same function every session, even though it is already well-implemented in the codebase
  • Team members cannot onboard to the project because the AI contradicts the docs and nobody knows which is current
  • New features take longer to ship than old features did — the AI velocity curve is going the wrong way
  • Your CLAUDE.md is longer than your README.md — a signal the memory file is doing work that belongs in real documentation or real abstractions
  • Every deployment surfaces a new regression in a file the AI “helpfully” cleaned up during an unrelated task

Why AI-built apps hit AI loses context

How LLM context windows actually work

A context window is the number of tokens the model accepts on a single forward pass — currently 200K to 1M tokens across frontier models. Those numbers are the nominal ceiling. The effective working window — the portion the model reliably attends to — is smaller, usually 30K-60K tokens depending on the task.

The gap between nominal and effective comes from how transformer attention scales. Every token attends to every other token through learned weights; at long distances those weights shrink, and competing signals near the current turn drown out older context. Benchmarks like “needle in a haystack” show frontier models can retrieve a literal string from anywhere in a 1M window, but retrieval is not reasoning. Ask the model to integrate twelve facts scattered across the window and performance collapses well before you hit the token limit. That is why the model “forgets” code it was shown 50 messages ago even though the message is technically still in the prompt.
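You can sanity-check a prompt against the effective window before pasting files. The sketch below uses the common ~4-characters-per-token heuristic and assumes a 48K budget (our pick from the 30K-60K range above); a real tokenizer such as tiktoken gives exact counts, but this is close enough to catch a prompt that will blow past working memory.

```python
# Rough token-budget check before pasting files into a prompt.
# ~4 characters per token is a heuristic for English text and code;
# EFFECTIVE_WINDOW is an assumed mid-range value, not a vendor spec.

EFFECTIVE_WINDOW = 48_000

def estimate_tokens(text: str) -> int:
    """Approximate token count: ~4 characters per token."""
    return len(text) // 4

def fits_working_window(file_contents: list[str], budget: int = EFFECTIVE_WINDOW) -> bool:
    """True if the combined files likely stay inside the effective window."""
    total = sum(estimate_tokens(t) for t in file_contents)
    return total <= budget

# Three 20 KB files ≈ 15K tokens — comfortably inside the window.
files = ["x" * 20_000] * 3
print(fits_working_window(files))  # True
```

If the check fails, that is the signal to scope down: fewer files per prompt, or a checkpoint summary in place of the raw transcript.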

Symptoms you are hitting context limits

  1. The model re-asks questions you already answered (“what database are you using?”)
  2. It hallucinates type fields that were defined earlier in the session
  3. It forgets the file structure and proposes new folders that duplicate existing ones
  4. It introduces duplicate utilities — a second formatDate that lives next to your existing one
  5. It re-imports already-imported modules inside the same file
  6. It mixes patterns from unrelated files — wraps a server component in "use client" then adds a database call to it
  7. It contradicts decisions it made two turns ago without acknowledging the reversal

Any one of these in isolation is noise. Three in the same session means the session is cooked. Stop, summarize, restart.

Chats are disposable. State lives in files. If your memory plan depends on scrolling up, you do not have a memory plan.
Hyder Shah, Afterbuild Labs

AI loses context by AI builder

How often each AI builder ships this error and the pattern that produces it.

AI builder × AI loses context

| Builder | Frequency | Pattern |
| --- | --- | --- |
| Cursor | Every mid-sized project | @codebase pulls too much; no .cursorrules file; Composer and Ask used interchangeably |
| Claude Code | Long sessions | No CLAUDE.md at repo root; /compact never used; single session covers multiple features |
| v0 | Past the first flow | Tool is front-end-only — context loss signals the project has outgrown v0 and needs to migrate to Next.js |
| Lovable | Global prompts | Thin project description + whole-app refactors; no per-component scope |
| Bolt.new | Medium | StackBlitz preview hides architecture drift; no persistent memory file by default |
| Replit Agent | Medium | Session-scoped memory; cross-session decisions are not retained without manual notes |


Still stuck with AI loses context?

Emergency triage · $299 · 48h turnaround
We restore service and write the root-cause report.
start the triage →

AI loses context questions

Why does Cursor forget code I showed it yesterday?
Cursor's chat history is not the model's memory — it is a transient prompt that gets truncated, summarized, or discarded across sessions. Once the transcript exceeds the context window, earlier messages are evicted silently. Yesterday's design decisions, type definitions, and architectural notes do not persist into a fresh chat unless you write them into a file the model re-reads. That is what .cursorrules, CLAUDE.md, and AGENTS.md exist for. Treat chat as disposable scratch space; treat files as the actual memory.
What's the difference between context window and working memory?
Context window is the nominal token budget the model technically accepts — currently 200K to 1M for frontier models. Working memory is what the model actually attends to reliably. Attention weights decay at long distances, so a type definition buried 180K tokens ago is technically inside the window but functionally forgotten. Effective working memory is closer to 30K-60K tokens for most coding tasks. When you hear 'the model lost context,' the window was probably still open; attention had moved on.
Does a CLAUDE.md really help? It feels like ceremony.
A CLAUDE.md at repo root is read automatically by Claude Code on every session start. That means architecture decisions, naming conventions, and 'do not modify' warnings ride free into every prompt without you re-pasting them. Teams that maintain a CLAUDE.md ship measurably fewer architectural regressions because the AI stops re-hallucinating types that were defined eight conversations ago. It is not ceremony — it is the cheapest possible form of persistent memory for a stateless LLM.
Should I start a new chat for every task?
Yes, with a caveat. New chats start the model fresh with full attention on the current task — no drift from earlier decisions, no attention dilution. But a cold chat with no priming is worse than a stale one. The pattern that works: write a 200-word task summary, paste it into a new chat along with the 3-5 files the task touches, then work. Short focused chats outperform long meandering ones on every metric — accuracy, speed, and cost.
How big can my project get before Cursor starts losing context?
Context loss does not correlate with lines of code — it correlates with how many files the AI has to reason about simultaneously. Cursor handles a 50,000-line codebase fine if each task touches fewer than five files. It falls over on a 5,000-line codebase when you @codebase with a cross-cutting refactor. The rule of thumb: keep any single AI task scoped to fewer than five files and fewer than 2,000 lines of changed surface area. Larger jobs get decomposed, not batched.
Can I just use a bigger model to fix this?
A larger context window helps at the margin but does not solve the underlying attention-decay problem. A 1M-token model can technically read your whole codebase; it cannot reliably attend to every piece of it at once. Bigger models also cost more and run slower, so the economics punish you for throwing context at problems that scope would solve. The actual fix is workflow: memory files, tight scope, session checkpoints. Bigger models amplify a good workflow; they do not replace it.
Next step

Ship the fix. Keep the fix.

Emergency Triage restores service in 48 hours. Break the Fix Loop rebuilds CI so this error cannot ship again.

About the author

Hyder Shah leads Afterbuild Labs, shipping production rescues for apps built in Lovable, Bolt.new, Cursor, Replit, v0, and Base44. Read more about our rescue methodology.
