Cut your OpenAI bill by 60–95%. Plan in 72 hours.
We audit your token spend, propose prompt caching + batch API + model routing + response caching, and write the patch plan. Pays for itself in 1–2 months or your money back.
AI Cost Audit — a 72-hour engagement that ingests your provider usage data, isolates where every token-dollar goes by endpoint and model, and writes a ranked patch plan across four levers: prompt caching (Anthropic cache_control and OpenAI prefix caching), Batch API migration (50% off list price for non-urgent workloads), model routing (cheap model for easy calls, flagship for the hard ones), and response caching (semantic Redis cache for repeated queries). Every recommendation ships with projected monthly saving, confidence score, and a fixed-price implementation quote if you want us to ship the fixes. Pays for itself in 1–2 months or your money back.
Six cost patterns the audit isolates.
Every audit we run finds some combination of these six. The written plan names which patterns are in your stack, the projected saving per pattern, and the fixed-price implementation quote for each.
| Symptom | Root cause | How the plan fixes it |
|---|---|---|
| $10k+/month on OpenAI or Anthropic with no cost dashboards | Every prompt runs at list price, no caching blocks, no batch routing | Prompt caching + batch API audit — usually 60–80% off within two weeks |
| Opus / GPT-5 used for tasks Haiku / 4o-mini could handle | No classifier in front of the router; every call hits the flagship model | Model routing recommendations — cheap model for easy calls, flagship for the hard ones |
| Huge repeated system prompts on every request (RAG, agents, eval) | cache_control blocks never set; prefix caching unused on OpenAI | Prompt caching plan — 90% cost reduction on cached segments, typically full payback in weeks |
| Nightly / hourly batch jobs on the synchronous API | Batch API unused despite 50% off list price for non-urgent workloads | Batch API migration plan — same throughput, half the cost, same model quality |
| The same question asked 10× a day across users | No semantic response cache; identical queries re-run end-to-end | Response caching recommendation (Redis + embedding hash) with TTL tuned to your use case |
| Cost running hotter than revenue; CFO flagging the line item | No per-endpoint or per-customer unit economics visible | Unit-economics dashboard spec — cost per user, per feature, per $ of revenue, wired to your BI |
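The unit-economics number in that last row is mostly bookkeeping once token counts are logged per request. A minimal stand-alone sketch of the arithmetic; the `PRICES` table and model names are illustrative placeholders, not current list prices, so substitute your provider's actual per-million-token rates:

```python
# Cost-per-request arithmetic behind a unit-economics dashboard.
# PRICES is an illustrative placeholder, NOT current list pricing:
# substitute your provider's actual per-million-token rates.
PRICES = {
    "flagship": (15.00, 75.00),  # (input $/Mtok, output $/Mtok), hypothetical
    "small": (0.80, 4.00),
}

def call_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one call: tokens / 1M x price per million."""
    in_price, out_price = PRICES[model]
    return input_tokens / 1e6 * in_price + output_tokens / 1e6 * out_price

def cost_per_user(calls: list) -> dict:
    """Roll per-call costs up into the cost-per-user number the CFO wants."""
    totals = {}
    for c in calls:
        totals[c["user"]] = totals.get(c["user"], 0.0) + call_cost(
            c["model"], c["input_tokens"], c["output_tokens"]
        )
    return totals

calls = [
    {"user": "acme", "model": "flagship", "input_tokens": 12_000, "output_tokens": 800},
    {"user": "acme", "model": "small", "input_tokens": 3_000, "output_tokens": 200},
]
print(round(cost_per_user(calls)["acme"], 4))  # dollars spent on acme's two calls
```

Divide the same totals by revenue per customer and you have the cost-per-revenue-dollar line the dashboard spec wires into your BI.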
The 72-hour cost-audit schedule.
From read-only dashboard access to written patch plan in 72 hours: one forensics pass, one caching + batch pass, one routing + response-cache pass, one written plan delivered.
- Day 1 · Ingest + forensics
Day 1 we ingest read-only access to your provider dashboards (OpenAI usage + billing, Anthropic Console, Bedrock, Vertex) plus any custom logs you ship to Datadog, Helicone, LangSmith, or CloudWatch. We break down spend by endpoint, model, and time-of-day. By end of day 1 you have a line-item report: where every dollar went last month.
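The day-1 breakdown is a straightforward aggregation once per-request costs are in your logs. A minimal sketch using only the standard library; the field names (`endpoint`, `model`, `cost_usd`) are illustrative and should be mapped to whatever your logging pipeline actually emits:

```python
from collections import defaultdict

# Day-1-style spend breakdown from raw usage-log rows, stdlib only.
# Field names (endpoint, model, cost_usd) are illustrative stand-ins for
# whatever Datadog, Helicone, LangSmith, or CloudWatch gives you.
rows = [
    {"endpoint": "/api/rag/answer", "model": "opus", "cost_usd": 0.31},
    {"endpoint": "/api/rag/answer", "model": "opus", "cost_usd": 0.29},
    {"endpoint": "/api/classify", "model": "opus", "cost_usd": 0.07},
]

def spend_breakdown(rows):
    """Total spend per (endpoint, model), plus each line's share of the whole."""
    totals = defaultdict(float)
    for r in rows:
        totals[(r["endpoint"], r["model"])] += r["cost_usd"]
    grand = sum(totals.values())
    # Biggest line item first, like the day-1 report.
    return sorted(
        ((ep, m, cost, cost / grand) for (ep, m), cost in totals.items()),
        key=lambda t: -t[2],
    )

for endpoint, model, cost, share in spend_breakdown(rows):
    print(f"{endpoint:<20} {model:<6} ${cost:.2f}  {share:.0%}")
```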
- Day 2 · Prompt caching + batch analysis
Day 2 is the caching pass — we identify every repeated prefix (system prompts, few-shot examples, RAG context headers) that qualifies for Anthropic cache_control or OpenAI prefix caching, and every async workload that should be on the Batch API. Each candidate gets a projected saving, confidence score, and implementation effort estimate.
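On the Anthropic side, marking a stable prefix as cacheable is a one-field change to the request. A sketch of the payload shape with cache_control set; the model id and prompt are placeholders, and actually sending the request needs the anthropic SDK and an API key:

```python
# Payload shape for Anthropic prompt caching: the large stable prefix
# (system prompt, few-shot examples) carries a cache_control marker so
# later calls read it at the cached rate. This only builds the request
# dict; sending it needs the anthropic SDK and an API key, and the
# model id below is a placeholder.
LARGE_SYSTEM_PROMPT = "You are a support triage agent. " + "<rules> " * 400

def build_cached_request(user_message: str) -> dict:
    return {
        "model": "claude-example-model",  # placeholder id, use your model
        "max_tokens": 1024,
        "system": [
            {
                "type": "text",
                "text": LARGE_SYSTEM_PROMPT,
                # Marks the prefix up to and including this block as cacheable.
                "cache_control": {"type": "ephemeral"},
            }
        ],
        "messages": [{"role": "user", "content": user_message}],
    }

req = build_cached_request("Customer says the export button 404s.")
print(req["system"][0]["cache_control"])  # {'type': 'ephemeral'}
```

Only the per-request user message varies; everything above it stays byte-identical call to call, which is what makes the prefix cacheable in the first place.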
- Day 3 · Routing + caching + written plan
Day 3 we map the model-routing opportunities (Haiku / GPT-4o-mini for classification, extraction, simple summarization; Opus / GPT-5 only for the calls that need it) and the response-caching opportunities (semantic Redis cache with embedding hash). Then we write the plan: every recommendation with ROI, priority, and fixed-price implementation quote if you want us to build it.
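The routing decision itself is a few lines of code; the audit's job is deciding where the threshold sits for your traffic. A sketch of the control flow with a stand-in heuristic classifier; in production the classifier is usually a small model or a tuned rule set, and the model names here are placeholders, not recommendations:

```python
# Minimal model-routing sketch: a cheap classifier in front of the router
# decides which model each call deserves. A keyword + length heuristic
# stands in for the real classifier so the control flow is visible.
# Model names are placeholders.
CHEAP_MODEL = "small-model"        # e.g. the Haiku / GPT-4o-mini class
FLAGSHIP_MODEL = "flagship-model"  # e.g. the Opus / GPT-5 class

SIMPLE_TASKS = {"classify", "extract", "summarize"}

def route(task: str, prompt: str) -> str:
    """Route easy, short tasks to the cheap model; everything else to the flagship."""
    if task in SIMPLE_TASKS and len(prompt) < 4000:
        return CHEAP_MODEL
    return FLAGSHIP_MODEL  # ambiguous or heavy work keeps the flagship

print(route("classify", "Is this ticket a billing issue or a bug report?"))  # small-model
print(route("plan", "Design a multi-step migration for our billing system"))  # flagship-model
```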
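For reference, the day-2 Batch API migration is mostly a packaging change: one JSONL line per request instead of one synchronous call each. A sketch of the OpenAI batch input format, building the file contents only; uploading the file and polling the batch require the openai SDK and an API key, and the model id is a placeholder:

```python
import json

# OpenAI Batch API input is a JSONL file: one line per request, each with
# a custom_id, method, url, and body. This builds the file contents only;
# uploading it and polling the batch need the openai SDK and an API key.
def to_batch_lines(docs: list) -> str:
    lines = []
    for i, item in enumerate(docs):
        lines.append(json.dumps({
            "custom_id": f"nightly-{i}",   # your correlation id for results
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {
                "model": "example-mini-model",  # placeholder model id
                "messages": [
                    {"role": "system", "content": "Summarize the document."},
                    {"role": "user", "content": item["text"]},
                ],
            },
        }))
    return "\n".join(lines)

jsonl = to_batch_lines([{"text": "Q3 usage report..."}, {"text": "Q4 roadmap..."}])
print(len(jsonl.splitlines()))  # one batch request per line
```

The request bodies are unchanged from the synchronous API, which is why throughput and model quality stay the same while the price halves.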
A sample excerpt from the written cost-audit plan.
This is an anonymized excerpt from a cost audit we shipped last month. Spend forensics first, top 5 fixes ranked by ROI second, projected monthly after-fix spend third, fixed-price implementation quote fourth, free same-day remediations fifth. For reference, see the Anthropic prompt caching and OpenAI Batch API docs we cross-check.
```markdown
# AI Cost Audit · example-saas
> Delivered 2026-04-18 · 72 hours after intake · Afterbuild Labs

## 1. Spend forensics · last 30 days
Total LLM spend: $42,180 · Revenue attributed to LLM features: $63,400
Margin on LLM: 33.5% (target after fixes: 78%)

| Endpoint                | Model    | Calls  | Spend   | % of total |
| ----------------------- | -------- | ------ | ------- | ---------- |
| /api/support/triage     | Opus 4.7 | 18,400 | $14,220 | 33.7%      |
| /api/rag/answer         | Opus 4.7 | 41,200 | $11,900 | 28.2%      |
| /jobs/nightly-summaries | GPT-5    | 6,800  | $7,100  | 16.8%      |
| /api/internal/classify  | Opus 4.7 | 92,100 | $5,900  | 14.0%      |
| everything else         | mixed    | —      | $3,060  | 7.3%       |

## 2. Top 5 fixes (ranked by ROI)
1. Prompt caching on /api/rag/answer system prompt (12KB, fires on every call) — est. save $10,100/mo
2. Move /api/internal/classify from Opus to Haiku (task is 3-class routing) — est. save $5,700/mo
3. Move /jobs/nightly-summaries to Batch API (non-urgent, 50% off) — est. save $3,500/mo
4. Response cache on /api/rag/answer for top-200 repeat queries — est. save $2,800/mo
5. Prompt caching on /api/support/triage agent scaffolding — est. save $2,400/mo

## 3. Projected after-fix monthly spend
Current: $42,180 · Projected: $17,680 · Savings: $24,500/mo (58%)
ROI on $2,499 audit: payback in 3.1 days of shipped fixes.

## 4. Patch plan — fixed-price implementation
Phase 1 (week 1): caching on rag + support ($4,999, saves $12,500/mo)
Phase 2 (week 2): Haiku routing + batch API ($3,999, saves $9,200/mo)
Phase 3 (week 3): response cache + observability ($3,499, saves $2,800/mo)
Total implementation cost: $12,497 · Projected annual savings: $294,000

## 5. What you can do yourself today (free)
1. Set a daily hard-cap on the provider dashboard before the next billing cycle.
2. Turn on Anthropic prompt caching on your largest system prompt (90% cheaper reads).
3. Move one nightly cron to the Batch API this week — copy-paste change, 50% off.
```
What the AI Cost Audit delivers.
Five deliverables. Fixed fee, no retainer. The plan is yours regardless of whether you hire us for implementation.
- 01 · Token spend forensics — where every dollar goes, by endpoint and model
- 02 · Prompt caching candidates (Anthropic cache_control blocks, OpenAI prefix caching)
- 03 · Batch API + async job candidates (50% off list price on non-urgent calls)
- 04 · Model routing recommendations (Haiku / 4o-mini for the easy calls, Opus / GPT-5 for the hard ones)
- 05 · Patch plan — fixed-price implementation estimate + ROI projection
Fixed fee. Money-back if payback exceeds 2 months.
One audit per provider setup. If you come back after shipping fixes and want a second-pass audit, the second audit runs at $1,499. Implementation is always priced separately — no retainer pressure, no hourly meter.
- Turnaround: 3 days
- Scope: Spend forensics · caching + batch + routing + response-cache plan · fixed-price implementation quote
- Guarantee: Payback in 1–2 months or your money back. Plan is yours regardless.
AI Cost Audit vs doing nothing vs DIY.
Four dimensions. The final column is what you get when you bring in an engineering team that has run caching, batch, and routing migrations across dozens of production LLM stacks.
| Dimension | Doing nothing | Vendor renegotiation | DIY without instrumentation | Afterbuild Labs audit |
|---|---|---|---|---|
| Approach | Wait and absorb the bill | Renegotiate list price with the provider | In-house optimization, no instrumentation | AI Cost Audit — $2,499, 72 hours |
| Typical saving | $0 — bill keeps growing | 5–15% at best (weak leverage under $100K/mo) | 10–25% if you get lucky; often regressions | 60–95% on the audited endpoints |
| Time to answer | Never | 2–6 weeks of procurement back-and-forth | Weeks of engineering time across teams | 72 hours from intake to written plan |
| Deliverable | — | A modest discount code | A Slack thread of hypotheses | Written plan · ROI per fix · fixed-price implementation quote |
Who should book the AI Cost Audit (and who should skip it).
Book the audit if…
- Your monthly OpenAI / Anthropic / Gemini / Bedrock bill is over $8K and climbing.
- You have not yet enabled prompt caching (cache_control or prefix caching) on your largest system prompts.
- You suspect Opus or GPT-5 is running on tasks Haiku or GPT-4o-mini could handle.
- Your CFO is asking for a cost-per-user or cost-per-revenue-dollar number on your LLM features.
- You run nightly / hourly batch jobs against the synchronous API instead of the Batch API.
Do not book the audit if…
- Your monthly LLM spend is under $2K — the audit will not pay for itself at that scale.
- You have not yet shipped an LLM feature to production — book AI Readiness Audit (free) instead.
- You need implementation, not a plan — book an Integration Fix or API Integration Sprint directly.
- Your spend is driven by fine-tuning or dedicated capacity contracts — those require a separate procurement-focused engagement.
- You want a vendor-neutral RFP assessment — this is a technical audit, not a procurement engagement.
Engineers who run the cost audit.
The audit maps spend to levers and routes implementation to the right specialist. Most audits end with one of these three running the follow-on phase.
01 · Forensics. Runs the 72-hour forensics — breaks down spend by endpoint and model, isolates the caching, routing, and batch candidates, and writes the patch plan.
02 · Response caching. Designs the semantic response cache — Redis + embedding-hash keys, TTL tuning, and invalidation rules so cached responses stay fresh without regressing quality.
03 · Observability. Wires the cost-per-request telemetry to your logging and BI stack — so you can watch savings land in real time after the fixes ship.
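The response-cache design above reduces to a keying decision: embed the query, quantize the embedding, hash it, look it up. A toy sketch with a stand-in embedding function and a plain dict in place of Redis, so only the keying logic is in view (a real deployment swaps in an embedding model and Redis with a TTL):

```python
import hashlib

# Semantic response-cache keying: embed the query, quantize the embedding,
# hash it into a cache key, so near-duplicate queries land on the same
# cached answer. toy_embed and the plain dict are stand-ins; a real
# deployment uses an embedding model and Redis GET/SETEX with a TTL
# tuned to how fast your answers go stale.
def toy_embed(text: str) -> list:
    # Stand-in for a real embedding call.
    words = text.lower().replace("?", "").split()
    return [len(words), sum(len(w) for w in words) / max(len(words), 1)]

def cache_key(text: str) -> str:
    # Quantize so nearby queries share a key, then hash into a fixed-size key.
    quantized = tuple(round(x) for x in toy_embed(text))
    return hashlib.sha256(repr(quantized).encode()).hexdigest()

cache = {}  # stand-in for Redis

def answer(query: str) -> str:
    key = cache_key(query)
    if key in cache:
        return cache[key]  # cache hit: zero tokens spent
    response = f"LLM answer to: {query}"  # stand-in for the real model call
    cache[key] = response
    return response

print(answer("How do I reset my password?"))
print(answer("how do I reset my password"))  # same key, served from cache
```

The quantization step is where the TTL-vs-freshness tuning lives: coarser quantization means more hits but more risk of serving a near-miss answer.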
Stop burning tokens. Book the AI Cost Audit.
72 hours. Fixed $2,499. A written patch plan across caching, batch, routing, and response-cache fixes — with projected saving per line item and fixed-price implementation quote. Pays for itself in 1–2 months or your money back.
Book free diagnostic