Cut your OpenAI bill by 60–95%. Plan in 72 hours.
We audit your token spend, propose prompt caching + batch API + model routing + response caching, and write the patch plan. Pays for itself in 1–2 months or your money back.
AI Cost Audit — a 72-hour engagement that ingests your provider usage data, isolates where every token-dollar goes by endpoint and model, and writes a ranked patch plan across four levers: prompt caching (Anthropic cache_control and OpenAI prefix caching), Batch API migration (50% off list price for non-urgent workloads), model routing (cheap model for easy calls, flagship for the hard ones), and response caching (semantic Redis cache for repeated queries). Every recommendation ships with projected monthly saving, confidence score, and a fixed-price implementation quote if you want us to ship the fixes. Pays for itself in 1–2 months or your money back.
Six cost patterns the audit isolates.
Every audit we run finds some combination of these six. The written plan names which patterns are in your stack, the projected saving per pattern, and the fixed-price implementation quote for each.
| Symptom | Root cause | How the plan fixes it |
|---|---|---|
| $10k+/month on OpenAI or Anthropic with no cost dashboards | Every prompt runs at list price, no caching blocks, no batch routing | Prompt caching + batch API audit — usually 60–80% off within two weeks |
| Opus / GPT-5 used for tasks Haiku / 4o-mini could handle | No classifier in front of the router; every call hits the flagship model | Model routing recommendations — cheap model for easy calls, flagship for the hard ones |
| Huge repeated system prompts on every request (RAG, agents, eval) | cache_control blocks never set; prefix caching unused on OpenAI | Prompt caching plan — 90% cost reduction on cached segments, typically full payback in weeks |
| Nightly / hourly batch jobs on the synchronous API | Batch API unused despite 50% off list price for non-urgent workloads | Batch API migration plan — same throughput, half the cost, same model quality |
| The same question asked 10× a day across users | No semantic response cache; identical queries re-run end-to-end | Response caching recommendation (Redis + embedding hash) with TTL tuned to your use case |
| Cost running hotter than revenue; CFO flagging the line item | No per-endpoint or per-customer unit economics visible | Unit-economics dashboard spec — cost per user, per feature, per $ of revenue, wired to your BI |
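The unit-economics number in that last row is mostly bookkeeping once token counts are logged per request. A minimal stand-alone sketch of the arithmetic; the `PRICES` table and model names are illustrative placeholders, not current list prices, so substitute your provider's actual per-million-token rates:

```python
# Cost-per-request arithmetic behind a unit-economics dashboard.
# PRICES is an illustrative placeholder, NOT current list pricing:
# substitute your provider's actual per-million-token rates.
PRICES = {
    "flagship": (15.00, 75.00),  # (input $/Mtok, output $/Mtok), hypothetical
    "small": (0.80, 4.00),
}

def call_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one call: tokens / 1M x price per million."""
    in_price, out_price = PRICES[model]
    return input_tokens / 1e6 * in_price + output_tokens / 1e6 * out_price

def cost_per_user(calls: list) -> dict:
    """Roll per-call costs up into the cost-per-user number the CFO wants."""
    totals = {}
    for c in calls:
        totals[c["user"]] = totals.get(c["user"], 0.0) + call_cost(
            c["model"], c["input_tokens"], c["output_tokens"]
        )
    return totals

calls = [
    {"user": "acme", "model": "flagship", "input_tokens": 12_000, "output_tokens": 800},
    {"user": "acme", "model": "small", "input_tokens": 3_000, "output_tokens": 200},
]
print(round(cost_per_user(calls)["acme"], 4))  # dollars spent on acme's two calls
```

Divide the same totals by revenue per customer and you have the cost-per-revenue-dollar line the dashboard spec wires into your BI.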
The 72-hour cost-audit schedule.
From read-only dashboard access to written patch plan in 72 hours: one forensics pass, one caching + batch pass, one routing + response-cache pass, one written plan delivered.
- Day 1 · Ingest + forensics
Day 1 we ingest read-only access to your provider dashboards (OpenAI usage + billing, Anthropic Console, Bedrock, Vertex) plus any custom logs you ship to Datadog, Helicone, LangSmith, or CloudWatch. We break down spend by endpoint, model, and time-of-day. By end of day 1 you have a line-item report: where every dollar went last month.
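The day-1 breakdown is a straightforward aggregation once per-request costs are in your logs. A minimal sketch using only the standard library; the field names (`endpoint`, `model`, `cost_usd`) are illustrative and should be mapped to whatever your logging pipeline actually emits:

```python
from collections import defaultdict

# Day-1-style spend breakdown from raw usage-log rows, stdlib only.
# Field names (endpoint, model, cost_usd) are illustrative stand-ins for
# whatever Datadog, Helicone, LangSmith, or CloudWatch gives you.
rows = [
    {"endpoint": "/api/rag/answer", "model": "opus", "cost_usd": 0.31},
    {"endpoint": "/api/rag/answer", "model": "opus", "cost_usd": 0.29},
    {"endpoint": "/api/classify", "model": "opus", "cost_usd": 0.07},
]

def spend_breakdown(rows):
    """Total spend per (endpoint, model), plus each line's share of the whole."""
    totals = defaultdict(float)
    for r in rows:
        totals[(r["endpoint"], r["model"])] += r["cost_usd"]
    grand = sum(totals.values())
    # Biggest line item first, like the day-1 report.
    return sorted(
        ((ep, m, cost, cost / grand) for (ep, m), cost in totals.items()),
        key=lambda t: -t[2],
    )

for endpoint, model, cost, share in spend_breakdown(rows):
    print(f"{endpoint:<20} {model:<6} ${cost:.2f}  {share:.0%}")
```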
- Day 2 · Prompt caching + batch analysis
Day 2 is the caching pass — we identify every repeated prefix (system prompts, few-shot examples, RAG context headers) that qualifies for Anthropic cache_control or OpenAI prefix caching, and every async workload that should be on the Batch API. Each candidate gets a projected saving, confidence score, and implementation effort estimate.
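On the Anthropic side, marking a stable prefix as cacheable is a one-field change to the request. A sketch of the payload shape with cache_control set; the model id and prompt are placeholders, and actually sending the request needs the anthropic SDK and an API key:

```python
# Payload shape for Anthropic prompt caching: the large stable prefix
# (system prompt, few-shot examples) carries a cache_control marker so
# later calls read it at the cached rate. This only builds the request
# dict; sending it needs the anthropic SDK and an API key, and the
# model id below is a placeholder.
LARGE_SYSTEM_PROMPT = "You are a support triage agent. " + "<rules> " * 400

def build_cached_request(user_message: str) -> dict:
    return {
        "model": "claude-example-model",  # placeholder id, use your model
        "max_tokens": 1024,
        "system": [
            {
                "type": "text",
                "text": LARGE_SYSTEM_PROMPT,
                # Marks the prefix up to and including this block as cacheable.
                "cache_control": {"type": "ephemeral"},
            }
        ],
        "messages": [{"role": "user", "content": user_message}],
    }

req = build_cached_request("Customer says the export button 404s.")
print(req["system"][0]["cache_control"])  # {'type': 'ephemeral'}
```

Only the per-request user message varies; everything above it stays byte-identical call to call, which is what makes the prefix cacheable in the first place.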
- Day 3 · Routing + caching + written plan
Day 3 we map the model-routing opportunities (Haiku / GPT-4o-mini for classification, extraction, simple summarization; Opus / GPT-5 only for the calls that need it) and the response-caching opportunities (semantic Redis cache with embedding hash). Then we write the plan: every recommendation with ROI, priority, and fixed-price implementation quote if you want us to build it.
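The routing decision itself is a few lines of code; the audit's job is deciding where the threshold sits for your traffic. A sketch of the control flow with a stand-in heuristic classifier; in production the classifier is usually a small model or a tuned rule set, and the model names here are placeholders, not recommendations:

```python
# Minimal model-routing sketch: a cheap classifier in front of the router
# decides which model each call deserves. A keyword + length heuristic
# stands in for the real classifier so the control flow is visible.
# Model names are placeholders.
CHEAP_MODEL = "small-model"        # e.g. the Haiku / GPT-4o-mini class
FLAGSHIP_MODEL = "flagship-model"  # e.g. the Opus / GPT-5 class

SIMPLE_TASKS = {"classify", "extract", "summarize"}

def route(task: str, prompt: str) -> str:
    """Route easy, short tasks to the cheap model; everything else to the flagship."""
    if task in SIMPLE_TASKS and len(prompt) < 4000:
        return CHEAP_MODEL
    return FLAGSHIP_MODEL  # ambiguous or heavy work keeps the flagship

print(route("classify", "Is this ticket a billing issue or a bug report?"))  # small-model
print(route("plan", "Design a multi-step migration for our billing system"))  # flagship-model
```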
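For reference, the day-2 Batch API migration is mostly a packaging change: one JSONL line per request instead of one synchronous call each. A sketch of the OpenAI batch input format, building the file contents only; uploading the file and polling the batch require the openai SDK and an API key, and the model id is a placeholder:

```python
import json

# OpenAI Batch API input is a JSONL file: one line per request, each with
# a custom_id, method, url, and body. This builds the file contents only;
# uploading it and polling the batch need the openai SDK and an API key.
def to_batch_lines(docs: list) -> str:
    lines = []
    for i, item in enumerate(docs):
        lines.append(json.dumps({
            "custom_id": f"nightly-{i}",   # your correlation id for results
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {
                "model": "example-mini-model",  # placeholder model id
                "messages": [
                    {"role": "system", "content": "Summarize the document."},
                    {"role": "user", "content": item["text"]},
                ],
            },
        }))
    return "\n".join(lines)

jsonl = to_batch_lines([{"text": "Q3 usage report..."}, {"text": "Q4 roadmap..."}])
print(len(jsonl.splitlines()))  # one batch request per line
```

The request bodies are unchanged from the synchronous API, which is why throughput and model quality stay the same while the price halves.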
A sample excerpt from the written cost-audit plan.
This is an anonymized excerpt from a cost audit we shipped last month. Spend forensics first, top 5 fixes ranked by ROI second, projected monthly after-fix spend third, fixed-price implementation quote fourth, free same-day remediations fifth. For reference, see the Anthropic prompt caching and OpenAI Batch API docs we cross-check.
```markdown
# AI Cost Audit · example-saas
> Delivered 2026-04-18 · 72 hours after intake · Afterbuild Labs

## 1. Spend forensics · last 30 days
Total LLM spend: $42,180 · Revenue attributed to LLM features: $63,400
Margin on LLM: 33.5% (target after fixes: 78%)

| Endpoint                | Model    | Calls  | Spend   | % of total |
| ----------------------- | -------- | ------ | ------- | ---------- |
| /api/support/triage     | Opus 4.7 | 18,400 | $14,220 | 33.7%      |
| /api/rag/answer         | Opus 4.7 | 41,200 | $11,900 | 28.2%      |
| /jobs/nightly-summaries | GPT-5    | 6,800  | $7,100  | 16.8%      |
| /api/internal/classify  | Opus 4.7 | 92,100 | $5,900  | 14.0%      |
| everything else         | mixed    | —      | $3,060  | 7.3%       |

## 2. Top 5 fixes (ranked by ROI)
1. Prompt caching on /api/rag/answer system prompt (12KB, fires on every call) — est. save $10,100/mo
2. Move /api/internal/classify from Opus to Haiku (task is 3-class routing) — est. save $5,700/mo
3. Move /jobs/nightly-summaries to Batch API (non-urgent, 50% off) — est. save $3,500/mo
4. Response cache on /api/rag/answer for top-200 repeat queries — est. save $2,800/mo
5. Prompt caching on /api/support/triage agent scaffolding — est. save $2,400/mo

## 3. Projected after-fix monthly spend
Current: $42,180 · Projected: $17,680 · Savings: $24,500/mo (58%)
ROI on $2,499 audit: payback in 3.1 days of shipped fixes.

## 4. Patch plan — fixed-price implementation
Phase 1 (week 1): caching on rag + support ($4,999, saves $12,500/mo)
Phase 2 (week 2): Haiku routing + batch API ($3,999, saves $9,200/mo)
Phase 3 (week 3): response cache + observability ($3,499, saves $2,800/mo)
Total implementation cost: $12,497 · Projected annual savings: $294,000

## 5. What you can do yourself today (free)
1. Set a daily hard-cap on the provider dashboard before the next billing cycle.
2. Turn on Anthropic prompt caching on your largest system prompt (90% cheaper reads).
3. Move one nightly cron to the Batch API this week — copy-paste change, 50% off.
```
What the AI Cost Audit delivers.
Five deliverables. Fixed fee, no retainer. The plan is yours regardless of whether you hire us for implementation.
- 01 · Token spend forensics — where every dollar goes, by endpoint and model
- 02 · Prompt caching candidates (Anthropic cache_control blocks, OpenAI prefix caching)
- 03 · Batch API + async job candidates (50% off list price on non-urgent calls)
- 04 · Model routing recommendations (Haiku / 4o-mini for the easy calls, Opus / GPT-5 for the hard ones)
- 05 · Patch plan — fixed-price implementation estimate + ROI projection
Fixed fee. Money-back if payback exceeds 2 months.
One audit per provider setup. If you come back after shipping fixes and want a second-pass audit, the second audit runs at $1,499. Implementation is always priced separately — no retainer pressure, no hourly meter.
- Turnaround: 3 days
- Scope: Spend forensics · caching + batch + routing + response-cache plan · fixed-price implementation quote
- Guarantee: Payback in 1–2 months or your money back. Plan is yours regardless.
AI Cost Audit vs doing nothing vs DIY.
Four dimensions. The final column is what you get when you bring in an engineering team that has run caching, batch, and routing migrations across dozens of production LLM stacks.
| Dimension | Doing nothing | Vendor renegotiation | DIY without instrumentation | Afterbuild Labs audit |
|---|---|---|---|---|
| Approach | Wait and absorb the bill | Renegotiate list price with the provider | In-house optimization, no instrumentation | AI Cost Audit — $2,499, 72 hours |
| Typical saving | $0 — bill keeps growing | 5–15% at best (weak leverage under $100K/mo) | 10–25% if you get lucky; often regressions | 60–95% on the audited endpoints |
| Time to answer | Never | 2–6 weeks of procurement back-and-forth | Weeks of engineering time across teams | 72 hours from intake to written plan |
| Deliverable | — | A modest discount code | A Slack thread of hypotheses | Written plan · ROI per fix · fixed-price implementation quote |
Who should book the AI Cost Audit (and who should skip it).
Book the audit if…
- Your monthly OpenAI / Anthropic / Gemini / Bedrock bill is over $8K and climbing.
- You have not yet enabled prompt caching (cache_control or prefix caching) on your largest system prompts.
- You suspect Opus or GPT-5 is running on tasks Haiku or GPT-4o-mini could handle.
- Your CFO is asking for a cost-per-user or cost-per-revenue-dollar number on your LLM features.
- You run nightly / hourly batch jobs against the synchronous API instead of the Batch API.
Do not book the audit if…
- Your monthly LLM spend is under $2K — the audit will not pay for itself at that scale.
- You have not yet shipped an LLM feature to production — book AI Readiness Audit (free) instead.
- You need implementation, not a plan — book an Integration Fix or API Integration Sprint directly.
- Your spend is driven by fine-tuning or dedicated capacity contracts — those require a separate procurement-focused engagement.
- You want a vendor-neutral RFP assessment — this is a technical audit, not a procurement engagement.
Engineers who run the cost audit.
The audit maps spend to levers and routes implementation to the right specialist. Most audits end with one of these three running the follow-on phase.
01 · Forensics. Runs the 72-hour forensics — breaks down spend by endpoint and model, isolates the caching, routing, and batch candidates, and writes the patch plan.
02 · Response caching. Designs the semantic response cache — Redis + embedding-hash keys, TTL tuning, and invalidation rules so cached responses stay fresh without regressing quality.
03 · Observability. Wires the cost-per-request telemetry to your logging and BI stack — so you can watch savings land in real time after the fixes ship.
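The response-cache design above reduces to a keying decision: embed the query, quantize the embedding, hash it, look it up. A toy sketch with a stand-in embedding function and a plain dict in place of Redis, so only the keying logic is in view (a real deployment swaps in an embedding model and Redis with a TTL):

```python
import hashlib

# Semantic response-cache keying: embed the query, quantize the embedding,
# hash it into a cache key, so near-duplicate queries land on the same
# cached answer. toy_embed and the plain dict are stand-ins; a real
# deployment uses an embedding model and Redis GET/SETEX with a TTL
# tuned to how fast your answers go stale.
def toy_embed(text: str) -> list:
    # Stand-in for a real embedding call.
    words = text.lower().replace("?", "").split()
    return [len(words), sum(len(w) for w in words) / max(len(words), 1)]

def cache_key(text: str) -> str:
    # Quantize so nearby queries share a key, then hash into a fixed-size key.
    quantized = tuple(round(x) for x in toy_embed(text))
    return hashlib.sha256(repr(quantized).encode()).hexdigest()

cache = {}  # stand-in for Redis

def answer(query: str) -> str:
    key = cache_key(query)
    if key in cache:
        return cache[key]  # cache hit: zero tokens spent
    response = f"LLM answer to: {query}"  # stand-in for the real model call
    cache[key] = response
    return response

print(answer("How do I reset my password?"))
print(answer("how do I reset my password"))  # same key, served from cache
```

The quantization step is where the TTL-vs-freshness tuning lives: coarser quantization means more hits but more risk of serving a near-miss answer.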
Stop burning tokens. Book the AI Cost Audit.
72 hours. Fixed $2,499. A written patch plan across caching, batch, routing, and response-cache fixes — with projected saving per line item and fixed-price implementation quote. Pays for itself in 1–2 months or your money back.
Book free diagnostic