The hidden O(N²) tax in AI agent loops — measured, with a benchmark you can run
· by the Architect · ~5 min read · for developers running long agent sessions
Every turn, many AI agents re-send their entire transcript. On a modeled multi-session coding task, that resend costs 2.7× to 7.1× as many context tokens as recalling a compact memory; put the other way, recall uses 62.8% to 85.9% fewer. Here is the measurement, the method, and how to reproduce it offline.
The cost nobody puts on the invoice
An agent loop is not one model call; it is dozens. A long Claude Code or Cursor session, an autonomous task runner, a multi-day project: in a naive loop, each turn is a fresh call that re-sends the system prompt, the entire growing transcript, and the new message. The transcript only grows, so the context you pay for grows with it. If each turn adds roughly the same amount of text, turn t re-sends about t turns' worth of history, and N turns re-send about 1 + 2 + … + N = N(N+1)/2 turns' worth in total, so total context spend scales roughly O(N²). The same growth is why long sessions eventually hit the context window and fall over.
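A toy cost model makes the quadratic visible. This is a minimal sketch, not the benchmark's code: it assumes (hypothetically) a fixed system prompt of `systemTokens` and that every turn adds exactly `turnTokens` to the transcript, and it checks the turn-by-turn loop against the closed form n·S + T·n(n+1)/2.

```typescript
// Toy model of naive resend cost. Hypothetical assumptions: a fixed system
// prompt of `systemTokens`, and every turn adds exactly `turnTokens` to the
// transcript. Constants are illustrative, not benchmark data.

// Context sent on turn t (1-based): system prompt + the t-1 earlier turns
// + the new message.
function naiveContextAt(t: number, systemTokens: number, turnTokens: number): number {
  const transcript = (t - 1) * turnTokens;
  return systemTokens + transcript + turnTokens;
}

// Total input tokens over n turns: the sum of every turn's context.
function naiveTotal(n: number, systemTokens: number, turnTokens: number): number {
  let total = 0;
  for (let t = 1; t <= n; t++) {
    total += naiveContextAt(t, systemTokens, turnTokens);
  }
  return total;
}

// Closed form of the same sum: n*S + T*n(n+1)/2, which is quadratic in n.
function naiveTotalClosedForm(n: number, systemTokens: number, turnTokens: number): number {
  return n * systemTokens + (turnTokens * n * (n + 1)) / 2;
}

for (const n of [5, 10, 20, 40]) {
  // Loop and closed form agree: 2500, 7500, 25000, 90000.
  console.log(n, naiveTotal(n, 200, 100), naiveTotalClosedForm(n, 200, 100));
}
```

With these made-up constants, going from 20 to 40 turns takes the total from 25,000 to 90,000 tokens, a 3.6× jump for 2× the turns; the ratio approaches 4× as the transcript term swamps the fixed system prompt.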
There is an alternative: do not re-send the transcript. Keep durable facts (decisions, conventions, file paths) as memory cells, and recall a small, bounded set each turn. If at most cap cells are recalled per turn, each turn's context stays bounded no matter how long the session runs, which turns the quadratic resend into roughly O(N · cap). The obvious question is how much that actually saves. So SAIHM published a benchmark to measure it, and to let you check the number rather than trust it.
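The bounded-recall side of the same toy model, as a hedged sketch: `cellTokens`, `cap`, and the assumption that one durable fact is saved per turn are all hypothetical, loosely mirroring the benchmark's `--recall-cap` knob. Once the cap is reached, per-turn context stops growing, so the total is linear in N; a closed-form naive total is included for comparison.

```typescript
// Toy model of bounded recall. Hypothetical assumptions: one durable fact
// is saved per turn, every cell costs `cellTokens`, and at most `cap` cells
// are recalled per turn. Constants are illustrative, not benchmark data.

// Context sent on turn t (1-based): system prompt + recalled cells + new message.
function recallContextAt(
  t: number, systemTokens: number, turnTokens: number,
  cellTokens: number, cap: number,
): number {
  const storedCells = t - 1;                   // facts saved on earlier turns
  const recalled = Math.min(storedCells, cap); // never more than the cap
  return systemTokens + recalled * cellTokens + turnTokens;
}

function recallTotal(
  n: number, systemTokens: number, turnTokens: number,
  cellTokens: number, cap: number,
): number {
  let total = 0;
  for (let t = 1; t <= n; t++) {
    total += recallContextAt(t, systemTokens, turnTokens, cellTokens, cap);
  }
  return total;
}

// Naive total in closed form: system prompt + whole transcript every turn.
function naiveTotal(n: number, systemTokens: number, turnTokens: number): number {
  return n * systemTokens + (turnTokens * n * (n + 1)) / 2;
}

// Percentage of input tokens saved by recall relative to naive resend.
function savedPct(naive: number, recall: number): number {
  return (1 - recall / naive) * 100;
}

for (const n of [5, 10, 20, 40]) {
  const naive = naiveTotal(n, 200, 100);
  const recall = recallTotal(n, 200, 100, 30, 4);
  console.log(`${n} turns: naive=${naive} recall=${recall} saved=${savedPct(naive, recall).toFixed(1)}%`);
}
```

With these made-up constants the saving climbs from 28% at 5 turns to about 82% at 40 turns. Raising `cap` buys recall breadth at the cost of a larger constant per turn, but it does not change the linear shape.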
The experiment
The benchmark (citw2/saihm-token-benchmark, Apache-2.0) models one realistic scenario: a build-a-feature coding assistant working across three sittings, where early decisions (“use Recharts”, “store timestamps in UTC”, “named exports only”) accumulate and later turns need to recall them. It counts input/context tokens only, summed across every turn, under two strategies:
- Naive — each turn sends system prompt + the entire growing transcript + the new message.
- SAIHM — each turn sends system prompt + a capped set of recalled memory cells + the new message. The raw transcript is never re-sent.
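Tokens are counted with the gpt-tokenizer package using cl100k_base, the GPT-4 BPE encoding. Claude, DeepSeek and other models use their own tokenizers, so absolute counts will differ there; what the benchmark measures is the comparison between the two strategies. The script runs fully offline, with no API calls and no keys, so it is deterministic and anyone gets the same result.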
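Structurally, the two strategies differ only in what is assembled into the prompt each turn. The sketch below assumes hypothetical `Message` and `MemoryCell` shapes, stores every user message as a cell, takes the first `cap` cells rather than ranking them, and counts whitespace-separated words as a crude stand-in for cl100k_base; the real benchmark's scenario and tokenizer will give different numbers.

```typescript
type Message = { role: "system" | "user" | "assistant"; content: string };
type MemoryCell = { id: string; text: string };

// Crude stand-in for a BPE tokenizer: counts whitespace-separated words.
// Hypothetical; cl100k_base would report different counts.
function countTokens(messages: Message[]): number {
  return messages.reduce(
    (sum, m) => sum + m.content.split(/\s+/).filter(Boolean).length,
    0,
  );
}

// Naive: system prompt + the entire transcript so far + the new message.
function buildNaive(system: string, transcript: Message[], next: string): Message[] {
  return [{ role: "system", content: system }, ...transcript, { role: "user", content: next }];
}

// Recall: system prompt + at most `cap` memory cells + the new message.
// The raw transcript is never included. Taking the first `cap` cells is a
// simplification; a real memory layer would rank cells by relevance.
function buildRecall(system: string, cells: MemoryCell[], cap: number, next: string): Message[] {
  const memory = cells.slice(0, cap).map((c) => `- ${c.text}`).join("\n");
  return [
    { role: "system", content: system },
    { role: "system", content: `Relevant memory:\n${memory}` },
    { role: "user", content: next },
  ];
}

// A hypothetical five-turn session.
const system = "You are a coding assistant.";
const userTurns = [
  "Use Recharts for charts.",
  "Store timestamps in UTC.",
  "Named exports only.",
  "Add a chart page.",
  "Write tests for it.",
];

const transcript: Message[] = [];
const cells: MemoryCell[] = [];
let naiveSum = 0;
let recallSum = 0;

userTurns.forEach((userMsg, i) => {
  naiveSum += countTokens(buildNaive(system, transcript, userMsg));
  recallSum += countTokens(buildRecall(system, cells, 3, userMsg));
  // After the turn: the transcript grows, and (hypothetically) the user
  // message is also saved as a memory cell.
  transcript.push({ role: "user", content: userMsg }, { role: "assistant", content: `Done: ${userMsg}` });
  cells.push({ id: `cell-${i}`, text: userMsg });
});

console.log({ naiveSum, recallSum });
```

On this five-turn toy script the naive loop sends 130 words in total and the recall loop 97, and the gap widens with every extra turn because only the naive transcript keeps growing.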
The numbers
| Session length | Naive input tokens (total) | SAIHM input tokens (total) | Fewer with SAIHM |
|---|---|---|---|
| 5 turns | 1,628 | 605 | 62.8% |
| 10 turns | 6,091 | 1,273 | 79.1% |
| 15 turns | 13,175 | 2,023 | 84.6% |
| 18 turns | 18,688 | 2,632 | 85.9% |
The longer the session, the wider the gap, which is consistent with the O(N²)-versus-O(N · cap) difference.
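You can recompute the table's last column, and check the growth claim, from the published totals alone. The sketch below copies the four rows from the table; the growth exponent assumes tokens ≈ c·N^k between the 5-turn and 18-turn points, which is a rough two-point fit rather than anything the benchmark itself reports.

```typescript
// Totals copied from the results table.
const rows = [
  { turns: 5, naive: 1628, saihm: 605 },
  { turns: 10, naive: 6091, saihm: 1273 },
  { turns: 15, naive: 13175, saihm: 2023 },
  { turns: 18, naive: 18688, saihm: 2632 },
];

// Reduction = 1 - saihm/naive, as a percentage rounded to one decimal.
function reductionPct(naive: number, saihm: number): string {
  return ((1 - saihm / naive) * 100).toFixed(1);
}

// Two-point growth exponent, assuming tokens ≈ c·N^k:
// k = ln(y2/y1) / ln(N2/N1). Quadratic growth gives k ≈ 2, linear k ≈ 1.
function growthExponent(n1: number, y1: number, n2: number, y2: number): number {
  return Math.log(y2 / y1) / Math.log(n2 / n1);
}

for (const r of rows) {
  console.log(`${r.turns} turns: ${reductionPct(r.naive, r.saihm)}% fewer`);
}

const first = rows[0];
const last = rows[rows.length - 1];
const kNaive = growthExponent(first.turns, first.naive, last.turns, last.naive);
const kSaihm = growthExponent(first.turns, first.saihm, last.turns, last.saihm);
console.log(`naive k=${kNaive.toFixed(2)}, SAIHM k=${kSaihm.toFixed(2)}`);
```

On these figures the naive exponent comes out at about 1.9 and the SAIHM exponent at about 1.15: close to quadratic on one side, much closer to linear on the other.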
Why these numbers are honest, not cherry-picked
- Input only. Output tokens are identical under both strategies, so they are not counted. The win is purely on the context you re-send.
- Short sessions save less. At 5 turns the reduction is about 63%, not 86%. The savings depend on session length and on how compact your memory cells are, so your real mileage depends on your workload.
- It measures token volume, not a price. This is resend-versus-recall token volume, not any one provider's billing, which also depends on per-token prices and discounts such as cached-input pricing.
Reproduce it in two minutes
```bash
git clone https://github.com/citw2/saihm-token-benchmark
cd saihm-token-benchmark && npm install
node benchmark.mjs
node benchmark.mjs --recall-cap 4 # trade recall breadth vs savings
```
Change the cap, swap in your own scenario, re-run. The point of publishing it is that you do not have to take the percentage on faith.
Where the recall comes from
SAIHM is a memory layer you address across models: the same store works from Claude, GPT, DeepSeek, Qwen, Kimi or GLM, and through LangChain/LlamaIndex. Durable facts live as memory cells; each turn pulls a bounded set instead of replaying history. Because the memory is portable, you are not locked to one vendor's built-in context; because it is yours, you hold the keys, and erasure is per-record and provable. Each of these integrations has a runnable, one-command demo, linked from the demo set.
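To make "a bounded set per turn" concrete, here is a hypothetical in-memory store. It is not SAIHM's API or retrieval method: `CellStore`, `put`, `recall` and `erase` are illustrative names, and ranking by shared keywords stands in for whatever relevance scoring a real memory layer uses. It shows the two properties the paragraph relies on: recall returns at most `cap` cells regardless of store size, and erase removes exactly one record.

```typescript
// Hypothetical memory-cell store; illustrative only, not SAIHM's API.
type Cell = { id: string; text: string };

// Lowercase alphanumeric words, used for naive keyword matching.
function tokenize(text: string): string[] {
  return text.toLowerCase().match(/[a-z0-9]+/g) ?? [];
}

class CellStore {
  private cells = new Map<string, Cell>();

  put(id: string, text: string): void {
    this.cells.set(id, { id, text });
  }

  // Per-record erase: removes exactly one cell; returns whether it existed.
  erase(id: string): boolean {
    return this.cells.delete(id);
  }

  // Bounded recall: score each cell by how many of its words appear in the
  // query, drop cells with no overlap, and return at most `cap` of the best.
  recall(query: string, cap: number): Cell[] {
    const queryWords = new Set(tokenize(query));
    return Array.from(this.cells.values())
      .map((cell) => ({ cell, score: tokenize(cell.text).filter((w) => queryWords.has(w)).length }))
      .filter((s) => s.score > 0)
      .sort((a, b) => b.score - a.score)
      .slice(0, cap)
      .map((s) => s.cell);
  }
}

const store = new CellStore();
store.put("charts", "Use Recharts for all charts");
store.put("time", "Store timestamps in UTC");
store.put("exports", "Use named exports only");

const ids = store.recall("Which library do we use for charts?", 2).map((c) => c.id);
console.log(ids); // ids is ["charts", "exports"]

store.erase("charts");
console.log(store.recall("Which library do we use for charts?", 2).map((c) => c.id)); // ["exports"]
```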
The honest close
SAIHM is a paid product, with no free tier — that is stated up front rather than buried behind a trial. But the benchmark and all nine demos are open source and run locally, so you can verify the claim and try the integration before deciding anything. The tool surface and connect steps are at /developers; pricing is at /pricing.
— Architect
Independence notice. SAIHM is an Apache-2.0 protocol authored independently. The benchmark described here is open source and reproducible offline; the figures are produced by the published script and depend on session length and scenario. The architecture is described at a conceptual level; the authoritative details are the open specification and the published source.