Token awareness & the economics
The idea: Every token in and out costs money and time, and the model even paces itself against its budget. Fewer, well-chosen tokens beat more, and caching, retrieval, and clearing old context all tilt the economics.
What you'll be able to do: Reason about token cost and latency, and make deliberate trade-offs (caching, retrieval, pruning).
The problem it solves: Two prompts get the same answer, but one costs 10× as much and runs slower. Why?
Builds on: What's in the context window, Prompt caching: reuse the prefix
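To make the "same answer, 10× the cost" question concrete, here is a minimal back-of-the-envelope sketch. Everything in it is an assumption for illustration: the `Pricing` class, the `request_cost` helper, the per-million-token prices, and the cached-input discount are hypothetical placeholders, not any provider's real rates. It also ignores any surcharge for writing to the cache and does not model latency.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Hypothetical prices; substitute your provider's published rates."""
    input_per_mtok: float = 3.00           # USD per million uncached input tokens (assumed)
    output_per_mtok: float = 15.00         # USD per million output tokens (assumed)
    cached_input_multiplier: float = 0.1   # cached prefix billed at 10% of input price (assumed)


def request_cost(input_tokens: int, output_tokens: int,
                 cached_tokens: int = 0,
                 pricing: Pricing = Pricing()) -> float:
    """Estimate the USD cost of one request.

    cached_tokens is the part of input_tokens served from a prompt cache.
    """
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    fresh_tokens = input_tokens - cached_tokens
    total = (
        fresh_tokens * pricing.input_per_mtok
        + cached_tokens * pricing.input_per_mtok * pricing.cached_input_multiplier
        + output_tokens * pricing.output_per_mtok
    )
    return total / 1_000_000


# Two prompts that produce the same 500-token answer.
lean = request_cost(input_tokens=2_000, output_tokens=500)
bloated = request_cost(input_tokens=47_000, output_tokens=500)
# Same bloated prompt, but 45,000 tokens of its prefix are read from the cache.
bloated_cached = request_cost(input_tokens=47_000, output_tokens=500,
                              cached_tokens=45_000)

print(f"bloated / lean:          {bloated / lean:.1f}x")         # 11.0x
print(f"bloated+cached / lean:   {bloated_cached / lean:.1f}x")  # 2.0x
```

Under these assumed prices, the extra 45,000 input tokens alone make the second prompt about 11 times as expensive, and caching the repeated prefix brings it back to about twice the lean prompt's cost. Pruning or retrieving only the relevant context would shrink `input_tokens` directly instead.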