Prompt caching: reuse the prefix
The idea: Cache the stable prefix. The model processes it once, then reuses it at a fraction of the cost and latency, as long as the start of the prompt is byte-for-byte unchanged.
What you'll be able to do: Explain what prompt caching is and why putting stable content first cuts cost and latency.
The problem it solves: You re-send the same long system prompt and files every single turn. Isn't that wasteful?
Builds on: Context window & KV cache, What's in the context window
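To make the "byte-for-byte unchanged" condition concrete, here is a minimal sketch that measures how much of a new prompt shares its leading bytes with the previous one. It is a toy model, not any provider's implementation: real caches typically match on tokens, often in blocks or at explicit breakpoints, and the helper names (shared_prefix_len, reusable_fraction) and the sample prompts are hypothetical.

```python
def shared_prefix_len(previous: bytes, current: bytes) -> int:
    """Count how many leading bytes the two prompts have in common."""
    n = 0
    for a, b in zip(previous, current):
        if a != b:
            break
        n += 1
    return n


def reusable_fraction(previous: str, current: str) -> float:
    """Fraction of the current prompt that a prefix cache could reuse.

    Assumption: reuse is measured in bytes; real systems work on tokens.
    """
    cur = current.encode("utf-8")
    if not cur:
        return 0.0
    return shared_prefix_len(previous.encode("utf-8"), cur) / len(cur)


# Hypothetical long, stable content (system prompt, files, and so on).
SYSTEM = "You are a careful code reviewer. " * 50

# Stable content first: only the tail changes between turns.
stable_turn_1 = SYSTEM + "User: review file A"
stable_turn_2 = SYSTEM + "User: review file B"

# Volatile content first: a changing timestamp sits at the very start.
volatile_turn_1 = "Time: 10:01. " + SYSTEM + "User: review file A"
volatile_turn_2 = "Time: 10:02. " + SYSTEM + "User: review file B"

stable_reuse = reusable_fraction(stable_turn_1, stable_turn_2)
volatile_reuse = reusable_fraction(volatile_turn_1, volatile_turn_2)

print(f"stable content first:   {stable_reuse:.2%} reusable")
print(f"volatile content first: {volatile_reuse:.2%} reusable")
```

With the stable content first, nearly the whole prompt is a shared prefix. With the timestamp first, the prompts diverge within the first few bytes, so almost nothing after that point can be reused even though the long system text is identical.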