Context window & KV cache
The idea: The model attends over a fixed-length context window; the KV cache stores the keys and values of past tokens so each new token reuses them instead of recomputing them.
What you'll be able to do: You can explain the KV cache and why long context is expensive.
The problem it solves: Long chats get slow, and the model "forgets" early turns once they fall outside the context window.
Builds on: Attention scores, softmax & the weighted sum
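To make the idea concrete, here is a minimal sketch of a single attention head decoding with a KV cache, in plain Python. Everything named here (`attend`, `KVCache`, `max_context`, `kv_cache_bytes`) is hypothetical and chosen for illustration. The sketch assumes a toy setup where each token's query, key, and value are supplied directly, and it assumes one possible eviction policy (drop the oldest entry once the window is full); real models may truncate or manage context differently.

```python
import math


def attend(q, keys, values):
    """Softmax attention of one query vector over lists of key/value vectors."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [
        sum(w * v[d] for w, v in zip(weights, values)) / total
        for d in range(dim)
    ]


class KVCache:
    """Toy cache: keeps past keys/values so each step only adds one new pair."""

    def __init__(self, max_context):
        self.max_context = max_context  # assumed fixed window size, in tokens
        self.keys = []
        self.values = []

    def step(self, q, k, v):
        # Append the new token's key/value; earlier ones are reused as-is.
        self.keys.append(k)
        self.values.append(v)
        # Assumed policy: once the window is full, the oldest token is dropped,
        # which is one way a model can "forget" the start of a long chat.
        if len(self.keys) > self.max_context:
            self.keys.pop(0)
            self.values.pop(0)
        return attend(q, self.keys, self.values)


def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, bytes_per_value=2):
    """Rough cache size: one key and one value vector per layer, head, token."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_value
```

Each call to `step` scores the new query against every cached key, so per-token work still grows with the number of tokens in the window, and the cache itself grows linearly with sequence length. As a purely illustrative configuration, `kv_cache_bytes(32, 32, 128, 4096)` comes to 2 GiB at 2 bytes per value.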