Query, Key, Value
The idea: Each token emits a Query, a Key, and a Value; a token's Query is matched against every token's Key by dot product.
What you'll be able to do: Explain the roles of Query, Key, and Value, and how attention reuses the dot product.
The problem it solves: How does a token decide what to look at?
Builds on: Dot-product similarity, The context wall
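Here is a minimal sketch of the idea in pure Python. Each token turns its embedding into its own Query, Key, and Value, and one token then scores every token by taking the dot product of its Query with each Key. The three-token vocabulary, the 2-dimensional embeddings, and the matrices W_q, W_k, W_v are made-up numbers for illustration only. In a real model the projections are learned, and the helper names (matvec, match_scores) are hypothetical. Only the raw match scores are computed here. The Values are produced but not yet used.

```python
# Toy setup (hypothetical numbers): 3 tokens, each with a 2-d embedding.
embeddings = {
    "the": [1.0, 0.0],
    "cat": [0.0, 1.0],
    "sat": [1.0, 1.0],
}

# Illustrative projection matrices (rows); a real model learns these.
W_q = [[1.0, 0.0], [0.0, 2.0]]
W_k = [[0.5, 0.5], [0.0, 1.0]]
W_v = [[1.0, 1.0], [1.0, -1.0]]


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def dot(a, b):
    """Dot-product similarity between two equal-length vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))


# Each token emits its own Query, Key, and Value from its embedding.
Q = {tok: matvec(W_q, x) for tok, x in embeddings.items()}
K = {tok: matvec(W_k, x) for tok, x in embeddings.items()}
V = {tok: matvec(W_v, x) for tok, x in embeddings.items()}


def match_scores(query_token):
    """Score every token by dotting query_token's Query with that token's Key."""
    q = Q[query_token]
    return {tok: dot(q, k) for tok, k in K.items()}


scores = match_scores("sat")
for tok, s in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(tok, s)
# sat 3.0
# cat 2.5
# the 0.5
```

A higher score means a better Query-to-Key match, so in this toy example "sat" attends most to itself, then to "cat". Keeping separate Query and Key projections lets what a token looks for differ from what it advertises about itself. A single shared embedding could not separate those two roles.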