The context wall
The idea: Tokens must look at other tokens.
What you'll be able to do: Explain why a model must let words look at other words (attention).
The problem it solves: To finish 'The cat drinks ___', the model must look back at 'cat'. A bigram model sees only the previous word, 'drinks', so 'cat' never reaches the prediction.
Builds on: The bigram model
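A minimal sketch of the wall, assuming a two-sentence toy corpus invented for illustration (the corpus and the names `follow_counts` and `bigram_next` are hypothetical, not from the lesson). It builds a count-based bigram model and shows that the guess after 'drinks' comes out identical whether the subject was 'cat' or 'dog', because only the last word is ever consulted.

```python
from collections import Counter, defaultdict

# Toy corpus, made up for this illustration.
corpus = [
    "the cat drinks milk",
    "the dog drinks water",
]

# Count how often each word follows each previous word.
follow_counts = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for prev, nxt in zip(words, words[1:]):
        follow_counts[prev][nxt] += 1


def bigram_next(context):
    """Return next-word probabilities using only the last word of the context."""
    last = context.split()[-1]
    counts = follow_counts[last]
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


cat_guess = bigram_next("the cat drinks")
dog_guess = bigram_next("the dog drinks")

print(cat_guess)  # {'milk': 0.5, 'water': 0.5}
print(dog_guess)  # {'milk': 0.5, 'water': 0.5}
```

Both contexts end in 'drinks', so the bigram model cannot tell them apart. To prefer 'milk' after 'cat', the prediction would need some way to look further back than one word, which is the gap attention is meant to fill.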