Why context changes everything
Tokens must look at other tokens.
A bigram sees only “drinks”. Which of these could fill the blank?
the cat drinks ___
drinks → ?   (all a bigram sees)
Options: water, coffee, fast, milk ✓
The catch: “milk” is right, but only if you know a cat is doing the drinking, and “cat” sits two words back, out of a bigram's reach.
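To make the limitation concrete, here is a minimal Python sketch that assumes a tiny made-up corpus (the sentences and the helper names `train_bigram`, `train_trigram`, `predict_bigram`, `predict_trigram` are hypothetical, for illustration only). It builds a bigram table (next word keyed on the previous word) and, for contrast, a trigram table (next word keyed on the previous two words). The trigram is only a simple stand-in for "seeing more context", not attention.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus, invented for illustration only.
corpus = [
    "the cat drinks milk",
    "the cat drinks milk",
    "the man drinks coffee",
    "the man drinks coffee",
    "the man drinks coffee",
    "the dog drinks water",
]

def train_bigram(sentences):
    """Count next-word frequencies keyed on the single previous word."""
    counts = defaultdict(Counter)
    for s in sentences:
        words = s.split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts

def train_trigram(sentences):
    """Count next-word frequencies keyed on the previous two words."""
    counts = defaultdict(Counter)
    for s in sentences:
        words = s.split()
        for a, b, nxt in zip(words, words[1:], words[2:]):
            counts[(a, b)][nxt] += 1
    return counts

def predict_bigram(counts, words):
    # Only the last word is visible: "drinks".
    return counts[words[-1]].most_common(1)[0][0]

def predict_trigram(counts, words):
    # The last two words are visible: ("cat", "drinks").
    return counts[tuple(words[-2:])].most_common(1)[0][0]

bigram = train_bigram(corpus)
trigram = train_trigram(corpus)
prompt = "the cat drinks".split()

print(predict_bigram(bigram, prompt))    # most common drink overall: coffee
print(predict_trigram(trigram, prompt))  # the cat's drink: milk
```

The bigram answers “coffee” because, after “drinks”, coffee is simply the most frequent word in this corpus; it cannot tell who is drinking. Widening the fixed window helps here, but the deciding word can sit arbitrarily far back, which is why tokens must be able to look at other tokens.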
Next: 3.2 Query, Key, Value. How does a token decide what to look at?
Builds on: 2.2 A first guess: just the last word
Common questions
What is "Why context changes everything" about?
Tokens must look at other tokens.
What problem does it solve?
To finish 'The cat drinks ___', a token must look back at 'cat'. Bigrams see only the previous word.
What will I be able to do after this lesson?
You can explain why a model must let words look at other words (attention).
What comes next?
How does a token decide what to look at?