How attention blends meaning
Q·K → scores → softmax → weighted sum of Values.
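Here is the same pipeline written as formulas for one word. The symbols are our own labels, not the lesson's: q is that word's query, k_j and v_j are the key and value of word j, s_j is a raw score, and w_j is its softmax weight.

```latex
s_j = q \cdot k_j, \qquad
w_j = \frac{e^{s_j}}{\sum_{m} e^{s_m}}, \qquad
\text{output} = \sum_{j} w_j \, v_j, \qquad \text{with } \sum_{j} w_j = 1
```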
New job for the same sentence: watch raw scores become one blend that adds to 100%.
raw scores: any size
the 0.4
cat 2.7
drinks 1.3
milk 1.5
softmax ↓
one blend: always 100%
the 6%
cat 61%
drinks 15%
milk 18%
These are the Q·K match scores from lesson 3.2. They can be any size, and on their own they are just a pile of votes.
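Here is a minimal sketch of the softmax step in Python. It uses only the four raw scores shown above: exponentiate each score, then divide by the total so that the weights add up to 1 (100%). The helper name `softmax` and the max-shift trick are our own choices and are not specified by the lesson.

```python
import math

def softmax(scores):
    # Shift by the largest score before exponentiating so math.exp cannot overflow;
    # the same shift appears in every term, so it cancels in the division.
    biggest = max(scores)
    exps = [math.exp(s - biggest) for s in scores]
    total = sum(exps)
    # Each weight is its share of the total, so the weights add up to 1.
    return [e / total for e in exps]

words = ["the", "cat", "drinks", "milk"]
scores = [0.4, 2.7, 1.3, 1.5]  # raw Q·K scores from the figure

weights = softmax(scores)
for word, w in zip(words, weights):
    print(f"{word:>6}: {w:.0%}")
print(f"  total: {sum(weights):.0%}")
```

Running it prints 6% for the, 61% for cat, 15% for drinks, and 18% for milk, for a total of 100%. The exponential keeps every weight positive and widens the gaps between scores, which is why cat's lead of 2.7 becomes a clear majority.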
The idea
Same sentence, “the cat drinks milk”, new job: each word now rebuilds its meaning from the words around it. Softmax turns the raw scores into weights that always add up to 100%, and those weights decide how much of each word goes into the blend.
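The "weighted sum of Values" step can be sketched the same way. Two assumptions apply: the Value vectors below are made up, because the lesson gives none, and each has only three numbers, whereas real models learn much longer vectors. Only the raw scores come from the figure, and the names `softmax` and `blend` are hypothetical helpers.

```python
import math

def softmax(scores):
    # Turn raw scores into positive weights that add up to 1.
    biggest = max(scores)
    exps = [math.exp(s - biggest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def blend(weights, values):
    # Weighted sum of Value vectors: out[i] = sum over words of weight * value[i].
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]

scores = [0.4, 2.7, 1.3, 1.5]  # raw Q·K scores for "the cat drinks milk"
# Hypothetical Value vectors, one per word (illustrative numbers only).
values = [
    [0.1, 0.1, 0.1],  # the
    [1.0, 0.0, 0.0],  # cat
    [0.0, 1.0, 0.0],  # drinks
    [0.0, 0.0, 1.0],  # milk
]

weights = softmax(scores)
new_meaning = blend(weights, values)
print([round(x, 2) for x in new_meaning])
```

The weights add up to 1, so the result is a true blend. It is mostly cat's Value (about 61%) plus smaller shares of the other words, which means this word's new meaning leans heavily toward "cat".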
Why this matters
A word that knows its full context is exactly what lets the model predict what comes next. (Next lesson: why a word isn't allowed to peek at the words that come after it.)
Next: 3.4 Why it can't peek ahead. During training the model could cheat by seeing the future…
Builds on: 3.2 Query, Key, Value
Common questions
What is "How attention blends meaning" about?
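Each word's query is matched against every word's key (Q·K). Softmax turns those raw scores into weights that add up to 100%, and the word's new meaning is the weighted sum of the Values.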
What problem does it solve?
It answers how raw Q·K scores, which can be any size, become one blend of meanings whose weights add up to 100%.
What will I be able to do after this lesson?
You can explain attention end to end: Q·K → softmax → weighted sum of Values.
What comes next?
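Lesson 3.4, Why it can't peek ahead: during training the model could cheat by seeing the future, so a word isn't allowed to peek at the words that come after it.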