See How AI Works
Lesson 3.3

How attention blends meaning

Q·K → scores → softmax → weighted sum of Values.

1. New job for the same sentence: watch raw scores become one blend that adds to 100%.

raw scores: any size

the 0.4
cat 2.7
drinks 1.3
milk 1.5

softmax ↓

one blend: always 100%

the 6%
cat 61%
drinks 15%
milk 18%

These are the Q·K match scores from lesson 3.2. They can be any size; at this point they are just a pile of votes.
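To make the softmax step concrete, here is a minimal sketch in Python that applies it to the four scores above. The function name `softmax` and the variable names are chosen for illustration only; the lesson itself does not prescribe an implementation.

```python
import math

def softmax(scores):
    """Turn raw scores of any size into weights that add up to 1."""
    # Subtracting the largest score leaves the result unchanged
    # but keeps exp() from overflowing on big inputs.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

words = ["the", "cat", "drinks", "milk"]
raw_scores = [0.4, 2.7, 1.3, 1.5]  # the Q·K match scores from the lesson

weights = softmax(raw_scores)
for word, weight in zip(words, weights):
    print(f"{word:>6}: {weight:.0%}")
print(f" total: {sum(weights):.0%}")
```

Rounded to whole percentages this gives the 6%, cat 61%, drinks 15%, milk 18%, which add up to 100%. The highest raw score gets the largest share, but every word keeps some weight.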


The idea: Same sentence “the cat drinks milk”, new job: each word now rebuilds its meaning from the words around it. Softmax turns raw scores into one blend that always adds to 100%.
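The last step, the weighted sum of Values, can be sketched the same way. The two-number Value vectors below are made up purely for illustration (real models learn much longer vectors), and the helper names `softmax` and `blend` are hypothetical; only the raw scores come from the lesson.

```python
import math

def softmax(scores):
    """Turn raw scores into weights that add up to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def blend(weights, values):
    """Weighted sum: mix the Value vectors in proportion to the weights."""
    dims = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dims)]

# Hypothetical Value vectors, one per word (invented for this sketch).
values = {
    "the":    [0.0, 0.1],
    "cat":    [1.0, 0.0],
    "drinks": [0.2, 0.9],
    "milk":   [0.1, 0.7],
}
raw_scores = [0.4, 2.7, 1.3, 1.5]  # Q·K match scores from the lesson

weights = softmax(raw_scores)
new_meaning = blend(weights, list(values.values()))
print([round(x, 2) for x in new_meaning])
```

Because "cat" holds the largest share of the blend, the resulting vector sits closest to the "cat" Value, with smaller contributions from "milk", "drinks", and "the".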

Why this matters

A word that knows its full context is exactly what lets the model predict what comes next. (Next lesson: why a word isn't allowed to peek at the words that come after it.)


Next lesson: 3.4 Why it can't peek ahead

During training the model could cheat by seeing the future…

Builds on: 3.2 Query, Key, Value


Common questions

What is "How attention blends meaning" about?
Q·K → scores → softmax → weighted sum of Values.
What problem does it solve?
It shows how raw scores become a blend of meanings.
What will I be able to do after this lesson?
You can explain attention end to end: Q·K → softmax → weighted sum of Values.
What comes next?
Lesson 3.4, Why it can't peek ahead: during training the model could cheat by seeing the future…