3.3 · Mechanism

How attention blends meaning

Q·K → scores → softmax → weighted sum of Values.
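Written out as a formula, the pipeline above might look like the following sketch. The symbols are assumptions introduced here, not taken from the lesson: q is the query of the word being rebuilt, k_i and v_i are the key and value of word i, s_i is a raw score, and α_i is that word's share of the blend.

```latex
s_i = q \cdot k_i,
\qquad
\alpha_i = \frac{\exp(s_i)}{\sum_{j} \exp(s_j)},
\qquad
\sum_{i} \alpha_i = 1,
\qquad
\text{output} = \sum_{i} \alpha_i \, v_i
```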

1. New job for the same sentence: watch raw scores become one blend that adds to 100%.

Raw scores (any size): the 0.4, cat 2.7, drinks 1.3, milk 1.5

softmax ↓

One blend (always 100%): the 6%, cat 61%, drinks 15%, milk 18%
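The softmax step can be checked by hand. Here is a minimal sketch in plain Python using the four raw scores from the lesson; the function and variable names are chosen for illustration. It also shows that "the", which has no percentage listed, takes the remaining 6%.

```python
import math

def softmax(scores):
    """Turn raw scores of any size into weights that sum to 1."""
    peak = max(scores)  # subtracting the max avoids overflow; the result is unchanged
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

words = ["the", "cat", "drinks", "milk"]
raw_scores = [0.4, 2.7, 1.3, 1.5]  # raw scores from the lesson

weights = softmax(raw_scores)
percents = [round(100 * w) for w in weights]

for word, pct in zip(words, percents):
    print(f"{word:>6}: {pct}%")
# prints the: 6%, cat: 61%, drinks: 15%, milk: 18%
```

Because softmax divides every exponentiated score by their total, the weights sum to 1 whatever the size of the raw scores.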

These are the Q·K match scores from 3.2. They can be any size: just a pile of votes.


The idea

Same sentence “the cat drinks milk”, new job: each word now rebuilds its meaning from the words around it. Softmax turns raw scores into one blend that always adds to 100%.
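To make "rebuilds its meaning" concrete, here is a small sketch of the final step, the weighted sum of Values. The blend weights are the lesson's percentages, with the remaining 6% assigned to "the". The 3-number Value vectors are invented purely for illustration and do not come from any real model.

```python
# Blend weights from the lesson, as fractions ("the" takes the remaining 0.06).
weights = {"the": 0.06, "cat": 0.61, "drinks": 0.15, "milk": 0.18}

# Hypothetical Value vectors, made up for this example only.
values = {
    "the":    [0.0, 0.1, 0.0],
    "cat":    [1.0, 0.0, 0.2],
    "drinks": [0.0, 1.0, 0.0],
    "milk":   [0.3, 0.0, 1.0],
}

def blend(weights, values):
    """Weighted sum of Value vectors: each word contributes in proportion to its weight."""
    dim = len(next(iter(values.values())))
    out = [0.0] * dim
    for word, w in weights.items():
        for i, v in enumerate(values[word]):
            out[i] += w * v
    return out

blended = blend(weights, values)
print([round(x, 3) for x in blended])
```

With these made-up vectors the result sits closest to the "cat" vector, because "cat" holds the largest share of the blend.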

Why this matters

A word that knows its full context is exactly what lets the model predict what comes next. (Next lesson: why a word isn't allowed to peek at the words that come after it.)


Next: 3.4 Why it can't peek ahead

During training the model could cheat by seeing the future…

Builds on: 3.2 Query, Key, Value


Common questions

What is "How attention blends meaning" about?

How attention turns match scores into a blend of meanings: Q·K → scores → softmax → weighted sum of Values.

What problem does it solve?

It shows how raw scores become a blend of meanings.

What will I be able to do after this lesson?

You can explain attention end to end: Q·K → softmax → weighted sum of Values.

What comes next?

During training the model could cheat by seeing the future…
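As a recap of the answers above, the whole chain Q·K → softmax → weighted sum of Values can be sketched for a single querying word in plain Python. The query, key, and value vectors are hypothetical, chosen only so that the dot products reproduce the lesson's raw scores (0.4, 2.7, 1.3, 1.5). Like the lesson's summary, the sketch leaves out the score scaling that many implementations apply before softmax.

```python
import math

def dot(a, b):
    """Dot product: the Q·K match score between two vectors."""
    return sum(x * y for x, y in zip(a, b))

def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    """One query attends over all words: scores -> weights -> blended value."""
    scores = [dot(query, k) for k in keys]
    weights = softmax(scores)
    dim = len(values[0])
    blended = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return scores, weights, blended

# Hypothetical vectors for "the", "cat", "drinks", "milk" (illustration only).
query = [1.0, 0.0]
keys = [[0.4, 1.0], [2.7, 0.0], [1.3, 0.5], [1.5, -0.5]]
values = [[0.0, 0.1], [1.0, 0.0], [0.0, 1.0], [0.3, 0.0]]

scores, weights, blended = attend(query, keys, values)
print("scores: ", scores)
print("weights:", [round(w, 2) for w in weights])
print("blend:  ", [round(x, 3) for x in blended])
```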