Skip to content
See How AI Works
all lessons3.4●●○○○

Why it can't peek ahead

Mask the upper triangle to −∞ before softmax.

1When the model writes, it predicts one word at a time, so as it decides what comes after “drinks”, the word that will come next (“milk”) hasn’t been written yet. Which words must “drinks” be stopped from reading?your turn

Tap the words drinks must not peek at:

1 to find
continue backR replay

One head sees one kind of relationship: what about others?

3.5 Multi-head attention
Architecture·

Common questions

What is "Why it can't peek ahead" about?
Mask the upper triangle to −∞ before softmax.
What problem does it solve?
The model could peek at future words during training.
What will I be able to do after this lesson?
You can explain causal masking: a word sees only itself and earlier words.
What comes next?
One head sees one kind of relationship: what about others?