See How AI Works
Lesson 3.9 · ●●●○○ · elective

Boss: repair broken attention

Each focus token's attention beam points at the wrong word. Reroute it to the word it really depends on, even when the break is a peek at a future word that the causal mask should hide.

1. This token's attention points at the wrong word. Tap the word it should attend to.
Break 1 of 3. Focus token: it

“it” is attending to the wrong word.

“it” splits 100% of its attention across the words it can see:

The      5%
cat      5%
chased   5%
the      5%
mouse    62%
because  5%
it       12%
was      masked
hungry   masked
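
A split like the one above can be sketched as a softmax taken over only the words the focus token can see. This is a minimal illustration, not the lesson's actual implementation: `causal_attention_weights` is a hypothetical helper, and the raw scores are invented values picked so that the result lands close to the percentages shown. Masked (future) words get exactly zero weight, and the visible weights sum to 1.

```python
import math

def causal_attention_weights(scores, focus_index):
    """Softmax over positions <= focus_index; later positions get weight 0."""
    visible = scores[: focus_index + 1]
    # Subtract the max before exponentiating for numerical stability.
    peak = max(visible)
    exps = [math.exp(s - peak) for s in visible]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Future positions are masked: they receive no attention at all.
    return weights + [0.0] * (len(scores) - len(visible))

tokens = ["The", "cat", "chased", "the", "mouse", "because", "it", "was", "hungry"]
focus_index = tokens.index("it")

# Hypothetical raw scores for the focus token "it" (assumed, not from the lesson).
# The last two are large on purpose: the mask must cancel them anyway.
scores = [0.0, 0.0, 0.0, 0.0, 2.5, 0.0, 0.85, 3.0, 3.0]
weights = causal_attention_weights(scores, focus_index)

for i, (token, w) in enumerate(zip(tokens, weights)):
    label = "masked" if i > focus_index else f"{w:.0%}"
    print(f"{token:8s} {label}")
```

However high the scores for "was" and "hungry" are, they never reach the softmax, which is why a beam pointing at either of them would count as a peek at the future.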

“it” is routed to the wrong noun. Which earlier word is actually hungry?


Attention is wired and stacked. Now the model has a ranked list of next words, so how does it pick one?

4.1 Tuning the model's creativity

Common questions

What is "Boss: repair broken attention" about?
Each focus token's attention beam points at the wrong word. Reroute it to the word it really depends on, even when the break is a peek at the future.
What problem does it solve?
You've watched tokens decide what to look at. Could you fix one that looks at the wrong word?
What will I be able to do after this lesson?
You can spot a misrouted attention beam, pick the token it should attend to from the Q·K match, and respect the causal mask.
What comes next?
Attention is wired and stacked. Now the model has a ranked list of next words, so how does it pick one?
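
As a rough illustration of picking the target from the Q·K match while respecting the causal mask, the sketch below scores the focus token's query against every key and ignores all later positions. The two-dimensional vectors are toy values written by hand for this example (a real model learns them), and `best_visible_match` is a hypothetical helper name.

```python
def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def best_visible_match(query, keys, focus_index):
    """Index of the highest Q·K score among positions <= focus_index."""
    visible = range(focus_index + 1)  # the causal mask: no future positions
    return max(visible, key=lambda i: dot(query, keys[i]))

tokens = ["The", "cat", "chased", "the", "mouse", "because", "it", "was", "hungry"]

# Toy keys (assumed): dimension 0 ~ "is a noun", dimension 1 ~ "is the subject".
keys = [
    [0.0, 0.0],  # The
    [1.0, 1.0],  # cat
    [0.0, 0.2],  # chased
    [0.0, 0.0],  # the
    [1.0, 0.0],  # mouse
    [0.0, 0.1],  # because
    [0.2, 0.2],  # it
    [0.0, 0.0],  # was
    [2.0, 2.0],  # hungry: strongest raw match, but it lies in the future
]
query = [1.0, 1.0]  # toy query for the focus token "it"
focus_index = tokens.index("it")

masked_pick = best_visible_match(query, keys, focus_index)
unmasked_pick = max(range(len(keys)), key=lambda i: dot(query, keys[i]))

print("with causal mask:   ", tokens[masked_pick])
print("without causal mask:", tokens[unmasked_pick])
```

With these toy numbers the unmasked search lands on a future word, while the masked search settles on the best match among the earlier words, which is the repair the lesson asks for.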