How the model knows word order
Inject position (sinusoidal / RoPE intuition).
1. Attention compares words, but does it know their order?
cat drinks milk
milk drinks cat
The catch: a dot product ignores where each word sits, so attention sees both orders as the same word soup.
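To make the order-blindness concrete, here is a minimal sketch of dot-product self-attention with no positional signal. The 2-dimensional embeddings in EMBED are invented for illustration, and the sketch skips learned query/key/value projections, so treat it as a toy under those assumptions rather than a real model.

```python
import math

# Hypothetical 2-dimensional embeddings, chosen only for illustration.
EMBED = {
    "cat": [1.0, 0.0],
    "drinks": [0.0, 1.0],
    "milk": [1.0, 1.0],
}

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(words):
    """Dot-product self-attention with no positional information.

    Queries, keys, and values are all the raw embeddings (assumption:
    no learned projections). Returns a dict mapping word -> output vector.
    """
    vecs = [EMBED[w] for w in words]
    scale = math.sqrt(len(vecs[0]))
    out = {}
    for word, q in zip(words, vecs):
        weights = softmax([dot(q, k) / scale for k in vecs])
        out[word] = [
            sum(w * v[i] for w, v in zip(weights, vecs))
            for i in range(len(q))
        ]
    return out

a = self_attention(["cat", "drinks", "milk"])
b = self_attention(["milk", "drinks", "cat"])

# Each word ends up with the same output vector in both orders.
same = all(
    math.isclose(x, y, abs_tol=1e-12)
    for word in a
    for x, y in zip(a[word], b[word])
)
print(same)  # True
```

Reordering the sentence only reorders the outputs. Each word's vector is unchanged, so nothing downstream can tell which order it was given.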
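One common way to inject position is a sinusoidal encoding added to each word's embedding. The sketch below assumes the commonly used base of 10000 and a made-up 4-dimensional embedding for "cat". The names sinusoidal_encoding and with_position are illustrative, not taken from any library.

```python
import math

def sinusoidal_encoding(position, dim):
    """Sinusoidal position vector: sine on even indices, cosine on odd
    indices, with wavelengths that grow geometrically (base 10000)."""
    vec = []
    for i in range(0, dim, 2):
        angle = position / (10000 ** (i / dim))
        vec.append(math.sin(angle))
        vec.append(math.cos(angle))
    return vec[:dim]  # trim in case dim is odd

# Hypothetical 4-dimensional embedding for "cat", for illustration only.
CAT = [1.0, 0.0, 0.5, 0.5]

def with_position(embedding, position):
    """Add the position vector to the word embedding, element by element."""
    pe = sinusoidal_encoding(position, len(embedding))
    return [e + p for e, p in zip(embedding, pe)]

cat_first = with_position(CAT, 0)  # "cat" in "cat drinks milk"
cat_last = with_position(CAT, 2)   # "cat" in "milk drinks cat"

print(cat_first)              # [1.0, 1.0, 0.5, 1.5]
print(cat_first != cat_last)  # True
```

Once position is mixed in, the same word at different positions produces different vectors, so attention's dot products differ between the two orders. RoPE reaches a similar goal by a different route: it rotates query and key vectors by position-dependent angles, so their dot product depends on how far apart the two words are.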
Next: 4.4 Scaling laws. Does bigger always mean better?
Builds on: 3.3 How attention blends meaning
Common questions
What is "How the model knows word order" about?
It explains how positional information is injected into the model, with intuition for sinusoidal encodings and RoPE.
What problem does it solve?
On its own, attention is order-blind: it treats 'cat drinks milk' and 'milk drinks cat' as the same.
What will I be able to do after this lesson?
You can explain why models add positional information so word order matters.
What comes next?
Does bigger always mean better?