Question 1

What is "The transformer block, assembled & stacked" about?

Accepted Answer

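It covers how self-attention, a feed-forward network (FFN), and residual connections with normalization combine into a single transformer block, and how stacking N such blocks gives a small GPT-style model (nano-GPT).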
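
As a rough illustration of that assembly, here is a minimal NumPy sketch of one block and a stack of N blocks. Several choices are assumptions made for brevity and are not stated in the lesson: a single attention head, a pre-norm layout (normalize before each sublayer), ReLU in place of the GELU that GPT-style models typically use, no learned scale or shift in the normalization, and random weights. The names `block`, `stack`, and `init_block` are hypothetical.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize each token vector to zero mean and unit variance
    # (learned scale and shift are omitted in this sketch).
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def softmax(x):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def causal_self_attention(x, p):
    # Single-head attention; each position sees only itself and earlier positions.
    T, d = x.shape
    q, k, v = x @ p["Wq"], x @ p["Wk"], x @ p["Wv"]
    scores = q @ k.T / np.sqrt(d)
    future = np.triu(np.ones((T, T), dtype=bool), k=1)  # True above the diagonal
    scores = np.where(future, -np.inf, scores)
    return softmax(scores) @ v @ p["Wo"]

def ffn(x, p):
    # Position-wise two-layer MLP applied to every token independently.
    return np.maximum(0.0, x @ p["W1"]) @ p["W2"]

def block(x, p):
    # Residual connections around each sublayer: x + sublayer(norm(x)).
    x = x + causal_self_attention(layer_norm(x), p)
    x = x + ffn(layer_norm(x), p)
    return x

def init_block(rng, d, d_ff, scale=0.02):
    # Random weights for illustration only; a real model learns these.
    return {
        "Wq": rng.normal(0, scale, (d, d)),
        "Wk": rng.normal(0, scale, (d, d)),
        "Wv": rng.normal(0, scale, (d, d)),
        "Wo": rng.normal(0, scale, (d, d)),
        "W1": rng.normal(0, scale, (d, d_ff)),
        "W2": rng.normal(0, scale, (d_ff, d)),
    }

def stack(x, blocks):
    # Stacking: the output of one block is the input of the next.
    for p in blocks:
        x = block(x, p)
    return x

rng = np.random.default_rng(0)
T, d, d_ff, n_layers = 5, 16, 64, 4  # tokens, model width, FFN width, N blocks
blocks = [init_block(rng, d, d_ff) for _ in range(n_layers)]
x = rng.normal(size=(T, d))          # stand-in for token + position embeddings
y = stack(x, blocks)
print(y.shape)  # (5, 16)
```

Every block maps a (T, d) array to another (T, d) array, which is what makes stacking possible: any number of blocks can be chained without changing shapes.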

Question 2

What problem does it solve?

Accepted Answer

It answers the question of how the individual parts fit together into the whole machine.

Question 3

What will I be able to do after this lesson?

Accepted Answer

You will be able to describe a complete transformer block and explain how a stack of blocks predicts the next word.
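
The final step of that prediction can be sketched in a few lines of plain Python: the hidden vector the stack produces for the last position is scored against every vocabulary entry, and a softmax turns the scores into a probability distribution over the next word. The vocabulary, the vector `h_last`, and the matrix `W_unembed` below are made-up toy values, and `next_token_distribution` is a hypothetical helper name.

```python
import math

def softmax(logits):
    # Subtract the maximum for numerical stability, then normalize.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def next_token_distribution(h_last, W_unembed):
    # One logit per vocabulary entry: dot product of the last hidden
    # vector with that entry's row of the output (unembedding) matrix.
    logits = [sum(h * w for h, w in zip(h_last, row)) for row in W_unembed]
    return softmax(logits)

vocab = ["cat", "dog", "mat"]          # toy vocabulary (assumption)
h_last = [1.0, 0.0]                    # toy output of the block stack at the last position
W_unembed = [[0.5, 0.1],               # toy weights, one row per vocabulary entry
             [0.2, 0.3],
             [2.0, -1.0]]

probs = next_token_distribution(h_last, W_unembed)
for word, p in zip(vocab, probs):
    print(f"{word}: {p:.3f}")
```

With these toy numbers the logits are 0.5, 0.2, and 2.0, so "mat" receives the largest probability.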

Question 4

What comes next?

Accepted Answer

The model outputs a probability distribution over the vocabulary, so the next question is how to pick a word from it.