Skip to content
See How AI Works
← all lessons3.7●●●○○

Keeping the signal alive

Residual = keep a copy and add the change; LayerNorm = rescale to stay sane.

1Skip path off: the signal climbs through every layer. Predict where it ends up.your turn
inputoutput βœ—AttentionFeed-forwardAttention

After dozens of these rewrites, what's left of the original meaning?

β†’ continue← backR replay

Assemble the parts into one block, then stack them.

3.8 The transformer block, assembled & stacked
ArchitectureΒ·

Common questions

What is "Keeping the signal alive" about?
Residual = keep a copy and add the change; LayerNorm = rescale to stay sane.
What problem does it solve?
Deep stacks forget the original signal or blow up.
What will I be able to do after this lesson?
You can explain residuals and LayerNorm: the tricks that make deep nets trainable.
What comes next?
Assemble the parts into one block, then stack them.