Keeping the signal alive
Residual = keep a copy and add the change; LayerNorm = rescale to stay sane.
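To make the one-liner concrete, here is a minimal sketch in plain Python. It assumes a LayerNorm without learned gain and bias, a "normalize first, then add" arrangement, and a made-up `toy_sublayer` standing in for attention or an MLP; none of these details come from the lesson itself.

```python
import math

def layer_norm(x, eps=1e-5):
    """Rescale a vector to zero mean and (roughly) unit variance.
    Simplification: no learned gain or bias."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def residual(x, sublayer):
    """Keep a copy of x and add the change the sublayer proposes."""
    change = sublayer(x)
    return [a + b for a, b in zip(x, change)]

def toy_sublayer(x):
    # Hypothetical stand-in for attention or an MLP: a small, fixed tweak.
    return [0.1 * v for v in x]

x = [1.0, 2.0, 3.0, 4.0]
# Assumed ordering: normalize the input, compute the change, add it to the copy.
y = residual(x, lambda v: toy_sublayer(layer_norm(v)))
print(y)
```

The output stays close to `x`: the sublayer only nudges the signal, and the untouched copy carries the original through.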
1. Skip path off: the signal climbs through every layer. Predict where it ends up. (Your turn)
After dozens of these rewrites, what's left of the original meaning?
Assemble the parts into one block, then stack them.
3.8 The transformer block, assembled & stacked
Builds on: 3.6 What a neural layer actually does
Common questions
What is "Keeping the signal alive" about?
Residual = keep a copy and add the change; LayerNorm = rescale to stay sane.
What problem does it solve?
Deep stacks either forget the original signal or let its size blow up as it passes through layer after layer.
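A rough way to see the "blow up" half of the problem: the sketch below treats each layer as a simple multiplication by a fixed `gain`, which is a deliberate oversimplification and not how a real layer behaves. The depth of 24 and the gains of 1.5 and 0.5 are illustrative choices only.

```python
import math

def norm(x):
    """Euclidean length of a vector."""
    return math.sqrt(sum(v * v for v in x))

def layer_norm(x, eps=1e-5):
    """Rescale to zero mean and (roughly) unit variance; no learned parameters."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def run_stack(x, gain, depth, use_layer_norm):
    """Pass x through `depth` toy layers that each scale it by `gain`."""
    for _ in range(depth):
        x = [gain * v for v in x]
        if use_layer_norm:
            x = layer_norm(x)
    return x

x0 = [1.0, 2.0, 3.0, 4.0]
blown_up = norm(run_stack(x0, 1.5, 24, use_layer_norm=False))  # grows every layer
faded = norm(run_stack(x0, 0.5, 24, use_layer_norm=False))     # shrinks every layer
steady = norm(run_stack(x0, 1.5, 24, use_layer_norm=True))     # rescaled every layer
print(blown_up, faded, steady)
```

Without rescaling, the size of the signal compounds layer after layer in one direction or the other; with LayerNorm after each layer it stays at about the same size regardless of depth.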
What will I be able to do after this lesson?
You will be able to explain residual connections and LayerNorm, the tricks that make deep nets trainable.
What comes next?
Assemble the parts into one block, then stack them.