Residuals & LayerNorm
The idea: Residual = keep a copy and add the change; LayerNorm = rescale to stay sane.
What you'll be able to do: You can explain residuals and LayerNorm: the tricks that make deep nets trainable.
The problem it solves: Deep stacks forget the original signal or blow up.
Builds on: Matrix × vector as a neural layer
← Matrix × vector as a neural layer · Next: The transformer block, assembled & stacked →
All lessons