Scaling laws
The idea: Loss falls as a power law in parameters, data, and compute (Chinchilla).
What you'll be able to do: Explain scaling laws: loss falls predictably as compute grows, but the laws predict loss, not specific skills.
The problem it solves: Does throwing more parameters, data, and compute at a model predictably help?
Builds on: Loss as a scoreboard
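
A minimal sketch of the power-law idea, assuming the parametric form L(N, D) = E + A / N^alpha + B / D^beta associated with the Chinchilla analysis, where N is the parameter count and D is the number of training tokens. The constants are approximately those reported for that fit and serve here only as illustrative assumptions. The function names and the C ≈ 6 * N * D compute rule of thumb are also assumptions, not part of this lesson's text.

```python
# Illustrative constants, roughly those reported for the Chinchilla fit.
# Treat them as assumptions, not exact values.
E = 1.69        # irreducible loss: the floor that scaling cannot remove
A = 406.4       # coefficient of the parameter-count term
B = 410.7       # coefficient of the data term
ALPHA = 0.34    # power-law exponent for parameters
BETA = 0.28     # power-law exponent for training tokens


def predicted_loss(n_params: float, n_tokens: float) -> float:
    """Parametric loss L(N, D) = E + A / N**alpha + B / D**beta."""
    return E + A / n_params**ALPHA + B / n_tokens**BETA


def training_flops(n_params: float, n_tokens: float) -> float:
    """Rule-of-thumb training compute: C is roughly 6 * N * D FLOPs."""
    return 6.0 * n_params * n_tokens


if __name__ == "__main__":
    # Scale parameters and tokens together by 10x at each step.
    for n, d in [(1e8, 2e9), (1e9, 2e10), (1e10, 2e11)]:
        flops = training_flops(n, d)
        loss = predicted_loss(n, d)
        print(f"N={n:.0e}  D={d:.0e}  C={flops:.1e} FLOPs  loss={loss:.3f}")
```

Each 10x step lowers the predicted loss by a smaller amount than the one before, and the loss never drops below E. The formula outputs only a loss value; it says nothing about which skills appear at that loss.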