Quantization
The idea: Store weights at lower precision; small accuracy cost, big memory win.
What you'll be able to do: You can explain quantization: fewer bits per weight, big memory win, small quality cost.
The problem it solves: The model is too big to fit / serve.
Builds on: Matrix × vector as a neural layer
← Mixture of Experts · Next: GPT-2 → GPT-3 → frontier →
All lessons