Test-time compute: pay at answer-time
The idea: A third scaling axis (beyond parameters and data): spend more compute per question at inference — think longer, sample many tries, pick the best — trading latency and cost for accuracy.
What you'll be able to do: You can explain test-time (inference) compute as a third way to scale, and its cost/latency trade-off.
The problem it solves: Bigger models cost a fortune to train. Is training the only way to buy more capability?
Builds on: Scaling laws, The model thinks before it answers
← The model thinks before it answers · Next: The thinking dial: when more hurts →
All lessons