The thinking dial: when more hurts
The idea: More thinking has diminishing, and eventually negative, returns. It adds latency and cost, can overthink simple tasks, and sometimes talks itself out of the right answer. Match the thinking budget to the problem.
What you'll be able to do: Reason about thinking budgets, telling when more reasoning helps and when it only wastes time and money.
The problem it solves: If thinking helps, why not just crank it to max?
Builds on: The model thinks before it answers
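To make "diminishing, then negative, returns" concrete, here is a toy model in Python. Everything in it is an assumption for illustration: the saturating accuracy curve, the linear overthinking drag, the per-token cost, the `difficulty` values, and the helper names (`accuracy`, `net_value`, `best_budget`) are invented, not measured from any real model or taken from any API. It only shows the shape of the trade-off. Accuracy rises quickly, flattens, then slowly falls, and once cost is subtracted the best budget depends on how hard the task is.

```python
import math

# Illustrative constants: assumptions chosen for the sketch, not measurements.
BASE_ACCURACY = 0.5          # accuracy with no thinking at all
MAX_GAIN = 0.45              # the most that thinking can add
OVERTHINK_PER_TOKEN = 5e-6   # accuracy lost per thinking token (overthinking drag)
COST_PER_TOKEN = 5e-6        # latency + money, in units of "one correct answer"


def accuracy(budget: int, difficulty: float) -> float:
    """Toy accuracy curve: saturating gain from thinking, minus a slow linear drag.

    Harder tasks (larger `difficulty`) need more tokens before the gain saturates.
    """
    gain = MAX_GAIN * (1 - math.exp(-budget / (1000 * difficulty)))
    drag = OVERTHINK_PER_TOKEN * budget
    return min(1.0, max(0.0, BASE_ACCURACY + gain - drag))


def net_value(budget: int, difficulty: float) -> float:
    """Expected value of the answer minus what the thinking cost."""
    return accuracy(budget, difficulty) - COST_PER_TOKEN * budget


def best_budget(difficulty: float, candidates: list[int]) -> int:
    """Pick the candidate budget with the highest net value."""
    return max(candidates, key=lambda b: net_value(b, difficulty))


CANDIDATES = [0, 500, 1_000, 2_000, 4_000, 8_000, 16_000, 32_000]

if __name__ == "__main__":
    easy = best_budget(0.25, CANDIDATES)  # hypothetical lookup-style question
    hard = best_budget(8.0, CANDIDATES)   # hypothetical multi-step problem
    print(f"easy task -> {easy} tokens, hard task -> {hard} tokens")
    print(
        f"easy task accuracy at 2k vs 32k tokens: "
        f"{accuracy(2_000, 0.25):.3f} vs {accuracy(32_000, 0.25):.3f}"
    )
```

With these made-up constants, the easy task's best budget is a small fraction of the hard task's, neither lands on the largest candidate, and the easy task is actually less accurate at 32k tokens than at 2k. The specific numbers mean nothing; the point is that the best setting of the dial is rarely the maximum.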