4.10 The thinking dial: when more hurts

More thinking has diminishing then negative returns, so match the budget to the task.

1. Thinking longer made the model smarter. So crank the dial to max, right?

[Figure: a dial running from less thinking to more thinking]

Cost and latency rise the whole way. Does quality?

The catch: more thinking bought more accuracy last lesson, so why not max it out every time?

Why this looks like a free win

Last lesson: spending more compute at answer-time, letting the model think longer, bought more accuracy. The obvious move is to turn that dial all the way up, all the time. It's not free, but smarter is worth it, right?
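One way to make the question concrete is a toy model. The sketch below is purely illustrative: the curve shape, the constants, and the names `toy_accuracy` and `best_budget` are all assumptions invented for this example, not measurements from any real model. It assumes accuracy rises with diminishing returns as the thinking budget grows, then slips as a small penalty for overthinking accumulates, which is the shape this lesson describes.

```python
import math

# Candidate thinking budgets in tokens (hypothetical values).
BUDGETS = [0, 512, 1024, 2048, 4096, 8192, 16384]


def toy_accuracy(budget_tokens: int, difficulty: float) -> float:
    """Made-up accuracy curve: saturating gain minus a linear overthinking penalty.

    `difficulty` stretches the curve, so harder tasks need more tokens
    before the gain saturates. All constants are illustrative assumptions.
    """
    gain = 1.0 - math.exp(-budget_tokens / (difficulty * 1000.0))
    penalty = 0.00002 * budget_tokens
    return 0.5 + 0.45 * gain - penalty


def best_budget(difficulty: float) -> int:
    """Return the candidate budget with the highest toy accuracy."""
    return max(BUDGETS, key=lambda b: toy_accuracy(b, difficulty))


if __name__ == "__main__":
    for name, difficulty in [("easy task", 0.5), ("hard task", 4.0)]:
        chosen = best_budget(difficulty)
        print(f"{name}: best budget = {chosen} tokens")
        for b in BUDGETS:
            print(f"  {b:>6} tokens -> toy accuracy {toy_accuracy(b, difficulty):.3f}")
```

Under these assumed curves, the best budget for the easy task is well below the best budget for the hard task, and the maximum budget scores worse than the best one in both cases. Cost and latency are not modeled here; since they rise with every extra token, including them would only pull the best budget lower.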


Generating tokens is the bottleneck. Can we make it faster?

4.11 Speculative decoding: two models, one fast answer

Builds on 4.8: The model thinks before it answers


Common questions

What is "The thinking dial: when more hurts" about?

More thinking has diminishing then negative returns, so match the budget to the task.

What problem does it solve?

It tackles the tempting assumption that if thinking helps, you should always crank it to the maximum.

What will I be able to do after this lesson?

You can reason about thinking budgets: when more reasoning helps, and when it wastes time and money.

What comes next?

Lesson 4.11, Speculative decoding: generating tokens is the bottleneck, so can we make it faster?