What is model routing, and how do you choose which AI model to use?

Question

Accepted Answer

Model routing sends each request to the cheapest model that can still handle it, instead of using one big model for everything. Most requests are easy, and a small, fast, cheap model clears them; only the hard few need a frontier model. A router can decide up front which model gets a request, or it can cascade: try a small model, check the result, and escalate only if it falls short. Done well, routing holds a quality bar while cutting cost and latency, because you stop paying frontier prices for easy work.
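To make the two strategies concrete, here is a minimal Python sketch of both, not a production router. Everything in it is an assumption for illustration: the `Tier` type, the relative costs, the stub `small_model` and `large_model` functions (stand-ins for real model calls), and the toy `non_empty` and `looks_hard` checks. A real system would likely use a learned classifier, a grader model, or task-specific validation.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Tier:
    name: str
    cost_per_call: float            # hypothetical relative cost, not a real price
    generate: Callable[[str], str]  # stand-in for a real model call


def cascade(prompt: str, tiers: List[Tier],
            is_good_enough: Callable[[str, str], bool]) -> Tuple[str, str, float]:
    """Try tiers cheapest-first; escalate only when the check fails.

    Returns (tier_name, answer, total_cost). If no tier passes the check,
    the most expensive tier's answer is returned anyway.
    """
    if not tiers:
        raise ValueError("at least one tier is required")
    spent = 0.0
    name, answer = "", ""
    for tier in sorted(tiers, key=lambda t: t.cost_per_call):
        answer = tier.generate(prompt)
        spent += tier.cost_per_call  # every attempt is paid for, including failed ones
        name = tier.name
        if is_good_enough(prompt, answer):
            break
    return name, answer, spent


def route_up_front(prompt: str, tiers: List[Tier],
                   looks_hard: Callable[[str], bool]) -> Tier:
    """Pick one tier before calling any model: cheapest unless the prompt looks hard."""
    if not tiers:
        raise ValueError("at least one tier is required")
    ordered = sorted(tiers, key=lambda t: t.cost_per_call)
    return ordered[-1] if looks_hard(prompt) else ordered[0]


# --- Toy stand-ins (assumptions, not real models) ---------------------------
def small_model(prompt: str) -> str:
    # Pretend the small model gives up on prompts longer than 8 words.
    return f"small:{prompt}" if len(prompt.split()) <= 8 else ""


def large_model(prompt: str) -> str:
    return f"large:{prompt}"


def non_empty(prompt: str, answer: str) -> bool:
    # Toy quality check: any non-blank answer passes.
    return bool(answer.strip())


def looks_hard(prompt: str) -> bool:
    # Toy difficulty heuristic: long prompts are treated as hard.
    return len(prompt.split()) > 8


TIERS = [Tier("large", 20.0, large_model), Tier("small", 1.0, small_model)]

EASY = "What is 2 + 2?"
HARD = "Prove that the sum of two even integers is always even, step by step"

easy_result = cascade(EASY, TIERS, non_empty)
hard_result = cascade(HARD, TIERS, non_empty)
```

In this sketch the easy prompt stops at the small tier, while the hard prompt escalates and pays for both calls. That is the main trade-off of a cascade: a failed cheap attempt adds cost and latency on top of the expensive call, so it pays off only when most requests clear the first tier. Up-front routing avoids the double call but depends entirely on how well the difficulty check predicts which requests are hard.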

Model routing, explained

What people get wrong

Where you see it in real products
