Plain-language explainer
Model routing, explained
What is model routing, and how do you choose which AI model to use?
Model routing sends each request to the cheapest model that can still handle it, instead of using one big model for everything. Most requests are easy, and a small, fast, cheap model clears them; only the hard few need a frontier model. A router can decide up front which model gets a request, or it can cascade: try a small model, check the result, and escalate only if it falls short. Done well, routing holds a quality bar while cutting cost and latency, because you stop paying frontier prices for easy work.
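The cascade described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: the `Model` class, the `cascade` function, the toy `small_answer` and `large_answer` stand-ins, the `passes` check, and the cost figures are all hypothetical. A real system would call actual models and use a real quality check.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Model:
    name: str
    cost_per_call: float  # hypothetical cost units, not real prices
    answer: Callable[[str], str]


def cascade(prompt: str, models: List[Model],
            passes: Callable[[str, str], bool]) -> Tuple[str, str, float]:
    """Try models from cheapest to most expensive; stop at the first reply that passes."""
    if not models:
        raise ValueError("need at least one model")
    spent = 0.0
    name, reply = "", ""
    for model in sorted(models, key=lambda m: m.cost_per_call):
        name, reply = model.name, model.answer(prompt)
        spent += model.cost_per_call  # every attempt is paid for, including failed ones
        if passes(prompt, reply):
            break
    # If nothing passed, this is the most expensive model's reply.
    return name, reply, spent


# Toy stand-ins (assumptions): the small model gives up on long prompts.
def small_answer(prompt: str) -> str:
    return "ok: " + prompt if len(prompt) < 40 else ""


def large_answer(prompt: str) -> str:
    return "ok: " + prompt


def passes(prompt: str, reply: str) -> bool:
    # Placeholder quality check; a real one might validate format or run a grader.
    return reply.startswith("ok: ")


MODELS = [Model("large", 20.0, large_answer), Model("small", 1.0, small_answer)]
```

With these stand-ins, a short prompt is answered by the small model and costs one unit. A long prompt fails the check, escalates, and costs the sum of both calls. That extra small-model call is the price of cascading, which is why a cascade pays off only when the cheap model clears enough traffic.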
Do not just read it. Operate the mechanism yourself in a short interactive lesson.
See it work: Model routing: the cheapest model that passes (beta). Free, no code, no signup.
What people get wrong
- "Always use the most capable model." It is slow and expensive for the many requests that do not need it.
- "Routing hurts quality." With a check or a cascade, you keep the quality bar and escalate only when needed.
- "Cheaper models are useless." They clear a large share of real traffic at a fraction of the cost.
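To put rough numbers on the cost argument in the points above, here is a small worked calculation. The function names and the prices are made up for illustration and are not measurements of any real model. It assumes a two-step cascade in which every request pays for the small model and only the requests it fails to clear also pay for the large one.

```python
def cascade_cost(small_cost: float, large_cost: float, share_cleared: float) -> float:
    """Average cost per request for a two-step cascade.

    share_cleared is the fraction of requests the small model handles (0 to 1).
    """
    if not 0.0 <= share_cleared <= 1.0:
        raise ValueError("share_cleared must be between 0 and 1")
    return small_cost + (1.0 - share_cleared) * large_cost


def break_even_share(small_cost: float, large_cost: float) -> float:
    """Share the small model must clear before the cascade beats always using the large model."""
    return small_cost / large_cost


# Hypothetical prices: small model 1 unit per call, large model 20 units per call.
average = cascade_cost(1.0, 20.0, 0.8)    # roughly 5 units, versus 20 for always-large
threshold = break_even_share(1.0, 20.0)   # the small model must clear more than this share
```

Under these assumed prices, a small model that clears 80% of traffic brings the average cost to about a quarter of the always-large cost, and the cascade comes out ahead as soon as the small model clears more than 5% of requests.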
Where you see it in real products
- Assistants route simple lookups to small models and reasoning to large ones.
- Cost-sensitive apps cascade from cheap to expensive only when needed.
- Platforms expose a single endpoint that picks the model behind the scenes.
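The single-endpoint pattern in the list above can be sketched as an up-front router: one `complete` function that callers use, with the model choice hidden inside. Everything here is an assumption for illustration, including the model names, the `REASONING_HINTS` keywords, and the word-count threshold. Production routers typically rely on a trained classifier or richer signals than this crude heuristic.

```python
from typing import Callable, Dict

# Hypothetical hints that a prompt needs multi-step reasoning (crude substring match).
REASONING_HINTS = ("why", "prove", "step by step", "compare")
MAX_EASY_WORDS = 50  # assumed threshold, not a tuned value


def pick_model(prompt: str) -> str:
    """Decide up front which model should handle the prompt."""
    text = prompt.lower()
    looks_hard = len(text.split()) > MAX_EASY_WORDS or any(h in text for h in REASONING_HINTS)
    return "large-model" if looks_hard else "small-model"


def complete(prompt: str, backends: Dict[str, Callable[[str], str]]) -> Dict[str, str]:
    """Single entry point: the caller never names a model."""
    name = pick_model(prompt)
    return {"model": name, "reply": backends[name](prompt)}


# Toy backends standing in for real model calls.
BACKENDS = {
    "small-model": lambda p: "[small] " + p,
    "large-model": lambda p: "[large] " + p,
}

lookup = complete("What is the capital of France?", BACKENDS)
reasoning = complete("Compare these two designs and explain why one scales better.", BACKENDS)
```

Unlike a cascade, this router pays for only one model call per request, but a wrong guess is never corrected, so the quality of the decision rule matters more.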
Part of See How AI Works, a free interactive course, where you learn how modern AI works by operating it, not watching videos.