Question 1

What is "Distillation: a small model learns from a big one" about?

Accepted Answer

Distillation trains a small 'student' model to imitate a big 'teacher' by matching the teacher's outputs or output probabilities. The student captures much of the teacher's skill at a fraction of the size and cost. Many of the small, fast models you use are distilled.
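
To make "imitating the teacher's probabilities" concrete, here is a minimal sketch of one common way to score a student against a teacher. It assumes the loss is the KL divergence between temperature-softened output distributions. The answer above does not name a specific loss, so treat that choice, the function names (softmax, distillation_loss), and the example logits as illustrative assumptions rather than the lesson's exact method.

```python
import math

def softmax(logits, temperature=1.0):
    """Turn raw logits into probabilities; a higher temperature gives a softer distribution."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) over softened distributions (assumed loss, see lead-in)."""
    p = softmax(teacher_logits, temperature)  # teacher's soft targets
    q = softmax(student_logits, temperature)  # student's predictions
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical logits for a 3-class problem.
teacher = [4.0, 1.0, 0.5]
close_student = [3.5, 1.2, 0.4]  # roughly agrees with the teacher
far_student = [0.5, 1.0, 4.0]    # prefers a different class

print(distillation_loss(close_student, teacher))
print(distillation_loss(far_student, teacher))
```

The first loss comes out smaller than the second, and a student whose logits equal the teacher's scores zero. Training the student means adjusting its weights to push this number down across many inputs.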

Question 2

What problem does it solve?

Accepted Answer

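Frontier models are huge and expensive to serve, and not every task needs to pay that full cost. Distillation lets such tasks run on a much smaller, cheaper model that keeps much of the big model's skill.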

Question 3

What will I be able to do after this lesson?

Accepted Answer

You will be able to explain distillation: a small student model is trained to mimic a big teacher so that inference is cheap.

Question 4

What comes next?

Accepted Answer

The next limitation: once trained, the model is frozen, so it can't know today's news.