See How AI Works
2.6E · Exhibit●●○○○

From predictor to assistant

After pre-training comes post-training: first fine-tune on example answers (SFT, supervised fine-tuning), then train on human rankings of responses (RLHF, reinforcement learning from human feedback). It is the same predictor, but now it prefers helpful, instruction-following, safer replies.
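The two post-training stages can be sketched as loss functions. This is a minimal sketch in plain Python that assumes per-token log-probabilities and scalar reward scores are already available. `sft_loss` and `preference_loss` are hypothetical names, and all numbers are invented. Masking the prompt out of the SFT loss is a common choice, not a universal rule. The pairwise loss is the Bradley-Terry style objective typically used to train a reward model from human rankings. Full RLHF then tunes the assistant to earn high reward, often with PPO plus a penalty for drifting too far from the SFT model. This sketch does not show that step.

```python
import math

def sft_loss(token_logprobs, is_answer):
    """SFT: average negative log-probability of the demonstration answer.

    token_logprobs[i] is the model's log-probability of the i-th actual token;
    is_answer[i] is True for answer tokens. Prompt tokens are masked out
    (an assumption: some setups also score the prompt).
    """
    scored = [lp for lp, keep in zip(token_logprobs, is_answer) if keep]
    return -sum(scored) / len(scored)

def preference_loss(reward_chosen, reward_rejected):
    """Reward-model loss for one human ranking: -log(sigmoid(margin)).

    The loss is small when the reply people preferred scores higher
    than the reply they rejected.
    """
    margin = reward_chosen - reward_rejected
    # log(1 + exp(-margin)), written two ways so exp() never overflows
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))

# Hypothetical example: one prompt token (masked) followed by two answer tokens.
print(sft_loss([math.log(0.5), math.log(0.25), math.log(0.5)], [False, True, True]))
# Scoring a real poem above "more prompts" gives a lower loss than the reverse.
print(preference_loss(2.0, -1.0) < preference_loss(-1.0, 2.0))
```

Both losses push on the same next-word predictor. SFT makes good answers more likely token by token, while the reward model learns which whole replies people rank higher.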

You'll get more from this if you've seen 2.4 Gradient descent: rolling downhill.

1. A raw model only predicts the next word. What does it do with an instruction?

You type: Write a poem about cats.

A raw, pre-trained model (Act 2) has only ever learned one trick: predict the next word of whatever text it sees.

So what does a RAW model do with that line?
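One way to see the answer is a minimal sketch of a raw next-word predictor. It assumes a word-level bigram model, which is far simpler than a real LLM: real models work on tokens and learn from vastly more text. `CORPUS`, `train_bigram`, and `continue_text` are hypothetical names, and the corpus is an invented stand-in for a web page that lists writing prompts.

```python
from collections import Counter, defaultdict

# Hypothetical training text: a page that is just a list of writing prompts.
CORPUS = (
    "Write a story about dogs. Write a poem about cats. "
    "Write a story about dogs. Write a story about the sea."
)

def train_bigram(text):
    """Count, for each word, which words follow it in the training text."""
    words = text.split()
    follows = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        follows[prev][nxt] += 1
    return follows

def continue_text(model, prompt, n_words):
    """Greedily append the most frequent next word, up to n_words times."""
    words = prompt.split()
    for _ in range(n_words):
        options = model.get(words[-1])
        if not options:  # never saw this word followed by anything
            break
        words.append(options.most_common(1)[0][0])
    return " ".join(words)

print(continue_text(train_bigram(CORPUS), "Write a poem about cats.", 5))
# prints: Write a poem about cats. Write a story about dogs.
```

In this toy text, the only thing that ever follows a sentence like "Write a poem about cats." is another writing prompt. So the predictor keeps the list going instead of writing a poem.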


Common questions

What is "From predictor to assistant" about?
After pre-training comes post-training: first fine-tune on example answers (SFT, supervised fine-tuning), then train on human rankings of responses (RLHF, reinforcement learning from human feedback). It is the same predictor, but now it prefers helpful, instruction-following, safer replies.
What problem does it solve?
Trained only to predict the next word, a raw model answers “Write a poem about cats” by adding more prompts like it, as if the line came from a list of writing prompts, instead of writing a poem. So why does a real chatbot actually help?
What will I be able to do after this lesson?
You'll be able to explain post-training (SFT + RLHF) and why it turns a raw next-word predictor into a helpful, instruction-following assistant.
What comes next?
Helpful or not, it still predicts one token at a time by looking across the whole sentence. Let's open up how.
