Question 1

What is "From predictor to assistant" about?

Accepted Answer

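After pre-training comes post-training: first supervised fine-tuning (SFT) on example answers, then reinforcement learning from human feedback (RLHF), in which people rank candidate responses and the model is tuned toward the ones they prefer. It is still the same predictor, but it now favors helpful, instruction-following, safer replies.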
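As a rough illustration of those two stages, here is a toy sketch in Python. It is not how a real language model is trained: the three candidate replies, their starting scores, and the function names are all made up for this example, and the preference step is a simple pairwise update applied directly to the scores. Real RLHF typically trains a separate reward model from the rankings and then optimizes the model with reinforcement learning.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits.values())
    exps = {k: math.exp(v - m) for k, v in logits.items()}
    total = sum(exps.values())
    return {k: e / total for k, e in exps.items()}

def sft_step(logits, demonstrated, lr=1.0):
    """One cross-entropy gradient step toward the demonstrated reply (toy SFT)."""
    probs = softmax(logits)
    return {
        k: v + lr * ((1.0 if k == demonstrated else 0.0) - probs[k])
        for k, v in logits.items()
    }

def preference_step(logits, chosen, rejected, lr=1.0):
    """Toy stand-in for RLHF: raise the chosen reply, lower the rejected one.

    Uses the gradient of log(sigmoid(chosen - rejected)), a pairwise
    preference objective. This is a simplification, not full RLHF.
    """
    margin = logits[chosen] - logits[rejected]
    grad = 1.0 / (1.0 + math.exp(margin))  # equals 1 - sigmoid(margin)
    updated = dict(logits)
    updated[chosen] += lr * grad
    updated[rejected] -= lr * grad
    return updated

# Hypothetical replies to "Write a poem about cats", with made-up starting scores.
pretrained = {"more prompts": 2.0, "helpful poem": 0.0, "rude poem": 0.0}

# Stage 1: fine-tune on an example answer.
after_sft = pretrained
for _ in range(3):
    after_sft = sft_step(after_sft, demonstrated="helpful poem")

# Stage 2: a human ranked the helpful poem above the rude one.
after_pref = after_sft
for _ in range(3):
    after_pref = preference_step(after_pref, chosen="helpful poem", rejected="rude poem")

for name, logits in [("pretrained", pretrained), ("after SFT", after_sft), ("after preferences", after_pref)]:
    print(name, {k: round(p, 2) for k, p in softmax(logits).items()})
```

The same scoring function is used throughout; only its numbers change. That mirrors the point of the answer above: post-training shifts what the predictor favors without replacing it.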

Question 2

What problem does it solve?

Accepted Answer

Trained only to predict the next word, a raw model may answer “Write a poem about cats” by listing more prompts instead of writing a poem. The lesson explains why a real chatbot actually helps.
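To see why a pure predictor behaves this way, consider a deliberately tiny sketch. It assumes a hypothetical training text that is just a list of writing prompts, and it predicts whole lines rather than single words to keep the example short. The corpus and the `continue_text` helper are invented for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical pre-training text: a page that lists writing prompts.
corpus = [
    "Write a poem about cats",
    "Write a poem about dogs",
    "Write a story about the sea",
]

# Count which line follows which line in the training text.
next_line = defaultdict(Counter)
for current, following in zip(corpus, corpus[1:]):
    next_line[current][following] += 1

def continue_text(prompt):
    """Return the continuation seen most often after `prompt`, or None if unseen."""
    counts = next_line.get(prompt)
    if not counts:
        return None
    return counts.most_common(1)[0][0]

print(continue_text("Write a poem about cats"))  # prints "Write a poem about dogs"
```

The predictor is doing its job correctly: in its training text, a prompt is usually followed by another prompt, so that is what it produces.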

Question 3

What will I be able to do after this lesson?

Accepted Answer

You will be able to explain what post-training (SFT + RLHF) does and how it turns a raw next-word predictor into a helpful, instruction-following assistant.

Question 4

What comes next?

Accepted Answer

Helpful or not, the model still predicts one token at a time by looking across the whole sentence. Next, let's open up how it does that.

From predictor to assistant

Common questions