From predictor to assistant
After pre-training comes post-training: first fine-tune the model on example answers (SFT), then refine it using human preference rankings of its responses (RLHF).
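The lesson does not say how a human preference ranking becomes a training signal. One common approach, assumed here for illustration, is to train a reward model on pairs of responses where a human marked one as preferred, using a pairwise logistic (Bradley-Terry style) loss. The sketch below shows only that loss. The helper name `preference_loss` and the reward scores are hypothetical.

```python
import math


def preference_loss(reward_chosen, reward_rejected):
    """Pairwise preference loss: -log(sigmoid(reward_chosen - reward_rejected)).

    The loss is small when the reward model scores the human-preferred
    response above the rejected one, and large when it scores them the
    wrong way round.
    """
    margin = reward_chosen - reward_rejected
    # -log(sigmoid(m)) equals log(1 + exp(-m)); branch on the sign of the
    # margin so exp() never receives a large positive argument.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Made-up reward scores for two candidate answers to the same prompt.
agrees_with_human = preference_loss(reward_chosen=2.0, reward_rejected=0.0)
disagrees_with_human = preference_loss(reward_chosen=0.0, reward_rejected=2.0)
print(agrees_with_human < disagrees_with_human)
```

Minimizing this loss over many ranked pairs pushes the reward model to score preferred answers higher. In a full RLHF setup the assistant is then tuned to produce answers that the reward model scores well, a step this sketch leaves out.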
A raw model autocompletes your prompt. Suppose you type:
Write a poem about cats.
The raw model treats this line as text to continue rather than as a request to answer.
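To make the autocomplete behavior concrete, here is a toy sketch of a predictor that only knows what tends to come next in its training text. It is not how a real language model works internally: it operates on whole lines instead of tokens, and the corpus, `train`, and `autocomplete` are all hypothetical names invented for this illustration.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus: each document is a list of lines as they might
# appear on a web page. Real pre-training data is far larger and tokenized.
CORPUS = [
    ["Write a poem about cats.", "Write a poem about dogs.", "Write a story about rain."],
    ["Write a poem about cats.", "Write a poem about dogs."],
    ["Write a poem about cats.", "Soft paws cross the moonlit floor."],
]


def train(corpus):
    """Count which line follows each line in the corpus."""
    following = defaultdict(Counter)
    for doc in corpus:
        for current, nxt in zip(doc, doc[1:]):
            following[current][nxt] += 1
    return following


def autocomplete(model, prompt):
    """Return the most frequent continuation seen after `prompt`, or None."""
    if prompt not in model:
        return None
    return model[prompt].most_common(1)[0][0]


model = train(CORPUS)
print(autocomplete(model, "Write a poem about cats."))
# prints: Write a poem about dogs.
```

Because the prompt was most often followed by another prompt in this toy corpus, the predictor continues with another prompt instead of a poem.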
Helpful or not, the model still predicts one token at a time by looking across the whole sentence. Let's open up how.
Next lesson: 3.1 Why context changes everything

Common questions
What is "From predictor to assistant" about?
It covers post-training, the stage that follows pre-training: the model is first fine-tuned on example answers, then refined using human preference rankings of its responses.
What problem does it solve?
Trained only to predict the next word, a raw model answers “Write a poem about cats” by listing more prompts, not a poem. So why does a real chatbot actually help?
What will I be able to do after this lesson?
You will be able to explain post-training (supervised fine-tuning, SFT, followed by reinforcement learning from human feedback, RLHF) and why a raw next-word predictor becomes a helpful, instruction-following assistant.
What comes next?
Helpful or not, the model still predicts one token at a time by looking across the whole sentence. The next lesson opens up how.