The bigram model
The idea: Predict the next word by conditioning on the previous word (a bigram) or, more generally, on the previous n-1 words (an n-gram).
What you'll be able to do: Explain how an n-gram model works and why one word of memory isn't enough.
The problem it solves: How much context is enough?
Builds on: Predict the next word
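
To make the idea concrete, here is a minimal sketch of a count-based bigram model. The toy corpus and the function names (train_bigram, next_word_probs) are assumptions made up for illustration, not part of the lesson; the probabilities are plain relative frequencies with no smoothing.

```python
from collections import Counter, defaultdict


def train_bigram(tokens):
    """Count how often each word follows each previous word."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def next_word_probs(counts, prev):
    """Turn the counts for one previous word into relative frequencies."""
    following = counts.get(prev)
    if not following:
        return {}  # previous word never seen in training: no prediction
    total = sum(following.values())
    return {word: n / total for word, n in following.items()}


# Hypothetical toy corpus, chosen only to show the limitation.
corpus = "the dog chased the cat . the cat chased the mouse .".split()
model = train_bigram(corpus)

probs = next_word_probs(model, "the")
print(probs)  # {'dog': 0.25, 'cat': 0.5, 'mouse': 0.25}
```

In this corpus "dog chased the" is always followed by "cat" and "cat chased the" by "mouse", yet the bigram model sees only "the" and returns the same distribution in both cases. That is the sense in which one word of memory isn't enough, and it is why longer n-grams condition on more of the preceding words.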