Decoding & sampling knobs
The idea: Greedy vs. sampling; temperature, top-k, top-p, penalties.
What you'll be able to do: Explain temperature, top-k, and top-p, and the precision–creativity trade-off.
The problem it solves: The model outputs a probability distribution over possible next words, not a single word: how do we pick one?
Builds on: Loss as a scoreboard
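To make the knobs concrete, here is a minimal sketch of greedy decoding versus sampling with temperature, top-k, and top-p. The function names (softmax, filter_top_k_top_p, pick_token) are hypothetical, chosen for illustration. It assumes the model's output is a plain list of logits, and it leaves out penalties. Real libraries differ in details, such as whether probabilities are renormalized between the top-k and top-p steps.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; a lower temperature sharpens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def filter_top_k_top_p(probs, top_k=None, top_p=None):
    """Keep only the top-k tokens and/or the top-p nucleus, then renormalize."""
    # Token indices ordered from most to least likely.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if top_k is not None:
        order = order[:top_k]
    if top_p is not None:
        kept, cumulative = [], 0.0
        for i in order:
            kept.append(i)
            cumulative += probs[i]
            if cumulative >= top_p:  # smallest prefix whose mass reaches top_p
                break
        order = kept
    total = sum(probs[i] for i in order)
    filtered = [0.0] * len(probs)
    for i in order:
        filtered[i] = probs[i] / total
    return filtered


def pick_token(logits, temperature=1.0, top_k=None, top_p=None, rng=random):
    """Return the index of the chosen token."""
    if temperature == 0:
        # Greedy decoding: always take the highest-scoring token.
        return max(range(len(logits)), key=lambda i: logits[i])
    probs = filter_top_k_top_p(softmax(logits, temperature), top_k, top_p)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

For example, pick_token(logits, temperature=0) is deterministic, while pick_token(logits, temperature=0.8, top_p=0.9) samples from a trimmed distribution. Raising the temperature or loosening top-k/top-p trades precision for variety.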