Below the vector: tokens
The idea: Before meaning, text is split into tokens — common chunks (often sub-words) drawn from a fixed vocabulary an algorithm like BPE learned. The model only ever sees token ids, not characters.
What you'll be able to do: You can explain tokenization (BPE): text → tokens → ids, and why it explains spelling quirks and token costs.
The problem it solves: Why does an AI fumble 'how many r's in strawberry?' — it isn't reading letters.
Builds on: See it think
← Dot-product similarity · Next: Beyond text: images become tokens too →
All lessons