Plain-language explainer

Tokens and tokenization, explained

What is a token, and why do AI models count tokens instead of words?

A token is a chunk of text, often a word-piece rather than a whole word. Before a model can read your text, a tokenizer splits it into tokens and maps each one to a number. Models bill and budget in tokens, not words, because tokens are the unit they actually process. The same idea can cost more or less depending on how it splits: common English words are usually a single token, while code, rare words, and many non-English languages break into more tokens per word.
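To make the split-then-number step concrete, here is a minimal sketch in Python. It is a toy, not how any particular model does it: the vocabulary, the IDs, and the greedy longest-match rule are all assumptions chosen for illustration. Real tokenizers learn their vocabulary from large amounts of text (byte-pair encoding is one common approach) and differ in the details.

```python
# Toy vocabulary: hypothetical pieces and IDs, not taken from any real model.
VOCAB = {
    "the": 0, "cat": 1, "token": 2, "ization": 3,
    "un": 4, "believ": 5, "able": 6, " ": 7,
}
UNKNOWN_ID = -1  # assumed placeholder ID for pieces missing from the vocabulary


def tokenize(text, vocab=VOCAB):
    """Split text into pieces by always taking the longest known piece next."""
    pieces = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink it one character at a time.
        for j in range(len(text), i, -1):
            if text[i:j] in vocab:
                pieces.append(text[i:j])
                i = j
                break
        else:
            # Nothing matched: fall back to a single character as its own piece.
            pieces.append(text[i])
            i += 1
    return pieces


def encode(text, vocab=VOCAB):
    """Map each piece to its number, which is what the model actually receives."""
    return [vocab.get(piece, UNKNOWN_ID) for piece in tokenize(text, vocab)]


print(tokenize("the cat"))        # ['the', ' ', 'cat']
print(encode("the cat"))          # [0, 7, 1]
print(tokenize("tokenization"))   # ['token', 'ization']
print(tokenize("unbelievable"))   # ['un', 'believ', 'able']
print(tokenize("xyz"))            # ['x', 'y', 'z']
```

In this toy vocabulary the common words stay whole, the longer words break into two or three pieces, and the unfamiliar string falls apart into single characters. That is the same pattern described above: one word is not always one token.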

Do not just read it. Operate the mechanism yourself in a short interactive lesson.

See it work: How AI chops text into tokens →

Free, no code, no signup.

What people get wrong

  • "A token is a word." Often it is a fragment, so 100 words is rarely exactly 100 tokens.
  • "Token count tracks character count." It tracks how the text splits, which is why code and some languages cost more.
  • "Tokenization is a detail you can ignore." It drives cost, context limits, and even some odd model mistakes.

Where you see it in real products

  • API pricing is per token, in and out.
  • Context limits are measured in tokens, so tokenization decides how much fits.
  • Multilingual apps can be quietly more expensive because some languages tokenize into more pieces.
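The three points above come down to simple arithmetic, sketched here in Python. Every number is a made-up assumption for illustration: the per-million-token prices, the context limit, and the token counts do not describe any real product or language.

```python
def estimate_cost(input_tokens, output_tokens, price_in_per_million, price_out_per_million):
    """Cost of one request when input and output tokens are priced separately."""
    total = input_tokens * price_in_per_million + output_tokens * price_out_per_million
    return total / 1_000_000


def fits_in_context(prompt_tokens, reserved_output_tokens, context_limit):
    """True if the prompt plus the room reserved for the reply fits the limit."""
    return prompt_tokens + reserved_output_tokens <= context_limit


# Hypothetical prices: 2.00 per million input tokens, 8.00 per million output tokens.
PRICE_IN, PRICE_OUT = 2.0, 8.0

# The same request, assuming it splits into twice as many input tokens
# in a second language (hypothetical counts).
cost_a = estimate_cost(1_500, 500, PRICE_IN, PRICE_OUT)
cost_b = estimate_cost(3_000, 500, PRICE_IN, PRICE_OUT)
print(round(cost_a, 6))  # 0.007
print(round(cost_b, 6))  # 0.01

# Hypothetical context limit of 4,000 tokens with 500 reserved for the reply.
print(fits_in_context(3_000, 500, 4_000))  # True
print(fits_in_context(3_800, 500, 4_000))  # False
```

The content of the request is identical in both cost lines. Only the token count changed, and the bill and the remaining context room changed with it.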

Part of See How AI Works, a free interactive course, where you learn how modern AI works by operating it, not watching videos.