AI explainers

Plain-language, answer-first guides to how modern AI works. Each one answers the question up front, names what people get wrong, and links to a short interactive lesson where you operate the idea yourself.

How LLMs workHow does a large language model actually work?A large language model is a next-word predictor. It turns your text into numbers, runs them through billions of learned weights, and produces a probability for every possible next token. It picks one, adds it to the text, and repeats. There is no database of facts and no lookup step. The intelligence is a very good statistical guess about what comes next, learned from a huge amount of text. Everything else, chat, code, agents, is built on that one loop.Read the explainer →
TokenizationWhat is a token, and why do AI models count tokens instead of words?A token is a chunk of text, often a word-piece rather than a whole word. Before a model can read your text it splits it into tokens and maps each to a number. Models bill and budget in tokens, not words, because tokens are the real unit they process. The same idea can cost more or less depending on how it splits: common English words are one token, while code, rare words, and many non-English languages break into more tokens per word.Read the explainer →
EmbeddingsWhat is an embedding, and how does it capture meaning?An embedding turns a piece of text into a list of numbers, a vector, positioned so that similar meanings sit close together. Meaning becomes geometry. Words and sentences that are used in similar ways end up near each other, even when they share no exact words. That is what lets a search box find the right help article from a different phrasing, and what lets retrieval pull the relevant document for an AI answer. The model learns these positions from how language is actually used.Read the explainer →
AttentionWhat does the attention mechanism do in a transformer?Attention lets each word look at the other words in the sentence and decide which ones matter for it right now. In 'the trophy did not fit in the suitcase because it was too big', attention is what tells the model that 'it' refers to the trophy. Each position gathers a weighted blend of the others, leaning hardest on the ones that fit. This is how a model handles pronouns, long-range references, and the way a word's meaning shifts with its context.Read the explainer →
RAGWhat is retrieval-augmented generation (RAG)?RAG is how an AI answers from your documents instead of only its training. When you ask a question, the system searches your content for the most relevant passages, pastes them into the model's context, and asks the model to answer using them. The model never memorized your data. It reads the retrieved text at answer time. That is why RAG can cite sources and stay current, and why most RAG failures are really retrieval failures: if the right passage was not fetched, the model cannot use it.Read the explainer →
AI agentsWhat makes an AI agent different from a chatbot?An agent is a language model placed inside a loop that can take actions. A chatbot writes a reply and stops. An agent proposes a tool call, a harness runs it, the result comes back into the context, and the model decides the next step, repeating until the task is done. The model still only predicts text. The power comes from the loop around it: read a file, run a search, call an API, check the result, try again. That loop is what turns a predictor into something that gets work done.Read the explainer →
Context engineeringWhat is context engineering, and how is it different from prompt engineering?Context engineering is deciding everything the model gets to see for a task, not just the wording of one prompt. A modern system assembles its context from many sources: instructions, the user's request, retrieved documents, past turns, tool results, memory, and files. The model can only reason about what is in that window, and the window is a limited budget. Good context engineering puts the right information in, leaves noise out, and orders it well. It is the discipline that replaced 'prompt tips' once systems got complex.Read the explainer →
LLM evalsWhat are evals, and how do teams know an AI feature actually works?An eval is a repeatable test for an AI feature: a set of inputs, and a way to score whether the outputs are good enough. Because models are non-deterministic and 'looks fine' does not scale, teams build evals to catch regressions before users do. Scoring can be exact checks, rubrics, or another model acting as a judge. The hard part is keeping evals honest: a frozen offline set can go stale or leak into training, so production and adversarial tests catch what it cannot.Read the explainer →
Prompt injectionWhat is prompt injection, and why are AI apps insecure in new ways?Prompt injection is when untrusted content the model reads contains instructions that hijack it. A model cannot reliably tell your instructions apart from text inside a web page, email, or document it was asked to process. So an attacker can hide 'ignore your task and do this instead' in that content. It becomes dangerous when an agent combines three things: access to private data, exposure to untrusted content, and a way to send data out. That combination, the lethal trifecta, is the recipe for data exfiltration.Read the explainer →
Model routingWhat is model routing, and how do you choose which AI model to use?Model routing sends each request to the cheapest model that can still handle it, instead of using one big model for everything. Most requests are easy and a small, fast, cheap model clears them; only the hard few need a frontier model. A router can decide up front, or cascade: try a small model, check the result, and escalate only if it falls short. Done well, you hold a quality bar while cutting cost and latency, because you stop paying frontier prices for easy work.Read the explainer →
Computer-use agentsHow can AI click around apps, and when is that safe?A computer-use agent operates a screen the way a person would: it takes a screenshot, plans a step, clicks or types, looks at the result, and verifies before moving on. That loop, look, plan, act, observe, verify, is what lets a model use software that has no API. The catch is that interfaces are brittle and some actions cannot be undone. So verification and human approval on risky steps are not extras; they are what separates a useful agent from one that confidently clicks the wrong button.Read the explainer →
Multimodal AIHow do images, audio, and documents become something a model can reason about?Multimodal models turn every input, text, image, audio, or a screenshot, into vectors in one shared space, then reason over all of them together. An image is cut into patches and each patch becomes a vector, the same kind of vector a word becomes. Because they live in the same space, the model can compare a picture and a caption, answer a question about a chart, or describe what is on a screen. It is the same machinery as text, pointed at more kinds of input.Read the explainer →