Plain-language explainer
Prompt injection, explained
What is prompt injection, and why are AI apps insecure in new ways?
Prompt injection is when untrusted content the model reads contains instructions that hijack it. A model cannot reliably tell your instructions apart from text inside a web page, email, or document it was asked to process. So an attacker can hide 'ignore your task and do this instead' in that content. It becomes dangerous when an agent combines three things: access to private data, exposure to untrusted content, and a way to send data out. That combination, the lethal trifecta, is the recipe for data exfiltration.
Do not just read it. Operate the mechanism yourself in a short interactive lesson.
See it work: The lethal trifecta βFree, no code, no signup.
What people get wrong
- Better training will fix it. Models still struggle to separate instructions from data; you design around it.
- It only matters for chatbots. It is worst for agents that can read untrusted content and take actions.
- Input filtering solves it. It helps, but the durable fix is limiting access, actions, and outbound paths.
Where you see it in real products
- An agent that reads email and can send it is a classic injection target.
- Browsing and document tools must treat fetched content as untrusted.
- Safe designs scope credentials, gate risky actions behind approval, and log everything.
Related explainers
Part of See How AI Works, a free interactive course, where you learn how modern AI works by operating it, not watching videos.