What is prompt injection, and why are AI apps insecure in new ways?

Question

Accepted Answer

Prompt injection is when untrusted content the model reads contains instructions that hijack it. A model cannot reliably tell your instructions apart from text inside a web page, email, or document it was asked to process. So an attacker can hide 'ignore your task and do this instead' in that content. It becomes dangerous when an agent combines three things: access to private data, exposure to untrusted content, and a way to send data out. That combination, the lethal trifecta, is the recipe for data exfiltration.

Prompt injection, explained

What people get wrong

Where you see it in real products

Related explainers