The lethal trifecta
The idea: Danger spikes when an agent has all three at once: access to private data, exposure to untrusted content, and a way to send data out. Untrusted text can smuggle in instructions (prompt injection) that tell the agent to gather the private data and ship it to an attacker. Remove any one leg to defuse it.
What you'll be able to do: Explain the lethal trifecta (private data + untrusted content + external communication) and how removing any one leg defuses it.
The problem it solves: Your helpful agent reads a web page — and quietly emails your private data to a stranger. How?
Builds on: An LLM feature in production
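The danger is a property of the agent's whole tool set, not of any single tool, so one way to audit it is to tag each tool with the legs it grants and check the union. This is a minimal sketch under that assumption: `Capability`, `Tool`, and the example tools `read_notes`, `fetch_page`, and `send_email` are hypothetical names for illustration, not part of any agent framework. A static check like this does not detect prompt injection; it only tells you whether an injected instruction would have everything it needs to exfiltrate data.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Capability(Enum):
    """The three legs of the lethal trifecta."""
    PRIVATE_DATA = auto()       # reads data the user would not want leaked
    UNTRUSTED_CONTENT = auto()  # ingests text an attacker may control
    EXTERNAL_COMMS = auto()     # can send data outside (email, HTTP, ...)


@dataclass(frozen=True)
class Tool:
    """Hypothetical tool descriptor: a name plus the legs it grants."""
    name: str
    capabilities: frozenset[Capability]


def trifecta_legs(tools: list[Tool]) -> set[Capability]:
    """Union of every leg granted by any tool the agent can call."""
    legs: set[Capability] = set()
    for tool in tools:
        legs |= tool.capabilities
    return legs


def missing_legs(tools: list[Tool]) -> set[Capability]:
    """Legs the agent lacks; an empty set means the trifecta is complete."""
    return set(Capability) - trifecta_legs(tools)


def is_lethal(tools: list[Tool]) -> bool:
    """True when the combined tool set covers all three legs."""
    return not missing_legs(tools)


# Hypothetical tools matching the web-page-to-email scenario.
read_notes = Tool("read_notes", frozenset({Capability.PRIVATE_DATA}))
fetch_page = Tool("fetch_page", frozenset({Capability.UNTRUSTED_CONTENT}))
send_email = Tool("send_email", frozenset({Capability.EXTERNAL_COMMS}))

if __name__ == "__main__":
    agent = [read_notes, fetch_page, send_email]
    print(is_lethal(agent))  # True: all three legs present

    defused = [read_notes, fetch_page]  # drop the way to send data out
    print(is_lethal(defused))  # False
    print(sorted(leg.name for leg in missing_legs(defused)))  # ['EXTERNAL_COMMS']
```

Dropping `send_email` (or either of the other two tools) makes `is_lethal` return `False`. Tag tools by what they actually touch: a single tool can carry two legs at once, as an inbox reader does when it exposes both private mail and text written by strangers.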