Safety attack lab
Injection, excessive agency, data exfiltration, unsafe output: break each chain with one guard.
1The attack is landing. Tap the green guard node to break the chain.your turn
LLM01: Prompt InjectionPrompt injection
Untrusted web page
Agent obeys the hidden command
HARM: Agent runs the attacker's instructions
the wallThe page hid "ignore your instructions and β¦" β the agent read it as a command and obeyed.
β continueβ backR replay
Next up
E.10 Capstone: fix a broken AI assistantCommon questions
What is "Safety attack lab" about?
Injection, excessive agency, data exfiltration, unsafe output: break each chain with one guard.
What problem does it solve?
A helpful agent can be turned against you. How do attackers do it, and how do you stop them?
What will I be able to do after this lesson?
You can recognize prompt injection, excessive agency, data exfiltration, and unsafe output handling, and the guard that defuses each (OWASP LLM Top 10).